HashMap internal working principle
The HashMap data structure works on the principle of hashing. Hashing applies a predefined algorithm that assigns a consistent value, or code, to an object. To be precise, the hashing algorithm must always produce the same hash code for the same or equal objects. There is a strong relation between the hashCode and equals methods, which we will see later in this article when HashMap is explained in depth. The hash code is defined by the hashCode() method of the Object class, the root class of Java.
What is hashCode() :
hashCode() is a method that belongs to the Object class. It calculates an integer value for every object. The definition of hashCode() in the Object class is:
public native int hashCode();
This method is declared native because it is not implemented in Java. The integer it returns is traditionally derived from the object's location in the JVM heap; since that internal address is not available via the JDK, the method is not part of the Java library but is machine-dependent code written in native C.
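The contract described above can be observed directly: the default hashCode() of Object is identity-based, while classes such as String override it to be content-based. A minimal sketch (the class name is illustrative):

```java
public class HashCodeDemo {
    public static void main(String[] args) {
        Object a = new Object();
        Object b = new Object();
        // default Object.hashCode() is identity-based, so distinct
        // objects almost always get distinct codes
        System.out.println(a.hashCode() == b.hashCode()); // very likely false

        // String overrides hashCode(): equal contents give equal codes
        String s1 = "abc";
        String s2 = new String("abc");
        System.out.println(s1.hashCode() == s2.hashCode()); // true
    }
}
```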
What is Bucket :
A bucket can be considered a row in the skeleton of a bookshelf: the shelf has a number of rows, and each row holds a number of books. Almost the same concept is used in HashMap.
Just as books are the building blocks of the shelf, Entry is the building block of HashMap. Entry is an inner class of HashMap that implements the Map.Entry interface. This class holds the key and value to be stored in the HashMap, plus a reference to the next Entry; the significance of this reference will be explained later in the article. So whenever a new key-value pair is inserted into a HashMap, the map internally creates an Entry object and stores the key-value pair in it.
static class Entry<K, V> implements Map.Entry<K, V> {
    final K key;
    V value;
    Entry<K, V> next;
    int hash;
    ....
}
Now, what about the rows of the bookshelf in the analogy? To store all these Entry objects, the bucket, or table, concept comes into the picture. HashMap uses EntrySet, which extends AbstractSet of type Map.Entry, the interface implemented by HashMap's Entry. So a bucket is the set of all entries stored at one table location. A bucket corresponds to a single row of the bookshelf holding multiple books, i.e., Entry objects, in the HashMap analogy.
private final class EntrySet extends AbstractSet<Map.Entry<K,V>> {
    ....
}
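Although Entry and EntrySet are internal to HashMap, the stored entries are exposed through the public entrySet() view, which is how the key-value pairs can be iterated from user code:

```java
import java.util.HashMap;
import java.util.Map;

public class EntrySetDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        // entrySet() returns a Set view backed by the map's internal entries
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```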
A question that naturally pops into mind is why a bookshelf analogy, in which multiple books are kept on a single row, has been used for HashMap. This will be answered shortly in the discussion.
How HashMap internally works :
Now HashMap has its building block, Entry, to store key-value pairs. When a programmer initializes a HashMap, one of the four constructors provided by the HashMap class is called.
If the default constructor is called, the HashMap is initialized with a default capacity of 16, i.e., a bucket array of size 16, and a load factor of 0.75. If the constructor taking a single int is called, the capacity will be that argument and the load factor will be 0.75. For the constructor taking an int and a float, both the capacity and the load factor come from the arguments.
The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased and the existing contents are rehashed.
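The constructor variants described above can be sketched as follows (the variable names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class ConstructorDemo {
    public static void main(String[] args) {
        // default constructor: capacity 16, load factor 0.75
        Map<String, Integer> m1 = new HashMap<>();
        // capacity from the argument, load factor 0.75
        Map<String, Integer> m2 = new HashMap<>(32);
        // both capacity and load factor from the arguments
        Map<String, Integer> m3 = new HashMap<>(32, 0.5f);
        System.out.println(m1.isEmpty() && m2.isEmpty() && m3.isEmpty()); // true
    }
}
```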
How put method works :
After defining a HashMap for particular types (it is a generic class), what happens when the programmer inserts a key-value pair using the put method of HashMap?
[Image: put method of HashMap]
When the programmer calls put with a key-value pair, HashMap first checks whether the key is null: HashMap allows one null key and stores it at the first position of the table using the putForNullKey method. If a value for the null key already exists at that location, the old value is replaced by the new one, and the old value is returned.
[Image: putForNullKey method of HashMap]
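The null-key behaviour described above can be observed from user code; note that put returns the previous value for the key:

```java
import java.util.HashMap;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        // HashMap allows exactly one null key
        Integer old1 = map.put(null, 1); // no previous value: returns null
        Integer old2 = map.put(null, 2); // replaces and returns the old value
        System.out.println(old1);          // null
        System.out.println(old2);          // 1
        System.out.println(map.get(null)); // 2
    }
}
```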
For a non-null key, the hashCode method of the key object is called. That value is passed to the map's internal hash() method, and its output is passed to the indexFor method, which gives the index of the bucket. This is the index into the internal array, called table, which stores the Map.Entry objects, i.e., the key-value pairs.
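In the pre-Java-8 implementation, indexFor reduces to a bit mask because the table length is always a power of two (the real code additionally spreads the hash bits in hash() first). A sketch of the index computation, with tableLength as an assumed value:

```java
public class IndexForDemo {
    public static void main(String[] args) {
        int h = "key".hashCode();
        int tableLength = 16; // table length is always a power of two
        // equivalent of indexFor(h, length): h & (length - 1)
        // behaves like h % length for non-negative h
        int index = h & (tableLength - 1);
        System.out.println(index); // always in the range 0..15
    }
}
```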
So at the time of insertion, an Entry object is created holding the key and value, with its next reference set to null. This object is placed in the bucket location derived from the key's hash code. The hash code therefore decides the object's location in the internal array, or table, of the HashMap. This covers the case where only one object maps to a given table index. But there can be a collision, i.e., more than one key can produce the same hash value. In that case, all the colliding objects are mapped to the same index of the map's internal array.
The next obvious question is how HashMap resolves that collision. This is a little tricky, and this is where the next reference of Entry comes into the picture. When a collision occurs, HashMap goes to the particular index and checks whether that key already exists in the bucket. If it is present, the old value of the Entry is simply replaced by the new one. If it is not, a linked list of entries is formed: the new Entry object is attached to the next reference of the last Entry in the list. This is how HashMap works.
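Collisions can be forced with a key class whose hashCode is constant; both entries then land in the same bucket but stay distinct because equals distinguishes them. BadKey is an illustrative class, not part of any library:

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    static final class BadKey {
        final String name;
        BadKey(String name) { this.name = name; }
        @Override public int hashCode() { return 42; } // every key collides
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        map.put(new BadKey("a"), 1);
        map.put(new BadKey("b"), 2); // same bucket, chained via the next reference
        System.out.println(map.size());               // 2
        System.out.println(map.get(new BadKey("a"))); // 1
    }
}
```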
Get method :
Now it is time to retrieve an object from the map, and for that HashMap provides the get method. As with put, when the programmer retrieves a value by key, a hash code is generated for the key object, which determines the array index in the table. This hash code must be the same as the one calculated at insertion time; that is why the hashing algorithm must be defined so that it consistently produces the same hash code for the same object. After locating the index, HashMap walks the linked list at that location to find the correct object, using the equals method of the key object. That is why a key object must override both hashCode and equals to get consistent results. So don't get confused by the retrieval logic.
[Image: get method of HashMap]
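The need to override both methods can be demonstrated with a key that overrides equals but not hashCode: lookup with an equal-but-distinct instance fails, because the second instance's default identity hash sends the search to a different bucket. Point is an illustrative class:

```java
import java.util.HashMap;
import java.util.Map;

public class ContractDemo {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        // equals is overridden, but hashCode is NOT: contract violated
        @Override public boolean equals(Object o) {
            return o instanceof Point && ((Point) o).x == x && ((Point) o).y == y;
        }
    }

    public static void main(String[] args) {
        Map<Point, String> map = new HashMap<>();
        map.put(new Point(1, 2), "stored");
        // an equal-but-distinct instance has a different identity hash code,
        // so HashMap looks in the wrong bucket and finds nothing
        System.out.println(map.get(new Point(1, 2))); // almost certainly null
    }
}
```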
Some important points about HashMap :
- Immutable objects are perfect keys for a HashMap. Immutable or final objects with properly defined equals and hashCode methods reduce the probability of collision and improve performance. Immutability is required to prevent ambiguity: the hash code is calculated from the object's field values, so if a field value changes, different hash codes will be calculated at insertion time and at retrieval time, and the correct value may not be retrieved. Immutability also helps maintain thread safety. That is why the wrapper classes and String are considered the best keys.
- When a HashMap fills up to its threshold, defined as capacity * loadfactor, i.e., 16 * 0.75 = 12 by default, the HashMap resizes its capacity to twice the current capacity. At the same time it rehashes all existing data and stores it in the new bucket locations determined by rehashing. This costs performance, so there is a trade-off between load factor and performance. A value of 0.75 has been found to give good performance, which is why 0.75 is the default load factor for HashMap in Java.
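The ambiguity caused by mutable keys in the first point above can be made concrete. MutableKey is an illustrative class: once its field changes after insertion, the stored entry sits in a bucket that no longer matches the key's current hash code, so the value becomes unreachable:

```java
import java.util.HashMap;
import java.util.Map;

public class MutableKeyDemo {
    static final class MutableKey {
        int id;
        MutableKey(int id) { this.id = id; }
        @Override public int hashCode() { return id; } // depends on a mutable field
        @Override public boolean equals(Object o) {
            return o instanceof MutableKey && ((MutableKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey(1);
        map.put(key, "value");
        key.id = 2; // mutate the key after insertion
        // the lookup now hashes to a different bucket than the stored entry
        System.out.println(map.get(key));               // null
        // even the original hash finds nothing: equals compares id 1 vs 2
        System.out.println(map.get(new MutableKey(1))); // null
        System.out.println(map.size());                 // 1: the entry is stranded
    }
}
```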



