HashMap internal working principle


         The HashMap data structure works on the principle of hashing. A hashing algorithm assigns a consistent value, called a hash code, to an object according to well-defined criteria.
To be precise, the algorithm must always produce the same hash code for the same or equal objects. There is a strong relationship between the hashCode() and equals() methods, which we will see later in this article when HashMap is explained in depth. The hash code is produced by the hashCode() method of the Object class, the root class of Java.

What is hashCode() :

         hashCode() is a method that belongs to the Object class. It computes an integer value for every object. Its declaration in the Object class is:


public native int hashCode();

        This method is declared native because its implementation is machine dependent and written outside Java, typically in C. The default implementation returns an integer typically derived from the object's internal address in the JVM heap, which is not accessible through the JDK itself; that is why it is not implemented as part of the Java library.
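A small sketch of the contract described above: the same object returns the same hash code on repeated calls, and classes like String that override hashCode() return equal hash codes for equal objects.

```java
public class HashCodeDemo {
    public static void main(String[] args) {
        Object a = new Object();
        // Consistent: the same object returns the same identity hash code
        System.out.println(a.hashCode() == a.hashCode()); // true

        // String overrides hashCode(), so equal strings share a hash code
        // even when they are distinct instances
        String s1 = new String("key");
        String s2 = new String("key");
        System.out.println(s1.equals(s2) && s1.hashCode() == s2.hashCode()); // true
    }
}
```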

HashMap basically works on the concept of hash values and buckets.

What is Bucket :

        A bucket can be thought of as one row in the skeleton of a bookshelf: the shelf has a number of rows, and each row holds a number of books. Almost the same concept is used in HashMap.
        Just as books are the building blocks of the shelf, Entry is the building block of HashMap. Entry is a static inner class of HashMap that implements the Map.Entry interface. It holds the key and value to be stored in the HashMap, plus a reference to the next Entry; the significance of this reference will be explained later in the article. So whenever a new key-value pair is inserted into a HashMap, the map internally creates an Entry object and stores the pair in it.


static class Entry<K, V> implements Map.Entry<K, V> {
  final K key;
  V value;
  Entry<K, V> next;
  int hash;

  // constructors and methods omitted
}

         Now, what plays the role of the bookshelf rows in the analogy? To store all these Entry objects, the bucket (or table) concept comes into the picture. HashMap keeps its entries in an internal array, and its EntrySet view, which extends AbstractSet<Map.Entry<K,V>>, exposes all of them as a set. So a bucket can be considered a single row of the bookshelf that holds multiple books, i.e., multiple Entry objects in the HashMap analogy.


private final class EntrySet extends AbstractSet<Map.Entry<K,V>> {
  // methods omitted
}

         Now a question must pop into mind: why has a bookshelf analogy been used for HashMap, where multiple books are kept on a single row? This question will be answered shortly.

How HashMap internally works : 

         Now HashMap has Entry as its building block for storing key-value pairs. When a programmer initializes a HashMap, one of the four constructors provided by the class is called:


                public HashMap()
                public HashMap(int paramInt)
                public HashMap(int paramInt, float paramFloat)
                public HashMap(Map<? extends K, ? extends V> paramMap)

       If the default constructor is called, the HashMap is initialized with a default capacity of 16 (i.e., a bucket array of size 16) and a load factor of 0.75.
If the second one is called, the capacity will be paramInt and the load factor 0.75.
For the third one, the capacity will be paramInt and the load factor paramFloat.

      The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased and existing contents will be rehashed.
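The relationship above can be checked with simple arithmetic: the resize threshold is capacity * loadFactor, so with the defaults the map resizes once a 13th mapping would push the size past 12.

```java
public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;          // default initial capacity
        float loadFactor = 0.75f;   // default load factor

        // Threshold at which the table is resized and entries rehashed
        int threshold = (int) (capacity * loadFactor);
        System.out.println(threshold); // 12

        // After resizing, capacity doubles and the threshold follows
        System.out.println((int) (capacity * 2 * loadFactor)); // 24
    }
}
```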

How put method works :

       After defining a HashMap for particular types (it is a generic class), what happens when the programmer inserts a key-value pair using the put method of HashMap?


put method of HashMap


        When the programmer calls put with a key-value pair, HashMap first checks whether the key is null, since HashMap allows one null key, and stores it in the first position of the table using its putForNullKey method. If a value already exists for the null key, the old value is replaced by the new one and the old value is returned.
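The null-key behavior described above is observable directly: a second put with the null key replaces the value and returns the previous one, and the map still contains only one null-key entry.

```java
import java.util.HashMap;

public class NullKeyDemo {
    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<>();
        map.put(null, "first");

        // A second put with the null key replaces the value
        // and returns the old one
        String old = map.put(null, "second");
        System.out.println(old);           // first
        System.out.println(map.get(null)); // second
        System.out.println(map.size());    // 1 (only one null key allowed)
    }
}
```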


putForNullKey method of HashMap


        For a non-null key, the hashCode method of the key object is called. The resulting hash code is passed to the map's hash() method, and that output is passed to the indexFor method, which gives the index of the bucket. This is actually the index into the internal array, called table, which stores the Map.Entry (key-value pair) objects.
        So at the time of insertion an Entry object is created holding the key and value, with its next reference set to null. This object is placed in the bucket location derived from the key's hash code; the hash code therefore decides the object's location in the internal array (table) of the HashMap. This is the simple case where only one object maps to a given table index. But there can be collisions, i.e., more than one key can map to the same index. In that case, all the colliding entries end up at the same index of the map's internal array.
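A sketch of the index computation: because the table length is always a power of two, hash & (length - 1) is equivalent to hash % length but faster. The supplemental bit-spreading done by the real hash() method in the JDK source is omitted here for brevity.

```java
public class IndexForDemo {
    // Simplified version of HashMap's bucket-index computation;
    // works because tableLength is always a power of two
    static int indexFor(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    public static void main(String[] args) {
        // Same key content always lands in the same bucket,
        // even across distinct String instances
        int i1 = indexFor("apple".hashCode(), 16);
        int i2 = indexFor(new String("apple").hashCode(), 16);
        System.out.println(i1 == i2); // true
    }
}
```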
         
        So the next obvious question is: how does HashMap resolve such a collision? This is a little tricky. This is where the next reference of Entry comes into the picture. When a collision occurs, HashMap goes to the particular index and checks whether that key already exists in the bucket. If it does, the old value of the Entry is simply replaced by the new one. If it does not, a linked list of entries is formed: the new Entry object is attached to the next reference of the last Entry in the list. This is how HashMap handles collisions.
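The collision handling can be forced deliberately with a key class whose hashCode() is constant, so every entry lands in the same bucket and the map must chain them and fall back on equals() to tell the keys apart. The class name BadKey is made up for this illustration.

```java
import java.util.HashMap;

public class CollisionDemo {
    static final class BadKey {
        final String name;
        BadKey(String name) { this.name = name; }

        @Override public int hashCode() { return 42; } // every key collides

        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        HashMap<BadKey, Integer> map = new HashMap<>();
        map.put(new BadKey("a"), 1);
        map.put(new BadKey("b"), 2); // same bucket, chained via next/equals()

        System.out.println(map.get(new BadKey("a"))); // 1
        System.out.println(map.get(new BadKey("b"))); // 2
        System.out.println(map.size());               // 2
    }
}
```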


HashMap internal structure


How get method works :

        Now it is time to retrieve an object from the map, which HashMap supports through its get method. As with put, when the programmer looks up a key, its hash code is computed to determine the index in the table array. This hash code has to be the same as the one calculated at insertion time, which is why the hashing algorithm must be defined to consistently return the same hash code for the same object. After locating the index, HashMap walks the linked list at that location, using the equals method of the key object to find the matching entry. This is why a key class must override both hashCode and equals to get consistent results.
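Why both methods must be overridden can be demonstrated with a key class that overrides equals() but not hashCode(): two "equal" instances hash to different buckets, so retrieval fails. BrokenKey is a made-up class for this illustration, and the null result relies on the identity hash codes of two distinct objects differing, which is virtually always the case.

```java
import java.util.HashMap;

public class MissingHashCodeDemo {
    static final class BrokenKey {
        final String name;
        BrokenKey(String name) { this.name = name; }

        // equals() overridden, hashCode() deliberately NOT overridden
        @Override public boolean equals(Object o) {
            return o instanceof BrokenKey && ((BrokenKey) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        HashMap<BrokenKey, Integer> map = new HashMap<>();
        map.put(new BrokenKey("a"), 1);

        // A new-but-equal instance has a different identity hash code,
        // so the lookup checks the wrong bucket and fails
        System.out.println(map.get(new BrokenKey("a"))); // null (almost always)
    }
}
```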


get method of HashMap

Some important points about HashMap :

  • Immutable objects are perfect keys for a HashMap. Immutable (or final) objects with properly defined equals and hashCode methods reduce the probability of collisions and improve performance. Immutability is also required to prevent ambiguity: the hash code is calculated from the object's field values, so if a field changes after insertion, a different hash code will be computed at retrieval time and the correct value may never be found. Immutability also helps maintain thread safety. That is why String and the wrapper classes are considered the best keys.
  • When a HashMap fills up to its threshold, defined as capacity * loadfactor (16 * 0.75 = 12 by default), it resizes its capacity to twice the current capacity. At the same time it rehashes all existing entries and stores them in the new bucket locations determined by the rehashing. This costs performance, so there is a trade-off between the load factor and performance; 0.75 has been observed to give a good balance, which is why Java defines it as the default load factor for HashMap.
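The mutable-key pitfall from the first point above can be demonstrated directly: mutating a key after insertion changes its hash code, so subsequent lookups check the wrong bucket and the entry becomes unreachable. MutableKey is a made-up class for this illustration.

```java
import java.util.HashMap;
import java.util.Objects;

public class MutableKeyDemo {
    static final class MutableKey {
        String name; // mutable field used in hashCode(): a bad key design
        MutableKey(String name) { this.name = name; }

        @Override public int hashCode() { return Objects.hashCode(name); }

        @Override public boolean equals(Object o) {
            return o instanceof MutableKey
                && Objects.equals(((MutableKey) o).name, name);
        }
    }

    public static void main(String[] args) {
        HashMap<MutableKey, Integer> map = new HashMap<>();
        MutableKey key = new MutableKey("a");
        map.put(key, 1);

        key.name = "b"; // mutating the key changes its hash code

        // Lookup with the mutated key hashes to a different bucket
        System.out.println(map.get(key)); // null
        // Lookup with the original value finds the bucket but equals() fails,
        // because the stored key's field has changed
        System.out.println(map.get(new MutableKey("a"))); // null
    }
}
```

The entry is still in the map (size stays 1) but can no longer be reached by key, which is exactly the ambiguity the article warns about.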

