Java-hashmap,concurrentHashmap

java8的hashmap和concurrentHashmap

HashMap

hashmap的几个字段的含义

int threshold;             // 所能容纳的key-value对极限，超过就要进行扩容。threshold = table.length * loadFactor
final float loadFactor;    // 负载因子
int size;   //保存的键值对的数目
transient Node<K,V>[] table;   // 散列表数组

hashmap的构造函数

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                        initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                        loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}
   
/**
* Constructs an empty <tt>HashMap</tt> with the specified initial
* capacity and the default load factor (0.75).
*
* @param  initialCapacity the initial capacity.
* @throws IllegalArgumentException if the initial capacity is negative.
*/
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
   
/**
* Constructs an empty <tt>HashMap</tt> with the default initial capacity
* (16) and the default load factor (0.75).
*/
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
   
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

在构造函数里面，如果我们传入了初始值大小，则会对这个值做一个tableSizeFor(...)的处理，处理的结果是最小的大于cap的2的幂(这个算法没看明白)，并且设置 threshold 的值为它，否则 threshold 的值为 0

hashmap的散列函数
1
index = hash(key) & (table.length()-1)
当 table.length() 的值是2的幂次方的时候，table.length()-1的值在bit位上则全是1，因此这里的index的值实际上等于hash值的末位数字。比如 index = 101101010 & 111 = 010 = 2 ,这个key对应的数据应该放在数组的2位置。
put过程
key-value
在hashmap 中以 Node 节点的形式存储，Node节点保存有next值，指向下一个节点。如果存在下一个节点，则说明这里有hash冲突，作为链表保存，否则仅仅是一个Node节点存储。其中，转化为红黑树的时候，红黑树的节点 TreeNode 是Node 的子类

扩容

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;    //如果是刚刚初始化，这里oldTab是null
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold ， 两倍
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;   // 对应于在构造函数中传入了初始化大小，那么tableSizeFor处理后的值就成了散列表数组的初始化大小
    else {               // zero initial threshold signifies using defaults
        // 在构造函数中如果没有传入初始化大小，这里oldThr是等于0的，采用默认值，默认是16
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    //  前面确定了扩容后的参数，这里把原来的节点移到新的数组中去
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)  //如果在这个位置上没有冲突，则直接复制过去
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)   //如果这个位置上是红黑树，处理
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order，如果这个位置是个长度大于1的链表
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {  //这句在下面有分析，是怎么拆分链表的
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

当需要扩容的时候，会调用 resize()函数。在扩容后，因为散列函数不变，因此

1	index = hash(key) & (table.length()-1)

仍然成立，但是此时，table的大小变成了原来的两倍，因此在散列的时候，后面的括号的值会多一位1，假如刚开始有：

index = 101101010 & 111 = 010 = 2 ；

那么扩容后则变成了：

index = 101101010 & 1111 = 1010 = 10

观察下可以知道，如果高1位（第四位，扩容后的最高位）的值为0，那么扩容后节点的位置不变，如果高1位的位置为1，那么扩容后节点的位置为index + table.length。而获取这个高1位的值的方法，则可以是

hash(key) & (table.length())

因为 table.length() 是2的幂次方，因此一定是 ..00100… ，相与运算之后刚好可以得到这个位的值是1还是0

在 resize() 函数中，如果是0，则添加到 newTab[j] 位置的链表中去，如果是1，则添加到 newTab[j + oldCap] 的链表中去

重写key的equals和hashCode :
A和B对象equals方法返回true，hashCode方法返回值必然一样；
A和B对象hashCode不一样，那么equals方法必须返回false。
A和B对象hashCode一样，不能判定A equals B。

ConcurrentHashMap

一些参数

//  0：默认值
//  -1：代表哈希表正在进行初始化
//  -N 表示有N-1个线程正在进行扩容操作                     
//  大于0：相当于 HashMap 中的 threshold，表示阈值                  
private transient volatile int sizeCtl;

//表示散列表
transient volatile Node<K,V>[] table;
//哈希表扩容的时候会用，扩容完成后会被重置为 null。
private transient volatile Node<K,V>[] nextTable;

static final int MOVED     = -1; // hash值是-1，表示这是一个forwardNode节点  
static final int TREEBIN   = -2; // hash值是-2  表示这时一个TreeBin节点

构造函数，这里只选常用的

// 这里和hashmap有些不一样，初始化的大小是 initialCapacity*1.5+1，再向上取到2的n次方，
// hashmap是直接用 initialCapacity 向上取到2的n次方
public ConcurrentHashMap(int initialCapacity) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException();
    int cap = ((initialCapacity >= (MAXIMUM_CAPACITY >>> 1)) ?
               MAXIMUM_CAPACITY :
               tableSizeFor(initialCapacity + (initialCapacity >>> 1) + 1));
    this.sizeCtl = cap;
}
//不带参数甚至什么都不初始化
public ConcurrentHashMap() {
}

put函数

 public V put(K key, V value) {
     return putVal(key, value, false);
 }

/** Implementation for put and putIfAbsent */
final V putVal(K key, V value, boolean onlyIfAbsent) {
    if (key == null || value == null) throw new NullPointerException();
    int hash = spread(key.hashCode());
    int binCount = 0;
    for (Node<K,V>[] tab = table;;) {
        Node<K,V> f; int n, i, fh;
        if (tab == null || (n = tab.length) == 0)
            //initTable()的时候也需要考虑多个线程操作的情景，使用cas来保证线程同步
            tab = initTable();
        else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) { //如果这个数组位置上还没有节点，cas插入
            if (casTabAt(tab, i, null,
                         new Node<K,V>(hash, key, value, null)))
                break;                   // no lock when adding to empty bin
        }
        else if ((fh = f.hash) == MOVED)
            //  帮助数据迁移，假如此时正在扩容
            tab = helpTransfer(tab, f);
        else {
            V oldVal = null;
            synchronized (f) {
                if (tabAt(tab, i) == f) {
                    if (fh >= 0) {
                        binCount = 1;
                        for (Node<K,V> e = f;; ++binCount) {
                            K ek;
                            if (e.hash == hash &&
                                ((ek = e.key) == key ||
                                 (ek != null && key.equals(ek)))) {
                                oldVal = e.val;
                                if (!onlyIfAbsent)
                                    e.val = value;
                                break;
                            }
                            Node<K,V> pred = e;
                            if ((e = e.next) == null) {
                                pred.next = new Node<K,V>(hash, key,
                                                          value, null);
                                break;
                            }
                        }
                    }
                    else if (f instanceof TreeBin) {
                        Node<K,V> p;
                        binCount = 2;
                        if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                              value)) != null) {
                            oldVal = p.val;
                            if (!onlyIfAbsent)
                                p.val = value;
                        }
                    }
                    else if (f instanceof ReservationNode)
                        throw new IllegalStateException("Recursive update");
                }
            }
            // binCount != 0 表示插入到了节点到了链表或者红黑树中去了，返回之前的值
            if (binCount != 0) {
                if (binCount >= TREEIFY_THRESHOLD)
                    treeifyBin(tab, i);
                if (oldVal != null)
                    return oldVal;
                break;
            }
        }
    }
    addCount(1L, binCount);
    return null;
}

从过程上看，和hashmap的区别在于，插入的时候，

如果在数组对应的hash位置上没有元素，那么使用cas来插入而不是直接赋值，如果插入失败，会继续循环继续判断这个位置有没有别的线程已经插入了，直到插入成功为止。
如果对应的hash位置上已经有元素了，那么这里要么是单链表要么是红黑树，把头结点作为锁，再执行插入操作
hash的方法没变，但是对于hashcode,会先spread() : spread(hash) & (n-1)
因为有个for( ; ; )循环的存在，所以遇到节点是MOVED的时候，会先帮助进行数据迁移。

final Node<K,V>[] helpTransfer(Node<K,V>[] tab, Node<K,V> f) {
    Node<K,V>[] nextTab; int sc;
    if (tab != null && (f instanceof ForwardingNode) &&
        (nextTab = ((ForwardingNode<K,V>)f).nextTable) != null) {
        //返回一个 16 位长度的扩容校验标识
        int rs = resizeStamp(tab.length);
        while (nextTab == nextTable && table == tab &&
               (sc = sizeCtl) < 0) {
            //sizeCtl 如果处于扩容状态的话
            //前 16 位是数据校验标识，后 16 位是当前正在扩容的线程总数
            //这里判断校验标识是否相等，如果校验符不等或者扩容操作已经完成了，直接退出循环，不用协助它们扩容了
            if ((sc >>> RESIZE_STAMP_SHIFT) != rs || sc == rs + 1 ||
                sc == rs + MAX_RESIZERS || transferIndex <= 0)
                break;
            //否则调用 transfer 帮助它们进行扩容
            //sc + 1 标识增加了一个线程进行扩容
            if (U.compareAndSwapInt(this, SIZECTL, sc, sc + 1)) {
                transfer(tab, nextTab);
                break;
            }
        }
        return nextTab;
    }
    return table;
}

为并发而生的 ConcurrentHashMap

转移函数

private final void transfer(Node<K,V>[] tab, Node<K,V>[] nextTab) {
       int n = tab.length, stride;
       if ((stride = (NCPU > 1) ? (n >>> 3) / NCPU : n) < MIN_TRANSFER_STRIDE)
           stride = MIN_TRANSFER_STRIDE; // 计算每个线程转移的数据的最小步长
       if (nextTab == null) {            // initiating
           try {
               @SuppressWarnings("unchecked")
               Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n << 1];
               nextTab = nt;
           } catch (Throwable ex) {      // try to cope with OOME
               sizeCtl = Integer.MAX_VALUE;
               return;
           }
           nextTable = nextTab;
           transferIndex = n;
       }
       int nextn = nextTab.length;
       ForwardingNode<K,V> fwd = new ForwardingNode<K,V>(nextTab);
       boolean advance = true;
       boolean finishing = false; // to ensure sweep before committing nextTab
       for (int i = 0, bound = 0;;) {  //这里的for循环，下面会多次执行
           Node<K,V> f; int fh;
           while (advance) {  
               int nextIndex, nextBound;
               if (--i >= bound || finishing)
                   advance = false;
               // 这里transferIndex小于0表示数组迁移任务已经分配完了，不需要协助了
               else if ((nextIndex = transferIndex) <= 0) {
                   i = -1;
                   advance = false;
               }
               //这里的else if 分支表示为线程分配任务，负责的区间在数组上的索引是（nextbound,nextIndex），第一次while循环的时候前面两个分支都不满足条件，进来这里分配，通过CAS更新transferIndex的值为前一个transferIndex-stride,更新成功后i也有了值，然后跳出了while循环，在下次for循环进来的时候会进入while的第一/二个分支
               else if (U.compareAndSwapInt
                        (this, TRANSFERINDEX, nextIndex,
                         nextBound = (nextIndex > stride ?
                                      nextIndex - stride : 0))) {
                   bound = nextBound;
                   i = nextIndex - 1;
                   advance = false;
               }
           }
          	//当前线程所有任务完成
           if (i < 0 || i >= n || i + n >= nextn) {
               int sc;
               //结束了就更新相关的变量
               if (finishing) {
                   nextTable = null;
                   table = nextTab;
                   sizeCtl = (n << 1) - (n >>> 1);
                   return;
               }
               if (U.compareAndSwapInt(this, SIZECTL, sc = sizeCtl, sc - 1)) {
                   if ((sc - 2) != resizeStamp(n) << RESIZE_STAMP_SHIFT)
                       return;
                   finishing = advance = true;
                   i = n; // recheck before commit
               }
           }
           //如果散列表旧表这个位置为空，则把ForwardingNode赋值给这个位置
           else if ((f = tabAt(tab, i)) == null)
               advance = casTabAt(tab, i, null, fwd);
           //如果散列表旧表这个位置为ForwardingNode，表示已经处理过了
           else if ((fh = f.hash) == MOVED)
               advance = true; // already processed
           else {
               synchronized (f) {
                   //如果散列表旧表这个位置为链表节点，CAS的方式迁移，位置变化和hashmap差不多，最后会追加在散列表旧表里面设置ForwardingNode表示已经处理过了
                   if (tabAt(tab, i) == f) {
                       Node<K,V> ln, hn;
                       if (fh >= 0) {
                           int runBit = fh & n;
                           Node<K,V> lastRun = f;
                           for (Node<K,V> p = f.next; p != null; p = p.next) {
                               int b = p.hash & n;
                               if (b != runBit) {
                                   runBit = b;
                                   lastRun = p;
                               }
                           }
                           if (runBit == 0) {
                               ln = lastRun;
                               hn = null;
                           }
                           else {
                               hn = lastRun;
                               ln = null;
                           }
                           for (Node<K,V> p = f; p != lastRun; p = p.next) {
                               int ph = p.hash; K pk = p.key; V pv = p.val;
                               if ((ph & n) == 0)
                                   ln = new Node<K,V>(ph, pk, pv, ln);
                               else
                                   hn = new Node<K,V>(ph, pk, pv, hn);
                           }
                           setTabAt(nextTab, i, ln);
                           setTabAt(nextTab, i + n, hn);
                           setTabAt(tab, i, fwd);
                           advance = true;
                       }
                       //如果散列表旧表这个位置为红黑树，CAS的方式迁移，最后会追加在散列表旧表里面设置ForwardingNode表示已经处理过了
                       else if (f instanceof TreeBin) {
                           TreeBin<K,V> t = (TreeBin<K,V>)f;
                           TreeNode<K,V> lo = null, loTail = null;
                           TreeNode<K,V> hi = null, hiTail = null;
                           int lc = 0, hc = 0;
                           for (Node<K,V> e = t.first; e != null; e = e.next) {
                               int h = e.hash;
                               TreeNode<K,V> p = new TreeNode<K,V>
                                   (h, e.key, e.val, null, null);
                               if ((h & n) == 0) {
                                   if ((p.prev = loTail) == null)
                                       lo = p;
                                   else
                                       loTail.next = p;
                                   loTail = p;
                                   ++lc;
                               }
                               else {
                                   if ((p.prev = hiTail) == null)
                                       hi = p;
                                   else
                                       hiTail.next = p;
                                   hiTail = p;
                                   ++hc;
                               }
                           }
                           ln = (lc <= UNTREEIFY_THRESHOLD) ? untreeify(lo) :
                               (hc != 0) ? new TreeBin<K,V>(lo) : t;
                           hn = (hc <= UNTREEIFY_THRESHOLD) ? untreeify(hi) :
                               (lc != 0) ? new TreeBin<K,V>(hi) : t;
                           setTabAt(nextTab, i, ln);
                           setTabAt(nextTab, i + n, hn);
                           setTabAt(tab, i, fwd);
                           advance = true;
                       }
                   }
               }
           }
       }
   }

这里主要是两个无限循环导致分支的控制有点复杂，在迁移的过程中还是要对数组旧表的位置处的结点加锁。整个迁移过程不加锁的原因是，根据hash函数，旧表迁移到新表，旧表中位置为i的节点在新表中只可能有两个位置，i和i+n，只需要对这个节点加锁，保证迁移过程就行。A线程负责i节点，B线程负责j节点，C线程负责k节点，只要节点没有重叠，迁移就不会有多线程的问题。完成后会把旧表中这个位置设置为ForwardingNode，这样别的线程扫描到这个节点也会发现处理过了，跳过它。