Go Map

什么是map？

map的特点：

Map 是一个无序的 key/value 集合；
Map 中所有的 key 都是不同的；
通过给定的 key ，可以在常数时间复杂度内查找、更新或删除相应的 value

想要实现一个性能优异的 Map，需要关注以下三个关键点：

哈希算法
处理哈希冲突
扩容策略

对于给定的 key，先进行哈希运算得到哈希值，然后按 Map 长度取模，将 key 映射到指定的位置，最后取得相应的 value。

hashmap

哈希算法

哈希算法有以下特点：

如果两个哈希值不同，那么这两个哈希值对应的原始输入肯定不同；
如果两个哈希值相同，那么这两个哈希值对应的原始输入可能相同，也可能不同（这种情况称为"哈希冲突/哈希碰撞/散列碰撞"）。

常见的哈希算法：

hash

速度与质量排序：

benchmark

Golang 使用的哈希算法

Golang 选择哈希算法时，根据 CPU 是否支持 AES 指令集进行判断，如果 CPU 支持 AES 指令集，则使用 Aes Hash，否则使用memhash。

AES Hash：AES 指令集全称是高级加密标准指令集（或称英特尔高级加密标准新指令，简称AES-NI），是一个 x86指令集架构的扩展，用于 Intel 和 AMD 处理器。利用 AES 指令集实现哈希算法性能很优秀，因为它能提供硬件加速。

相关代码： runtime/alg.go asm_amd64.s asm_arm64.s

memhash：网上没有找到这个哈希算法的作者信息，只在 Golang 的源码中有这几行注释，说它的灵感来源于 xxhash 和 cityhash。

相关代码： runtime/hash64.go runtime/hash32.go

处理哈希冲突

问题：不同的key可能会产生相同的哈希值

解决方法：链地址法和开放地址法

链地址法：

开放地址法：

对于开放寻址法而言，它只有数组一种数据结构就可完成存储，继承了数组的优点，对CPU缓存友好，易于序列化操作。但是它对内存的利用率不如链地址法，且发生冲突时代价更高。当数据量明确、装载因子小，适合采用开放寻址法。

链表节点可以在需要时再创建，不必像开放寻址法那样事先申请好足够内存，因此链地址法对于内存的利用率会比开方寻址法高。链地址法对装载因子的容忍度会更高，并且适合存储大对象、大数据量的哈希表。而且相较于开放寻址法，它更加灵活，支持更多的优化策略，比如可采用红黑树代替链表。但是链地址法需要额外的空间来存储指针。

值得一提的是，在Python中dict在发生哈希冲突时采用的开放寻址法，而java的HashMap采用的是链地址法。

在Go语言中使用链地址法（存疑：有一说是开放寻址法）

扩容策略

两个概念：哈希桶与装载因子

哈希桶：哈希桶（也称为槽，类似于抽屉原理中的一个抽屉）可以先简单理解为一个哈希值，所有的哈希值组成了哈希空间。
装载因子：装载因子是表示哈希表中元素的填满程度。它的计算公式：装载因子=填入哈希表中的元素个数/哈希表的长度。装载因子越大，填入的元素越多，空间利用率就越高，但发生哈希冲突的几率就变大。反之，装载因子越小，填入的元素越少，冲突发生的几率减小，但空间浪费也会变得更多，而且还会提高扩容操作的次数。装载因子也是决定哈希表是否进行扩容的关键指标，在java的HashMap的中，其默认装载因子为0.75；Python的dict默认装载因子为2/3。

随着 Map 中元素的增加，发生哈希冲突的概率会增加，Map 的读写性能也会下降，所以我们需要更多的桶和更大的内存来保证 Map 的读写性能。

在实际应用中，当装载因子超过某个阈值时，会动态地增加 Map 长度，实现自动扩容。

每当 Map 长度发生变化后，所有 key 在 Map 中对应的索引需要重新计算。如果一个一个计算原 Map 中的 key 的索引并插入到新 Map 中，这种一次性扩容方式是达不到生产环境的要求的，因为时间复杂度太高了O(n)，在数据量大的情况下性能会很差。

在实际应用中，Map 扩容都是分多次、渐进式地完成，而不是一性次完成扩容。

源码解读

map底层实现

常量定义：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
const (
	// 一个桶容纳最大键值对：8
	bucketCntBits = 3
	bucketCnt     = 1 << bucketCntBits

	// 装载因子13/2 = 6.5
	loadFactorNum = 13
	loadFactorDen = 2

	// 为了保持内联，键值最大长度：128字节
	// Must fit in a uint8.
	// Fast versions cannot handle big elems - the cutoff size for
	// fast versions in cmd/compile/internal/gc/walk.go must be at most this elem.
	maxKeySize  = 128
	maxElemSize = 128

	// 数据偏移量为bmap的容量, but needs to be
	// aligned correctly. For amd64p32 this means 64-bit alignment
	// even though pointers are 32 bit.
	dataOffset = unsafe.Offsetof(struct {
		b bmap
		v int64
	}{}.v)

	// 可能的tophash值。
    // 每个存储桶（包括其溢出存储桶，如果有的话）的所有条目都将处于排空*状态，或者没有条目处于排空*状态（除了在extrace()方法期间，该方法仅在map写入期间发生，因此在此期间其他人无法观察map）。
	emptyRest      = 0 // this cell is empty, and there are no more non-empty cells at higher indexes or overflows.
	emptyOne       = 1 // this cell is empty
	evacuatedX     = 2 // key/elem is valid.  Entry has been evacuated to first half of larger table.
	evacuatedY     = 3 // same as above, but evacuated to second half of larger table.
	evacuatedEmpty = 4 // cell is empty, bucket is evacuated.
	minTopHash     = 5 // minimum tophash for a normal filled cell.

	// flags
	iterator     = 1 // there may be an iterator using buckets
	oldIterator  = 2 // there may be an iterator using oldbuckets
	hashWriting  = 4 // a goroutine is writing to the map
	sameSizeGrow = 8 // the current map growth is to a new map of the same size

	// 迭代检查桶 id 的哨兵
	noCheck = 1<<(8*sys.PtrSize) - 1
)

map的底层代码为：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
type hmap struct {
	count     int 		// map的size，len()的返回值
	flags     uint8		// 四种状态标志
	B         uint8  	// 桶的log_2对数  (能够装载 loadFactor * 2^B items 个元素)
	noverflow uint16 	// 溢出桶的大概数量; see incrnoverflow for details
	hash0     uint32 	// hash seed

	buckets    unsafe.Pointer // 指向buckets数组的指针，数组大小为2^B，如果元素个数为0，它为nil
	oldbuckets unsafe.Pointer // 如果发生扩容，oldbuckets是指向老的buckets数组的指针，老的buckets数组大小是新的buckets的1/2。非扩容状态下，它为nil。
	nevacuate  uintptr        // 表示扩容进度，用于标记当前旧桶中小于nevacuate的数据都已经转移到新桶中

	extra *mapextra 	// optional fields 这个字段是为了优化GC扫描而设计的。当key和value均不包含指针，并且都可以inline时使用。extra是指向mapextra类型的指针。
}

其中mapextra的实现为：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
// mapextra holds fields that are not present on all maps.
type mapextra struct {
    // 如果 key 和 value 都不包含指针，并且可以被 inline(<=128 字节)
    // 就使用 hmap的extra字段 来存储 overflow buckets，这样可以避免 GC 扫描整个 map
    // 然而 bmap.overflow 也是个指针。这时候我们只能把这些 overflow 的指针
    // 都放在 hmap.extra.overflow 和 hmap.extra.oldoverflow 中了
    // overflow 包含的是 hmap.buckets 的 overflow 的 buckets
    // oldoverflow 包含扩容时的 hmap.oldbuckets 的 overflow 的 bucket
    overflow    *[]*bmap
    oldoverflow *[]*bmap

    // 指向空闲的 overflow bucket 的指针
    nextOverflow *bmap
}

bmap桶的实现为：

1
2
3
4
5
6
// A bucket for a Go map.
type bmap struct {
    // tophash包含此桶中每个键的哈希值最高字节（高8位）信息
    // 如果tophash[0] < minTopHash，tophash[0]则代表桶的搬迁（evacuation）状态。
    tophash [bucketCnt]uint8
}

新建make map

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
// makemap implements Go map creation for make(map[k]v, hint).
// If the compiler has determined that the map or the first bucket
// can be created on the stack, h and/or bucket may be non-nil.
// If h != nil, the map can be created directly in h.
// If h.buckets != nil, bucket pointed to can be used as the first bucket.
func makemap(t *maptype, hint int, h *hmap) *hmap {
    //如果申请内存超出范围，hint置为0
	mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)
	if overflow || mem > maxAlloc {
		hint = 0
	}

	// 初始化 Hmap
	if h == nil {
		h = new(hmap)
	}
	h.hash0 = fastrand()	//生成随机种子

	// Find the size parameter B which will hold the requested # of elements.
	// For hint < 0 overLoadFactor returns false since hint < bucketCnt.
	B := uint8(0)
	for overLoadFactor(hint, B) {
		B++
	}
	h.B = B

	// allocate initial hash table
	// if B == 0, the buckets field is allocated lazily later (in mapassign)
	// If hint is large zeroing this memory could take a while.
	if h.B != 0 {
		var nextOverflow *bmap
		h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)	//调用makeBucketArray生成桶和溢出桶
		if nextOverflow != nil {
			h.extra = new(mapextra)
			h.extra.nextOverflow = nextOverflow
		}
	}

	return h
}

makemap64调用makemap函数，为了办证map长度不超过int大小

1
2
3
4
5
6
func makemap64(t *maptype, hint int64, h *hmap) *hmap {
	if int64(int(hint)) != hint {
		hint = 0
	}
	return makemap(t, int(hint), h)
}``

makemap_small在编译阶段已知元素个数最多为bucketCnt时使用

1
2
3
4
5
6
7
8
// makemap_small implements Go map creation for make(map[k]v) and
// make(map[k]v, hint) when hint is known to be at most bucketCnt
// at compile time and the map needs to be allocated on the heap.
func makemap_small() *hmap {
	h := new(hmap)
	h.hash0 = fastrand()
	return h
}

makeBucketArray会为map申请内存：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
// makeBucketArray initializes a backing array for map buckets.
// 1<<b is the minimum number of buckets to allocate.
// dirtyalloc should either be nil or a bucket array previously
// allocated by makeBucketArray with the same t and b parameters.
// If dirtyalloc is nil a new backing array will be alloced and
// otherwise dirtyalloc will be cleared and reused as backing array.
func makeBucketArray(t *maptype, b uint8, dirtyalloc unsafe.Pointer) (buckets unsafe.Pointer, nextOverflow *bmap) {
	base := bucketShift(b)
	nbuckets := base
	// For small b, overflow buckets are unlikely.
	// Avoid the overhead of the calculation.
	if b >= 4 {
		// Add on the estimated number of overflow buckets
		// required to insert the median number of elements
		// used with this value of b.
		nbuckets += bucketShift(b - 4)
		sz := t.bucket.size * nbuckets
		up := roundupsize(sz)
		if up != sz {
			nbuckets = up / t.bucket.size
		}
	}

	if dirtyalloc == nil {
		buckets = newarray(t.bucket, int(nbuckets))
	} else {
		// dirtyalloc was previously generated by
		// the above newarray(t.bucket, int(nbuckets))
		// but may not be empty.
		buckets = dirtyalloc
		size := t.bucket.size * nbuckets
		if t.bucket.ptrdata != 0 {
			memclrHasPointers(buckets, size)
		} else {
			memclrNoHeapPointers(buckets, size)
		}
	}

	if base != nbuckets {
		// We preallocated some overflow buckets.
		// To keep the overhead of tracking these overflow buckets to a minimum,
		// we use the convention that if a preallocated overflow bucket's overflow
		// pointer is nil, then there are more available by bumping the pointer.
		// We need a safe non-nil pointer for the last overflow bucket; just use buckets.
		nextOverflow = (*bmap)(add(buckets, base*uintptr(t.bucketsize)))
		last := (*bmap)(add(buckets, (nbuckets-1)*uintptr(t.bucketsize)))
		last.setoverflow(t, (*bmap)(buckets))
	}
	return buckets, nextOverflow
}

当b大于等于4的时候，生成2^(b-4)个溢出桶

map查找

key 经过 Hash 计算后得到 64 位哈希值（64位机器）；
用哈希值最后 B 个 bit 位计算它落在哪个桶；
用哈希值高 8 位计算它在桶中的索引位置。

查找使用mapaccess1_XXX与mapaccess2_XXX函数，其中前者返回查找值，后置返回查找值与ok值

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
// mapaccess1 返回 h[key]的指针。永不返回nil指针, 如果key不在map中则返回一个零对象键值对的引用
// NOTE: The returned pointer may keep the whole map live, so don't
// hold onto it for very long.
func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
    // 获取 caller 的 程序计数器 program counter
	if raceenabled && h != nil {
        // 获取 mapaccess1 的程序计数器 program counter
		callerpc := getcallerpc()
		pc := funcPC(mapaccess1)
		racereadpc(unsafe.Pointer(h), callerpc, pc)
		raceReadObjectPC(t.key, key, callerpc, pc)
	}
	if msanenabled && h != nil {
		msanread(key, t.key.size)
	}
    //如果h为nil，则返回零值引用
	if h == nil || h.count == 0 {
		if t.hashMightPanic() {
			t.hasher(key, 0) // see issue 23734
		}
		return unsafe.Pointer(&zeroVal[0])
	}
	if h.flags&hashWriting != 0 {
		throw("concurrent map read and map write")
	}
	hash := t.hasher(key, uintptr(h.hash0))	//计算key的hash值
	m := bucketMask(h.B)	//低B位掩码
	b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))//b是桶地址
	if c := h.oldbuckets; c != nil {	//如果旧桶中存在值
		if !h.sameSizeGrow() {
            // 旧桶中m缩小一倍
			// There used to be half as many buckets; mask down one more power of two.
			m >>= 1
		}
		oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))//计算key在旧桶中的位置
		if !evacuated(oldb) {
			b = oldb	//如果没有迁移到新桶，则在旧桶中寻找
		}
	}
	top := tophash(hash)	//取出hash值的高8位
bucketloop:
	for ; b != nil; b = b.overflow(t) {
		for i := uintptr(0); i < bucketCnt; i++ {
			if b.tophash[i] != top {
				if b.tophash[i] == emptyRest {
					break bucketloop	//如果有emptyRest标志位，表示为空，break循环
				}
				continue
			}
            //tophash匹配，定位到key的位置
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			if t.indirectkey() {
				k = *((*unsafe.Pointer)(k))
			}
			if t.key.equal(key, k) {	//如果key相等
				e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
				if t.indirectelem() {
					e = *((*unsafe.Pointer)(e))
				}	
				return e	//定位到value值并返回
			}
		}
	}
	return unsafe.Pointer(&zeroVal[0])	//如果没有匹配则返回零值
}

map插入

插入 key 的过程和查找 key 的过程大体一致。

源码中插入相关的是 mapassign 函数。

插入 key 的过程和查找 key 有几点不同：

如果找到了待插入的 key ，则直接更新对应的 value 值；
会在 oldbucket 中查找 key，但不会在 oldbucket 中插入 key。
如果在 bmap 中没有找到待插入的 key ，这时分两种情况：

情况一：bmap 中还有空位，在遍历 bmap 的时候预先标记可插入的空位，如果查找结束也没有找到 key，就把 key 放到预先标记的空位上。
情况二：bmap中没有空位了，此时检查一次是否达到最大装载因子。如果达到了，立即进行扩容操作。扩容以后在新桶里面插入 key。如果没有达到最大装载因子，则新生成一个 bmap，并且把前一个 bmap 的 overflow 指针指向新的 bmap。

map删除

源码中删除相关的是 mapdelete 函数。

删除 key 的流程和查找 key 流程也差不多：

找到对应的 key 以后，清除相应的 key 和 value 。如果是指针类型，就把指针置为 nil；如果是值就清空值对应的内存；
清除 tophash 里的值，并做一些标记；
最后把 hmap 的计数减1；
如果是在扩容过程中，会在扩容完成后在新的 bmap 里面执行删除操作。

map扩容

什么时候扩容？

插入 key 时会在以下两种情况触发哈希的扩容：

装载因子超过 6.5，翻倍扩容；
使用了太多溢出桶，等量扩容；

情况 2 中，溢出桶太多的判定标准是：

B < 15 时，溢出桶数量 >= 2^B
B >= 15 时，溢出桶数量 >= 2^15

map迁移

在扩容相关的 hashGrow() 方法中，最后一段注释中提到， Map 扩容后真正的数据迁移工作是由 growWork() 和 evacuate() 这两个方法渐进式地完成的，而触发数据迁移的时机是插入 Key 和删除 Key。

参考

深入理解 Golang Map

Go语言底层原理分析——郑建勋

Golang源码解析