Collision Resolution: Chaining vs Open Addressing
Collision resolution is how a hash table stores two keys that hash to the same bucket. Because collisions are unavoidable, every real hash table needs a strategy. The two classics are separate chaining (each bucket holds a list of entries) and open addressing (collided entries spill into other buckets). Both keep operations O(1) on average as long as the load factor stays low.
Separate chaining
Each bucket points to a small list of all entries that hashed there. To find a key, jump to its bucket and scan that short list.
#include <vector>
#include <list>
#include <utility>
struct ChainedMap {
std::vector<std::list<std::pair<int,int>>> buckets;
ChainedMap(int n) : buckets(n) {}
int idx(int key) { return ((key % (int)buckets.size()) + buckets.size()) % buckets.size(); }
void put(int key, int val) {
auto& chain = buckets[idx(key)];
for (auto& kv : chain) if (kv.first == key) { kv.second = val; return; }
chain.push_back({key, val}); // collision? append to the chain
}
bool get(int key, int& out) {
for (auto& kv : buckets[idx(key)]) if (kv.first == key) { out = kv.second; return true; }
return false;
}
};
import java.util.*;
class ChainedMap {
List<List<int[]>> buckets;
ChainedMap(int n) {
buckets = new ArrayList<>();
for (int i = 0; i < n; i++) buckets.add(new LinkedList<>());
}
int idx(int key) { return Math.floorMod(key, buckets.size()); }
void put(int key, int val) {
List<int[]> chain = buckets.get(idx(key));
for (int[] kv : chain) if (kv[0] == key) { kv[1] = val; return; }
chain.add(new int[]{key, val}); // collision? append to the chain
}
Integer get(int key) {
for (int[] kv : buckets.get(idx(key))) if (kv[0] == key) return kv[1];
return null;
}
}
class ChainedMap {
constructor(n) {
this.buckets = Array.from({ length: n }, () => []);
}
idx(key) { return ((key % this.buckets.length) + this.buckets.length) % this.buckets.length; }
put(key, val) {
const chain = this.buckets[this.idx(key)];
for (const kv of chain) if (kv[0] === key) { kv[1] = val; return; }
chain.push([key, val]); // collision? append to the chain
}
get(key) {
for (const kv of this.buckets[this.idx(key)]) if (kv[0] === key) return kv[1];
return undefined;
}
}
class ChainedMap:
def __init__(self, n):
self.buckets = [[] for _ in range(n)]
def idx(self, key):
return key % len(self.buckets)
def put(self, key, val):
chain = self.buckets[self.idx(key)]
for kv in chain:
if kv[0] == key:
kv[1] = val
return
chain.append([key, val]) # collision? append to the chain
def get(self, key):
for kv in self.buckets[self.idx(key)]:
if kv[0] == key:
return kv[1]
return None
With a good hash and load factor α, each chain has length ~α, so operations are O(1 + α) — effectively O(1). Chaining is simple, degrades gracefully, and is what Java’s HashMap and C++‘s std::unordered_map use. (Java even upgrades a long chain into a balanced tree, capping the worst case at O(log n).)
Open addressing (linear probing)
Here every entry lives directly in the bucket array — no side lists. On a collision, you probe the next slots until you find an empty one. Linear probing just checks index, index+1, index+2, … (wrapping around).
#include <vector>
struct LinearProbeMap {
std::vector<int> keys, vals, used; // used: 0 empty, 1 occupied
LinearProbeMap(int n) : keys(n), vals(n), used(n, 0) {}
void put(int key, int val) {
int n = keys.size(), i = ((key % n) + n) % n;
while (used[i] && keys[i] != key) i = (i + 1) % n; // probe forward
keys[i] = key; vals[i] = val; used[i] = 1;
}
bool get(int key, int& out) {
int n = keys.size(), i = ((key % n) + n) % n, steps = 0;
while (used[i] && steps++ < n) {
if (keys[i] == key) { out = vals[i]; return true; }
i = (i + 1) % n;
}
return false;
}
};
class LinearProbeMap {
int[] keys, vals; boolean[] used;
LinearProbeMap(int n) { keys = new int[n]; vals = new int[n]; used = new boolean[n]; }
void put(int key, int val) {
int n = keys.length, i = Math.floorMod(key, n);
while (used[i] && keys[i] != key) i = (i + 1) % n; // probe forward
keys[i] = key; vals[i] = val; used[i] = true;
}
Integer get(int key) {
int n = keys.length, i = Math.floorMod(key, n), steps = 0;
while (used[i] && steps++ < n) {
if (keys[i] == key) return vals[i];
i = (i + 1) % n;
}
return null;
}
}
class LinearProbeMap {
constructor(n) {
this.keys = new Array(n);
this.vals = new Array(n);
this.used = new Array(n).fill(false);
}
put(key, val) {
const n = this.used.length;
let i = ((key % n) + n) % n;
while (this.used[i] && this.keys[i] !== key) i = (i + 1) % n; // probe forward
this.keys[i] = key; this.vals[i] = val; this.used[i] = true;
}
get(key) {
const n = this.used.length;
let i = ((key % n) + n) % n, steps = 0;
while (this.used[i] && steps++ < n) {
if (this.keys[i] === key) return this.vals[i];
i = (i + 1) % n;
}
return undefined;
}
}
class LinearProbeMap:
def __init__(self, n):
self.keys = [None] * n
self.vals = [None] * n
def put(self, key, val):
n = len(self.keys)
i = key % n
while self.keys[i] is not None and self.keys[i] != key:
i = (i + 1) % n # probe forward
self.keys[i] = key
self.vals[i] = val
def get(self, key):
n = len(self.keys)
i = key % n
steps = 0
while self.keys[i] is not None and steps < n:
if self.keys[i] == key:
return self.vals[i]
i = (i + 1) % n
steps += 1
return None
Open addressing is cache-friendly (everything is in one contiguous array, no pointer chasing) and uses less memory per entry. Its weakness is clustering: runs of occupied slots grow and slow down probes, and it must stay well below full (resize around α ≈ 0.7). Python’s dict and set use open addressing.
Deletion gotcha: In open addressing you cannot just blank a slot — that would break the probe chain for keys stored after it. Use a tombstone marker (a “deleted” flag) so lookups keep probing past it.
Chaining vs open addressing
| Aspect | Separate chaining | Open addressing |
|---|---|---|
| Where collisions go | Side list per bucket | Other slots in the array |
| Memory | Extra per-node pointers | Compact array |
| Cache behavior | Worse (pointer chasing) | Better (contiguous) |
| Tolerates high load? | Yes (degrades gently) | No (must keep α low) |
| Deletion | Easy (remove from list) | Needs tombstones |
| Used by | Java HashMap, C++ unordered_map | Python dict, open-addressed sets |
Complexity summary
Both strategies are O(1) average for insert, lookup, and delete, and O(n) worst case when everything collides. Keeping a good hash function and a low load factor is what makes the average case the case you actually get.
For beginners: You rarely implement these yourself — you use the built-in map. But knowing why a hash table can occasionally be slow (bad hash, high load, pathological keys) helps you debug performance and answer interview questions.
Next: put theory to work with the built-in maps in four languages.