Hash Sets: Fast Membership and Deduplication
A hash set is a hash table that stores only keys, no values — a collection of unique elements with O(1) average add, remove, and “is this in the set?” membership tests. It is the go-to tool for two everyday jobs: checking whether you have seen something before, and removing duplicates. The built-ins are C++ std::unordered_set, Java HashSet, JavaScript Set, and Python set.
The core operations
#include <unordered_set>
std::unordered_set<int> s;
s.insert(7); // add — O(1) average
bool has = s.count(7); // contains? — O(1) average (1 if present)
s.erase(7); // remove — O(1) average
size_t n = s.size();
import java.util.HashSet;
import java.util.Set;
Set<Integer> s = new HashSet<>();
s.add(7); // add — O(1) average
boolean has = s.contains(7); // contains? — O(1) average
s.remove(7); // remove — O(1) average
int n = s.size();
const s = new Set();
s.add(7); // add — O(1) average
const has = s.has(7); // contains? — O(1) average
s.delete(7); // remove — O(1) average
const n = s.size; // property, not a method
s = set()
s.add(7) # add — O(1) average
has = 7 in s # contains? — O(1) average
s.discard(7) # remove safely (no error if absent)
n = len(s)
Deduplicate a list
Building a set from a collection drops duplicates instantly. Remember: the result is unordered, so do not rely on element order.
#include <unordered_set>
#include <vector>
std::vector<int> dedup(const std::vector<int>& a) {
std::unordered_set<int> seen(a.begin(), a.end());
return std::vector<int>(seen.begin(), seen.end()); // order not guaranteed
}
import java.util.*;
List<Integer> dedup(List<Integer> a) {
Set<Integer> seen = new HashSet<>(a);
return new ArrayList<>(seen); // order not guaranteed
}
function dedup(a) {
return [...new Set(a)]; // preserves first-seen order in JS
}
def dedup(a):
return list(set(a)) # order not guaranteed
”Have I seen this before?” — detect duplicates
A set turns the O(n²) nested-loop duplicate check (see Big-O notation) into a single O(n) pass: add as you go, and the first time add/insert fails — or the value is already present — you found a repeat.
#include <unordered_set>
#include <vector>
bool hasDuplicate(const std::vector<int>& a) {
std::unordered_set<int> seen;
for (int x : a) {
if (seen.count(x)) return true; // seen before
seen.insert(x);
}
return false;
}
import java.util.*;
boolean hasDuplicate(int[] a) {
Set<Integer> seen = new HashSet<>();
for (int x : a) {
if (!seen.add(x)) return true; // add returns false if already present
}
return false;
}
function hasDuplicate(a) {
const seen = new Set();
for (const x of a) {
if (seen.has(x)) return true; // seen before
seen.add(x);
}
return false;
}
def has_duplicate(a):
seen = set()
for x in a:
if x in seen:
return True
seen.add(x)
return False
Set algebra: union, intersection, difference
Sets shine for comparing collections — “what’s in both?”, “what’s in A but not B?”.
#include <unordered_set>
#include <vector>
// Intersection: elements present in both sets.
std::vector<int> intersect(const std::unordered_set<int>& a,
const std::unordered_set<int>& b) {
std::vector<int> out;
const auto& small = a.size() < b.size() ? a : b;
const auto& big = a.size() < b.size() ? b : a;
for (int x : small) if (big.count(x)) out.push_back(x);
return out;
}
import java.util.*;
Set<Integer> intersect(Set<Integer> a, Set<Integer> b) {
Set<Integer> out = new HashSet<>(a);
out.retainAll(b); // keep only elements also in b
return out;
}
function intersect(a, b) {
const out = new Set();
for (const x of a) if (b.has(x)) out.add(x);
return out;
}
def intersect(a, b):
return a & b # also: a | b (union), a - b (difference)
For beginners: A set is just a map whose values you ignore. If you ever need to attach data to each unique key (a count, a position, a name), switch from a set to a hash map.
Going deeper: Need the elements in sorted order or range queries? Use an ordered set — C++
std::set/ JavaTreeSet(balanced trees, O(log n)). JS and Python have no built-in sorted set; sort on demand or use a library. A hash set trades ordering for speed.
Complexity at a glance
| Operation | Average | Worst case |
|---|---|---|
| add / remove / contains | O(1) | O(n) |
| build set from n items | O(n) | O(n²) |
| union / intersection / difference | O(n) | O(n²) |
Next: put maps and sets to work on real problems — hashing applications & patterns.