Hash Sets: Fast Membership and Deduplication

A hash set is a hash table that stores only keys, no values — a collection of unique elements with O(1) average add, remove, and “is this in the set?” membership tests. It is the go-to tool for two everyday jobs: checking whether you have seen something before, and removing duplicates. The built-ins are C++ std::unordered_set, Java HashSet, JavaScript Set, and Python set.

The core operations

#include <unordered_set>

std::unordered_set<int> s;
s.insert(7);            // add — O(1) average
bool has = s.count(7);  // contains? — O(1) average (1 if present)
s.erase(7);             // remove — O(1) average
size_t n = s.size();

import java.util.HashSet;
import java.util.Set;

Set<Integer> s = new HashSet<>();
s.add(7);                    // add — O(1) average
boolean has = s.contains(7); // contains? — O(1) average
s.remove(7);                 // remove — O(1) average
int n = s.size();

const s = new Set();
s.add(7);              // add — O(1) average
const has = s.has(7);  // contains? — O(1) average
s.delete(7);           // remove — O(1) average
const n = s.size;      // property, not a method

s = set()
s.add(7)        # add — O(1) average
has = 7 in s    # contains? — O(1) average
s.discard(7)    # remove safely (no error if absent)
n = len(s)

Deduplicate a list

Building a set from a collection drops duplicates instantly. Remember: the result is unordered, so do not rely on element order.

#include <unordered_set>
#include <vector>

std::vector<int> dedup(const std::vector<int>& a) {
    std::unordered_set<int> seen(a.begin(), a.end());
    return std::vector<int>(seen.begin(), seen.end()); // order not guaranteed
}

import java.util.*;

List<Integer> dedup(List<Integer> a) {
    Set<Integer> seen = new HashSet<>(a);
    return new ArrayList<>(seen); // order not guaranteed
}

function dedup(a) {
  return [...new Set(a)]; // preserves first-seen order in JS
}

def dedup(a):
    return list(set(a))  # order not guaranteed

”Have I seen this before?” — detect duplicates

A set turns the O(n²) nested-loop duplicate check (see Big-O notation) into a single O(n) pass: add as you go, and the first time add/insert fails — or the value is already present — you found a repeat.

#include <unordered_set>
#include <vector>

bool hasDuplicate(const std::vector<int>& a) {
    std::unordered_set<int> seen;
    for (int x : a) {
        if (seen.count(x)) return true; // seen before
        seen.insert(x);
    }
    return false;
}

import java.util.*;

boolean hasDuplicate(int[] a) {
    Set<Integer> seen = new HashSet<>();
    for (int x : a) {
        if (!seen.add(x)) return true; // add returns false if already present
    }
    return false;
}

function hasDuplicate(a) {
  const seen = new Set();
  for (const x of a) {
    if (seen.has(x)) return true; // seen before
    seen.add(x);
  }
  return false;
}

def has_duplicate(a):
    seen = set()
    for x in a:
        if x in seen:
            return True
        seen.add(x)
    return False

Set algebra: union, intersection, difference

Sets shine for comparing collections — “what’s in both?”, “what’s in A but not B?”.

#include <unordered_set>
#include <vector>

// Intersection: elements present in both sets.
std::vector<int> intersect(const std::unordered_set<int>& a,
                           const std::unordered_set<int>& b) {
    std::vector<int> out;
    const auto& small = a.size() < b.size() ? a : b;
    const auto& big   = a.size() < b.size() ? b : a;
    for (int x : small) if (big.count(x)) out.push_back(x);
    return out;
}

import java.util.*;

Set<Integer> intersect(Set<Integer> a, Set<Integer> b) {
    Set<Integer> out = new HashSet<>(a);
    out.retainAll(b);   // keep only elements also in b
    return out;
}

function intersect(a, b) {
  const out = new Set();
  for (const x of a) if (b.has(x)) out.add(x);
  return out;
}

def intersect(a, b):
    return a & b          # also: a | b (union), a - b (difference)

For beginners: A set is just a map whose values you ignore. If you ever need to attach data to each unique key (a count, a position, a name), switch from a set to a hash map.

Going deeper: Need the elements in sorted order or range queries? Use an ordered set — C++ std::set / Java TreeSet (balanced trees, O(log n)). JS and Python have no built-in sorted set; sort on demand or use a library. A hash set trades ordering for speed.

Complexity at a glance

Operation	Average	Worst case
add / remove / contains	O(1)	O(n)
build set from n items	O(n)	O(n²)
union / intersection / difference	O(n)	O(n²)

Next: put maps and sets to work on real problems — hashing applications & patterns.