Choosing the Right Data Structure: A Decision Guide
Choosing the right data structure means matching the operations a problem needs most to the structure that makes those operations cheap. Get this choice right and the algorithm almost writes itself; get it wrong and you fight O(n) scans where an O(1) lookup was possible. This page gives you a signal-to-structure decision guide, a comparison table, and a worked example that flips between two reasonable choices.
Start from the operations, not the structure
Before picking anything, list what the problem does most often. Ask:
- Do I look things up by key? → hash table.
- Do I need elements in sorted order, or query ranges? → balanced tree / sorted array.
- Do I always need the smallest or largest next? → heap (priority queue).
- Do I add/remove only at the ends (LIFO/FIFO)? → stack / queue.
- Do I insert/delete in the middle constantly? → linked list.
- Do I index by position and mostly read/append? → array.
The structure that makes your hottest operation O(1) or O(log n) usually wins, even if it makes a rarer operation slower.
Map problem signals to structures
| Signal in the problem | Reach for | Why |
|---|---|---|
| ”Have I seen this before?” / dedupe / count | hash set / hash map | O(1) average lookup & insert |
| ”Top-K”, “next smallest”, scheduling | heap / priority queue | O(log n) push/pop of the extreme |
| Keep sorted, range queries, floor/ceil | BST / balanced tree | O(log n) ordered ops |
| Match pairs, undo, nesting, recursion | stack | LIFO, O(1) ends |
| FIFO processing, BFS layers | queue / deque | O(1) ends |
| Random access by index, append-heavy | array | O(1) index, cache-friendly |
| Many mid-list insert/delete, no indexing | linked list | O(1) splice given the node |
| Prefix / autocomplete on strings | trie | O(L) by word length, not count |
| Connectivity / grouping | union-find | near-O(1) union & find |
For beginners: When unsure, default to an array for ordered/indexed data and a hash map for “look up by key.” Those two solve a huge share of problems before you ever need anything fancier.
A switchable example: two valid choices
Suppose you must report whether a stream of integers contains any duplicate. Two reasonable structures: a sorted array (sort, then scan neighbors) or a hash set (insert, check on the way). Here is the hash-set choice — O(n) time, O(n) space, and a single pass:
#include <unordered_set>
#include <vector>
bool hasDuplicate(const std::vector<int>& a) {
std::unordered_set<int> seen;
for (int x : a) {
if (seen.count(x)) return true; // O(1) average
seen.insert(x);
}
return false;
}
import java.util.HashSet;
import java.util.Set;
boolean hasDuplicate(int[] a) {
Set<Integer> seen = new HashSet<>();
for (int x : a) {
if (!seen.add(x)) return true; // add returns false if present
}
return false;
}
function hasDuplicate(a) {
const seen = new Set();
for (const x of a) {
if (seen.has(x)) return true; // O(1) average
seen.add(x);
}
return false;
}
def has_duplicate(a):
seen = set()
for x in a:
if x in seen: # O(1) average
return True
seen.add(x)
return False
The sorted-array alternative would be sort(a) then check a[i] == a[i+1] — only O(1) extra space but O(n log n) time, and it destroys the original order. That captures the core trade-off: the hash set buys speed with memory; the sort saves memory but costs time and mutates the input. Pick based on which constraint the problem actually pins down.
Going deeper: Constraints are hints. “n ≤ 10⁵, values up to 10⁹” rules out a counting array (too sparse) and points at a hash set. “Already sorted input” makes the array approach free. Always re-read the constraints before committing — see the problem-solving framework.
A quick decision checklist
- What is the single most frequent operation? Optimize that.
- Is order required, or is a set/map enough? Don’t pay for ordering you won’t use.
- What do the constraints allow (n, value range, memory)?
- Can a simpler structure with extra memory replace a clever one? Usually yes.
Tip: Two data structures together often beat one. A hash map plus a doubly linked list gives an LRU cache; a heap plus a hash map gives a fast “decrease-key” priority queue.
Once you’ve chosen well, the next job is writing it cleanly — continue to writing clean solution code.