Choosing the Right Data Structure: A Decision Guide

Choosing the right data structure means matching the operations a problem needs most to the structure that makes those operations cheap. Get this choice right and the algorithm almost writes itself; get it wrong and you fight O(n) scans where an O(1) lookup was possible. This page gives you a signal-to-structure decision guide, a comparison table, and a worked example that flips between two reasonable choices.

Start from the operations, not the structure

Before picking anything, list what the problem does most often. Ask:

Do I look things up by key? → hash table.
Do I need elements in sorted order, or query ranges? → balanced tree / sorted array.
Do I always need the smallest or largest next? → heap (priority queue).
Do I add/remove only at the ends (LIFO/FIFO)? → stack / queue.
Do I insert/delete in the middle constantly? → linked list.
Do I index by position and mostly read/append? → array.

The structure that makes your hottest operation O(1) or O(log n) usually wins, even if it makes a rarer operation slower.

Map problem signals to structures

Signal in the problem	Reach for	Why
”Have I seen this before?” / dedupe / count	hash set / hash map	O(1) average lookup & insert
”Top-K”, “next smallest”, scheduling	heap / priority queue	O(log n) push/pop of the extreme
Keep sorted, range queries, floor/ceil	BST / balanced tree	O(log n) ordered ops
Match pairs, undo, nesting, recursion	stack	LIFO, O(1) ends
FIFO processing, BFS layers	queue / deque	O(1) ends
Random access by index, append-heavy	array	O(1) index, cache-friendly
Many mid-list insert/delete, no indexing	linked list	O(1) splice given the node
Prefix / autocomplete on strings	trie	O(L) by word length, not count
Connectivity / grouping	union-find	near-O(1) union & find

For beginners: When unsure, default to an array for ordered/indexed data and a hash map for “look up by key.” Those two solve a huge share of problems before you ever need anything fancier.

A switchable example: two valid choices

Suppose you must report whether a stream of integers contains any duplicate. Two reasonable structures: a sorted array (sort, then scan neighbors) or a hash set (insert, check on the way). Here is the hash-set choice — O(n) time, O(n) space, and a single pass:

#include <unordered_set>
#include <vector>
bool hasDuplicate(const std::vector<int>& a) {
    std::unordered_set<int> seen;
    for (int x : a) {
        if (seen.count(x)) return true; // O(1) average
        seen.insert(x);
    }
    return false;
}

import java.util.HashSet;
import java.util.Set;
boolean hasDuplicate(int[] a) {
    Set<Integer> seen = new HashSet<>();
    for (int x : a) {
        if (!seen.add(x)) return true; // add returns false if present
    }
    return false;
}

function hasDuplicate(a) {
  const seen = new Set();
  for (const x of a) {
    if (seen.has(x)) return true; // O(1) average
    seen.add(x);
  }
  return false;
}

def has_duplicate(a):
    seen = set()
    for x in a:
        if x in seen:      # O(1) average
            return True
        seen.add(x)
    return False

The sorted-array alternative would be sort(a) then check a[i] == a[i+1] — only O(1) extra space but O(n log n) time, and it destroys the original order. That captures the core trade-off: the hash set buys speed with memory; the sort saves memory but costs time and mutates the input. Pick based on which constraint the problem actually pins down.

Going deeper: Constraints are hints. “n ≤ 10⁵, values up to 10⁹” rules out a counting array (too sparse) and points at a hash set. “Already sorted input” makes the array approach free. Always re-read the constraints before committing — see the problem-solving framework.

A quick decision checklist

What is the single most frequent operation? Optimize that.
Is order required, or is a set/map enough? Don’t pay for ordering you won’t use.
What do the constraints allow (n, value range, memory)?
Can a simpler structure with extra memory replace a clever one? Usually yes.

Tip: Two data structures together often beat one. A hash map plus a doubly linked list gives an LRU cache; a heap plus a hash map gives a fast “decrease-key” priority queue.

Once you’ve chosen well, the next job is writing it cleanly — continue to writing clean solution code.