Skip to content
DSA projects 8 min read

Project: Build a Mini Database Index

In this project you’ll build a mini database index — the structure a database uses to find rows fast without scanning the whole table. The big design decision mirrors real databases: hashing gives O(1) lookups by exact key but can’t answer “give me everything between 10 and 20”, while an ordered structure gives O(log n) lookups and fast range queries. We’ll build the ordered index, because range queries are what make an index interesting. See Hashing: Introduction and Self-Balancing BSTs Overview.

What you’ll build

A key-value index with three operations:

  • put(key, value) — insert a new key or update an existing one.
  • get(key) — return the value for an exact key (a point lookup).
  • range(lo, hi) — return every value whose key falls in [lo, hi], in key order.

Hashing vs ordered: pick by the queries

NeedHash indexOrdered index
point lookup get(k)O(1) averageO(log n)
range query range(lo, hi)not supportedO(log n + k)
sorted iterationnoyes

A hash table scatters keys to make exact lookups instant, but that scattering destroys order — so range scans are impossible without checking every entry. Production databases use B-trees (a self-balancing ordered structure) precisely because most queries are ranges: dates, prices, IDs greater than X.

For beginners: A “range query” is any WHERE age BETWEEN 18 AND 25. Because an ordered index keeps keys sorted, you binary-search to the low end once, then walk forward until you pass the high end — no rescanning.

The design

We keep two parallel arrays — sorted keys and matching values — and drive everything with binary search (lowerBound: the first index whose key is >= target).

  • put: binary-search the insertion point. If the key is already there, overwrite its value; otherwise insert the key and value at that index (keeping both arrays sorted).
  • get: binary-search; if the key sits at that index, return its value, else report “not found”.
  • range: binary-search to the first key >= lo, then walk forward collecting values until a key exceeds hi.
#include <vector>
#include <string>
#include <algorithm>

// Ordered key->value index: keys kept sorted for point and range queries.
class OrderedIndex {
    std::vector<int> keys;
    std::vector<std::string> values;
public:
    // Insert or update. O(log n) to locate, O(n) to shift.
    void put(int key, const std::string& value) {
        int i = std::lower_bound(keys.begin(), keys.end(), key) - keys.begin();
        if (i < (int)keys.size() && keys[i] == key) {
            values[i] = value;                       // update
        } else {
            keys.insert(keys.begin() + i, key);
            values.insert(values.begin() + i, value);
        }
    }

    // Point lookup. O(log n). Returns false if absent.
    bool get(int key, std::string& out) const {
        int i = std::lower_bound(keys.begin(), keys.end(), key) - keys.begin();
        if (i < (int)keys.size() && keys[i] == key) {
            out = values[i];
            return true;
        }
        return false;
    }

    // Range scan: values with lo <= key <= hi, in key order. O(log n + k).
    std::vector<std::string> range(int lo, int hi) const {
        std::vector<std::string> out;
        int i = std::lower_bound(keys.begin(), keys.end(), lo) - keys.begin();
        for (; i < (int)keys.size() && keys[i] <= hi; i++)
            out.push_back(values[i]);
        return out;
    }
};

Trying it out

Insert (50,"Ada"), (20,"Lin"), (80,"Tom"), (35,"Mei"). Now get(20) returns "Lin", get(99) reports “not found”, and range(30, 60) returns ["Mei", "Ada"] — already sorted by key, the whole point of an ordered index.

Gotcha: put is O(n) here because inserting into a sorted array shifts later elements. That’s fine for a learning project and for read-heavy indexes, but it’s the exact weakness a B-tree or balanced BST removes by making inserts O(log n) too.

Complexity

  • get: O(log n) — binary search.
  • range: O(log n + k) for k results.
  • put: O(log n) to locate + O(n) to shift; O(n) overall.
  • space: O(n).

Going deeper: Swap the sorted arrays for a balanced BST or B-tree and every operation, including put, becomes O(log n) while keeping ordered iteration — that’s how real database indexes scale. If you only ever do point lookups, drop the order entirely and use hashing for O(1) access. Choosing between them is the essence of picking the right data structure.

That’s the last project — head back to the projects overview to pick your extension challenges.

Last updated June 25, 2026
Was this helpful?