Project: Build a Mini Database Index
In this project you’ll build a mini database index — the structure a database uses to find rows fast without scanning the whole table. The big design decision mirrors real databases: hashing gives O(1) lookups by exact key but can’t answer “give me everything between 10 and 20”, while an ordered structure gives O(log n) lookups and fast range queries. We’ll build the ordered index, because range queries are what make an index interesting. See Hashing: Introduction and Self-Balancing BSTs Overview.
What you’ll build
A key-value index with three operations:
put(key, value)— insert a new key or update an existing one.get(key)— return the value for an exact key (a point lookup).range(lo, hi)— return every value whose key falls in[lo, hi], in key order.
Hashing vs ordered: pick by the queries
| Need | Hash index | Ordered index |
|---|---|---|
point lookup get(k) | O(1) average | O(log n) |
range query range(lo, hi) | not supported | O(log n + k) |
| sorted iteration | no | yes |
A hash table scatters keys to make exact lookups instant, but that scattering destroys order — so range scans are impossible without checking every entry. Production databases use B-trees (a self-balancing ordered structure) precisely because most queries are ranges: dates, prices, IDs greater than X.
For beginners: A “range query” is any
WHERE age BETWEEN 18 AND 25. Because an ordered index keeps keys sorted, you binary-search to the low end once, then walk forward until you pass the high end — no rescanning.
The design
We keep two parallel arrays — sorted keys and matching values — and drive everything with binary search (lowerBound: the first index whose key is >= target).
put: binary-search the insertion point. If the key is already there, overwrite its value; otherwise insert the key and value at that index (keeping both arrays sorted).get: binary-search; if the key sits at that index, return its value, else report “not found”.range: binary-search to the first key>= lo, then walk forward collecting values until a key exceedshi.
#include <vector>
#include <string>
#include <algorithm>
// Ordered key->value index: keys kept sorted for point and range queries.
class OrderedIndex {
std::vector<int> keys;
std::vector<std::string> values;
public:
// Insert or update. O(log n) to locate, O(n) to shift.
void put(int key, const std::string& value) {
int i = std::lower_bound(keys.begin(), keys.end(), key) - keys.begin();
if (i < (int)keys.size() && keys[i] == key) {
values[i] = value; // update
} else {
keys.insert(keys.begin() + i, key);
values.insert(values.begin() + i, value);
}
}
// Point lookup. O(log n). Returns false if absent.
bool get(int key, std::string& out) const {
int i = std::lower_bound(keys.begin(), keys.end(), key) - keys.begin();
if (i < (int)keys.size() && keys[i] == key) {
out = values[i];
return true;
}
return false;
}
// Range scan: values with lo <= key <= hi, in key order. O(log n + k).
std::vector<std::string> range(int lo, int hi) const {
std::vector<std::string> out;
int i = std::lower_bound(keys.begin(), keys.end(), lo) - keys.begin();
for (; i < (int)keys.size() && keys[i] <= hi; i++)
out.push_back(values[i]);
return out;
}
};
import java.util.*;
// Ordered key->value index: keys kept sorted for point and range queries.
class OrderedIndex {
private final List<Integer> keys = new ArrayList<>();
private final List<String> values = new ArrayList<>();
// First index i with keys[i] >= key (binary search).
private int lowerBound(int key) {
int lo = 0, hi = keys.size();
while (lo < hi) {
int mid = (lo + hi) >>> 1;
if (keys.get(mid) < key) lo = mid + 1;
else hi = mid;
}
return lo;
}
// Insert or update. O(log n) to locate, O(n) to shift.
void put(int key, String value) {
int i = lowerBound(key);
if (i < keys.size() && keys.get(i) == key) {
values.set(i, value); // update
} else {
keys.add(i, key);
values.add(i, value);
}
}
// Point lookup. O(log n). Returns null if absent.
String get(int key) {
int i = lowerBound(key);
if (i < keys.size() && keys.get(i) == key) return values.get(i);
return null;
}
// Range scan: values with lo <= key <= hi, in key order. O(log n + k).
List<String> range(int lo, int hi) {
List<String> out = new ArrayList<>();
for (int i = lowerBound(lo); i < keys.size() && keys.get(i) <= hi; i++)
out.add(values.get(i));
return out;
}
}
// Ordered key->value index: keys kept sorted for point and range queries.
class OrderedIndex {
constructor() {
this.keys = [];
this.values = [];
}
// First index i with keys[i] >= key (binary search).
#lowerBound(key) {
let lo = 0, hi = this.keys.length;
while (lo < hi) {
const mid = (lo + hi) >> 1;
if (this.keys[mid] < key) lo = mid + 1;
else hi = mid;
}
return lo;
}
// Insert or update. O(log n) to locate, O(n) to shift.
put(key, value) {
const i = this.#lowerBound(key);
if (i < this.keys.length && this.keys[i] === key) {
this.values[i] = value; // update
} else {
this.keys.splice(i, 0, key);
this.values.splice(i, 0, value);
}
}
// Point lookup. O(log n). Returns undefined if absent.
get(key) {
const i = this.#lowerBound(key);
if (i < this.keys.length && this.keys[i] === key) return this.values[i];
return undefined;
}
// Range scan: values with lo <= key <= hi, in key order. O(log n + k).
range(lo, hi) {
const out = [];
for (let i = this.#lowerBound(lo); i < this.keys.length && this.keys[i] <= hi; i++)
out.push(this.values[i]);
return out;
}
}
import bisect
# Ordered key->value index: keys kept sorted for point and range queries.
class OrderedIndex:
def __init__(self):
self.keys = []
self.values = []
# Insert or update. O(log n) to locate, O(n) to shift.
def put(self, key, value):
i = bisect.bisect_left(self.keys, key)
if i < len(self.keys) and self.keys[i] == key:
self.values[i] = value # update
else:
self.keys.insert(i, key)
self.values.insert(i, value)
# Point lookup. O(log n). Returns None if absent.
def get(self, key):
i = bisect.bisect_left(self.keys, key)
if i < len(self.keys) and self.keys[i] == key:
return self.values[i]
return None
# Range scan: values with lo <= key <= hi, in key order. O(log n + k).
def range(self, lo, hi):
out = []
i = bisect.bisect_left(self.keys, lo)
while i < len(self.keys) and self.keys[i] <= hi:
out.append(self.values[i])
i += 1
return out
Trying it out
Insert (50,"Ada"), (20,"Lin"), (80,"Tom"), (35,"Mei"). Now get(20) returns "Lin", get(99) reports “not found”, and range(30, 60) returns ["Mei", "Ada"] — already sorted by key, the whole point of an ordered index.
Gotcha:
putis O(n) here because inserting into a sorted array shifts later elements. That’s fine for a learning project and for read-heavy indexes, but it’s the exact weakness a B-tree or balanced BST removes by making inserts O(log n) too.
Complexity
- get: O(log n) — binary search.
- range: O(log n + k) for k results.
- put: O(log n) to locate + O(n) to shift; O(n) overall.
- space: O(n).
Going deeper: Swap the sorted arrays for a balanced BST or B-tree and every operation, including
put, becomes O(log n) while keeping ordered iteration — that’s how real database indexes scale. If you only ever do point lookups, drop the order entirely and use hashing for O(1) access. Choosing between them is the essence of picking the right data structure.
That’s the last project — head back to the projects overview to pick your extension challenges.