KMP String Matching Algorithm Explained
The Knuth-Morris-Pratt (KMP) algorithm finds every occurrence of a pattern of length m inside a text of length n in O(n + m) time — without ever re-examining a character of the text. Naive matching can degrade to O(n·m) because, after a mismatch, it slides the pattern by one and starts over. KMP instead precomputes, for the pattern alone, how far it can safely jump on a mismatch, so the text pointer never moves backward.
The prefix function (failure table)
The heart of KMP is the prefix function lps (longest proper prefix that is also a suffix). For each position i of the pattern, lps[i] is the length of the longest substring ending at i that is also a prefix of the pattern. When a mismatch happens after matching k characters, we already know the next lps[k-1] characters match — so we resume from there instead of from zero.
For pattern "ababaca" the table is [0,0,1,2,3,0,1].
std::vector<int> buildLPS(const std::string& p) {
int m = p.size();
std::vector<int> lps(m, 0);
int len = 0; // length of current prefix-suffix match
for (int i = 1; i < m; ) {
if (p[i] == p[len]) lps[i++] = ++len;
else if (len > 0) len = lps[len - 1]; // fall back
else lps[i++] = 0;
}
return lps;
}
int[] buildLPS(String p) {
int m = p.length();
int[] lps = new int[m];
int len = 0; // length of current prefix-suffix match
for (int i = 1; i < m; ) {
if (p.charAt(i) == p.charAt(len)) lps[i++] = ++len;
else if (len > 0) len = lps[len - 1]; // fall back
else lps[i++] = 0;
}
return lps;
}
function buildLPS(p) {
const m = p.length;
const lps = new Array(m).fill(0);
let len = 0; // length of current prefix-suffix match
for (let i = 1; i < m; ) {
if (p[i] === p[len]) lps[i++] = ++len;
else if (len > 0) len = lps[len - 1]; // fall back
else lps[i++] = 0;
}
return lps;
}
def build_lps(p):
m = len(p)
lps = [0] * m
length = 0 # length of current prefix-suffix match
i = 1
while i < m:
if p[i] == p[length]:
length += 1
lps[i] = length
i += 1
elif length > 0:
length = lps[length - 1] # fall back
else:
lps[i] = 0
i += 1
return lps
For beginners: “Proper prefix” means a prefix that is not the whole string. The table answers: “if I fail here, what’s the longest already-matched chunk I can keep?”
Searching the text
With the table built, scan the text with two pointers. On a match, advance both. On a mismatch, slide the pattern using lps instead of resetting — the text pointer i never goes backward, giving the linear bound.
std::vector<int> kmpSearch(const std::string& text, const std::string& pat) {
std::vector<int> res, lps = buildLPS(pat);
int j = 0; // matched length in pat
for (int i = 0; i < (int)text.size(); i++) {
while (j > 0 && text[i] != pat[j]) j = lps[j - 1];
if (text[i] == pat[j]) j++;
if (j == (int)pat.size()) { // full match
res.push_back(i - j + 1);
j = lps[j - 1];
}
}
return res; // start indices of matches
}
List<Integer> kmpSearch(String text, String pat) {
List<Integer> res = new ArrayList<>();
int[] lps = buildLPS(pat);
int j = 0; // matched length in pat
for (int i = 0; i < text.length(); i++) {
while (j > 0 && text.charAt(i) != pat.charAt(j)) j = lps[j - 1];
if (text.charAt(i) == pat.charAt(j)) j++;
if (j == pat.length()) { // full match
res.add(i - j + 1);
j = lps[j - 1];
}
}
return res; // start indices of matches
}
function kmpSearch(text, pat) {
const res = [], lps = buildLPS(pat);
let j = 0; // matched length in pat
for (let i = 0; i < text.length; i++) {
while (j > 0 && text[i] !== pat[j]) j = lps[j - 1];
if (text[i] === pat[j]) j++;
if (j === pat.length) { // full match
res.push(i - j + 1);
j = lps[j - 1];
}
}
return res; // start indices of matches
}
def kmp_search(text, pat):
res = []
lps = build_lps(pat)
j = 0 # matched length in pat
for i in range(len(text)):
while j > 0 and text[i] != pat[j]:
j = lps[j - 1]
if text[i] == pat[j]:
j += 1
if j == len(pat): # full match
res.append(i - j + 1)
j = lps[j - 1]
return res # start indices of matches
Complexity
| Step | Time | Space |
|---|---|---|
| Build prefix function | O(m) | O(m) |
| Search | O(n) | O(1) extra |
| Total | O(n + m) | O(m) |
Common pitfalls
- Resetting
jto 0 on mismatch. That’s the naive method. Fall back viaj = lps[j - 1]to keep it linear. - Empty pattern. Decide a policy (often “matches at every index”); guard against indexing
pat[0]whenm = 0. - Continuing after a full match. Set
j = lps[j - 1](not 0) to find overlapping occurrences correctly. - Mixing up prefix vs suffix in the table — it is the longest proper prefix that is also a suffix.
When to use it
KMP shines when you must scan a large text for a fixed pattern in guaranteed linear time, when worst-case performance matters (untrusted input), or for the strStr/indexOf interview question. The prefix function itself is independently useful — it solves “shortest repeated unit of a string” and related problems. For multiple patterns at once, look at Aho-Corasick; for prefix lookups across many strings, a trie.
Practice on the advanced exercises.