Skip to content
DSA advanced 9 min read

KMP String Matching Algorithm Explained

The Knuth-Morris-Pratt (KMP) algorithm finds every occurrence of a pattern of length m inside a text of length n in O(n + m) time — without ever re-examining a character of the text. Naive matching can degrade to O(n·m) because, after a mismatch, it slides the pattern by one and starts over. KMP instead precomputes, for the pattern alone, how far it can safely jump on a mismatch, so the text pointer never moves backward.

The prefix function (failure table)

The heart of KMP is the prefix function lps (longest proper prefix that is also a suffix). For each position i of the pattern, lps[i] is the length of the longest substring ending at i that is also a prefix of the pattern. When a mismatch happens after matching k characters, we already know the next lps[k-1] characters match — so we resume from there instead of from zero.

For pattern "ababaca" the table is [0,0,1,2,3,0,1].

std::vector<int> buildLPS(const std::string& p) {
    int m = p.size();
    std::vector<int> lps(m, 0);
    int len = 0;                 // length of current prefix-suffix match
    for (int i = 1; i < m; ) {
        if (p[i] == p[len]) lps[i++] = ++len;
        else if (len > 0)   len = lps[len - 1];   // fall back
        else                lps[i++] = 0;
    }
    return lps;
}

For beginners: “Proper prefix” means a prefix that is not the whole string. The table answers: “if I fail here, what’s the longest already-matched chunk I can keep?”

Searching the text

With the table built, scan the text with two pointers. On a match, advance both. On a mismatch, slide the pattern using lps instead of resetting — the text pointer i never goes backward, giving the linear bound.

std::vector<int> kmpSearch(const std::string& text, const std::string& pat) {
    std::vector<int> res, lps = buildLPS(pat);
    int j = 0;                                   // matched length in pat
    for (int i = 0; i < (int)text.size(); i++) {
        while (j > 0 && text[i] != pat[j]) j = lps[j - 1];
        if (text[i] == pat[j]) j++;
        if (j == (int)pat.size()) {              // full match
            res.push_back(i - j + 1);
            j = lps[j - 1];
        }
    }
    return res;                                  // start indices of matches
}

Complexity

StepTimeSpace
Build prefix functionO(m)O(m)
SearchO(n)O(1) extra
TotalO(n + m)O(m)

Common pitfalls

  • Resetting j to 0 on mismatch. That’s the naive method. Fall back via j = lps[j - 1] to keep it linear.
  • Empty pattern. Decide a policy (often “matches at every index”); guard against indexing pat[0] when m = 0.
  • Continuing after a full match. Set j = lps[j - 1] (not 0) to find overlapping occurrences correctly.
  • Mixing up prefix vs suffix in the table — it is the longest proper prefix that is also a suffix.

When to use it

KMP shines when you must scan a large text for a fixed pattern in guaranteed linear time, when worst-case performance matters (untrusted input), or for the strStr/indexOf interview question. The prefix function itself is independently useful — it solves “shortest repeated unit of a string” and related problems. For multiple patterns at once, look at Aho-Corasick; for prefix lookups across many strings, a trie.

Practice on the advanced exercises.

Last updated June 25, 2026
Was this helpful?