KMP String Matching Algorithm Explained

The Knuth-Morris-Pratt (KMP) algorithm finds every occurrence of a pattern of length m inside a text of length n in O(n + m) time — without ever re-examining a character of the text. Naive matching can degrade to O(n·m) because, after a mismatch, it slides the pattern by one and starts over. KMP instead precomputes, for the pattern alone, how far it can safely jump on a mismatch, so the text pointer never moves backward.

The prefix function (failure table)

The heart of KMP is the prefix function lps (longest proper prefix that is also a suffix). For each position i of the pattern, lps[i] is the length of the longest substring ending at i that is also a prefix of the pattern. When a mismatch happens after matching k characters, we already know the next lps[k-1] characters match — so we resume from there instead of from zero.

For pattern "ababaca" the table is [0,0,1,2,3,0,1].

std::vector<int> buildLPS(const std::string& p) {
    int m = p.size();
    std::vector<int> lps(m, 0);
    int len = 0;                 // length of current prefix-suffix match
    for (int i = 1; i < m; ) {
        if (p[i] == p[len]) lps[i++] = ++len;
        else if (len > 0)   len = lps[len - 1];   // fall back
        else                lps[i++] = 0;
    }
    return lps;
}

int[] buildLPS(String p) {
    int m = p.length();
    int[] lps = new int[m];
    int len = 0;                 // length of current prefix-suffix match
    for (int i = 1; i < m; ) {
        if (p.charAt(i) == p.charAt(len)) lps[i++] = ++len;
        else if (len > 0)                 len = lps[len - 1];   // fall back
        else                              lps[i++] = 0;
    }
    return lps;
}

function buildLPS(p) {
  const m = p.length;
  const lps = new Array(m).fill(0);
  let len = 0;                   // length of current prefix-suffix match
  for (let i = 1; i < m; ) {
    if (p[i] === p[len]) lps[i++] = ++len;
    else if (len > 0)    len = lps[len - 1];     // fall back
    else                 lps[i++] = 0;
  }
  return lps;
}

def build_lps(p):
    m = len(p)
    lps = [0] * m
    length = 0                   # length of current prefix-suffix match
    i = 1
    while i < m:
        if p[i] == p[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length > 0:
            length = lps[length - 1]             # fall back
        else:
            lps[i] = 0
            i += 1
    return lps

For beginners: “Proper prefix” means a prefix that is not the whole string. The table answers: “if I fail here, what’s the longest already-matched chunk I can keep?”

Searching the text

With the table built, scan the text with two pointers. On a match, advance both. On a mismatch, slide the pattern using lps instead of resetting — the text pointer i never goes backward, giving the linear bound.

std::vector<int> kmpSearch(const std::string& text, const std::string& pat) {
    std::vector<int> res, lps = buildLPS(pat);
    int j = 0;                                   // matched length in pat
    for (int i = 0; i < (int)text.size(); i++) {
        while (j > 0 && text[i] != pat[j]) j = lps[j - 1];
        if (text[i] == pat[j]) j++;
        if (j == (int)pat.size()) {              // full match
            res.push_back(i - j + 1);
            j = lps[j - 1];
        }
    }
    return res;                                  // start indices of matches
}

List<Integer> kmpSearch(String text, String pat) {
    List<Integer> res = new ArrayList<>();
    int[] lps = buildLPS(pat);
    int j = 0;                                   // matched length in pat
    for (int i = 0; i < text.length(); i++) {
        while (j > 0 && text.charAt(i) != pat.charAt(j)) j = lps[j - 1];
        if (text.charAt(i) == pat.charAt(j)) j++;
        if (j == pat.length()) {                 // full match
            res.add(i - j + 1);
            j = lps[j - 1];
        }
    }
    return res;                                  // start indices of matches
}

function kmpSearch(text, pat) {
  const res = [], lps = buildLPS(pat);
  let j = 0;                                     // matched length in pat
  for (let i = 0; i < text.length; i++) {
    while (j > 0 && text[i] !== pat[j]) j = lps[j - 1];
    if (text[i] === pat[j]) j++;
    if (j === pat.length) {                      // full match
      res.push(i - j + 1);
      j = lps[j - 1];
    }
  }
  return res;                                    // start indices of matches
}

def kmp_search(text, pat):
    res = []
    lps = build_lps(pat)
    j = 0                                         # matched length in pat
    for i in range(len(text)):
        while j > 0 and text[i] != pat[j]:
            j = lps[j - 1]
        if text[i] == pat[j]:
            j += 1
        if j == len(pat):                         # full match
            res.append(i - j + 1)
            j = lps[j - 1]
    return res                                    # start indices of matches

Complexity

Step	Time	Space
Build prefix function	O(m)	O(m)
Search	O(n)	O(1) extra
Total	O(n + m)	O(m)

Common pitfalls

Resetting j to 0 on mismatch. That’s the naive method. Fall back via j = lps[j - 1] to keep it linear.
Empty pattern. Decide a policy (often “matches at every index”); guard against indexing pat[0] when m = 0.
Continuing after a full match. Set j = lps[j - 1] (not 0) to find overlapping occurrences correctly.
Mixing up prefix vs suffix in the table — it is the longest proper prefix that is also a suffix.

When to use it

KMP shines when you must scan a large text for a fixed pattern in guaranteed linear time, when worst-case performance matters (untrusted input), or for the strStr/indexOf interview question. The prefix function itself is independently useful — it solves “shortest repeated unit of a string” and related problems. For multiple patterns at once, look at Aho-Corasick; for prefix lookups across many strings, a trie.

Practice on the advanced exercises.