Knuth Morris Pratt Algorithm for pattern matching in linear time

Preet Kamal
3 min readOct 9, 2021

One of the most elegant and impressive string algorithm which I have seen is KMP algorithm. Its beautiful. The idea of matching a pattern in linear time amazes me and it will amaze you too.

Lets start:

Consider 2 strings, s and p, where is the main string and p is the pattern to be searched.

-----0 1 2 3 4 5 6 7 8 9 10 11 12 13
s = “a d s x d s a x d s x d s g”
p = “d s x d s g”
Lets start comparing s and p starting from 0th index taking i and j as two pointers.i j s[i] p[j] matched?
-----------------------------------------
0 0 a d no
1 0 d d yes
2 1 s s yes
3 2 x x yes
4 3 d d yes
5 4 s s yes
6 5 a g no
1 0 d d yes
2 1 s s yes
3 2 x x yes
4 3 d d yes
5 4 s s yes
6 5 a g no
2 0 s d no
. . . . .
. . . . .
. . . . .
8 0 d d yes
9 1 s s yes
10 2 x x yes
11 3 d d yes
12 4 s s yes
13 5 g g yes

Since j == len(p)-1. So that means we found the pattern and we can stop. But we can see at i = 6 and j = 5, when the characters doesn’t match, it start from j = 0 and throw away all the comparisons we have done.

What KMP does is take advantage of the those previous comparison by preparing a prefix suffix table. It is a table used for checking if the last suffix we compared, was it also a suffix and can be used to skip comparisons.

For ex->

after i = 6 and j = 5, we can see rather than starting from j = 0 and going back to i = 1 we can start from i = 6 and j = 2. Why? Because we already know that the last two characters matched were ds and the pattern also starts from ds. So we can skip ds matching from the patter and go directly to k=2 that is x. It can be a bit confusing so give it some time and think how e are using the fact that the pattern start from ds and the last two characters matched are also ds, so we can skip matching ds again and go to next character that is x.

How will we create a prefix, suffix table. Using this piece of code. Its pretty simple. Just some normal comparisons.

pref_suff = [0] * len(p)
i, j = 0, 1
while i < len(p) and j < len(p):
if p[i] == p[j]:
pref_suff[j] = i+1
i += 1
j += 1
else:
pref_suff[j] = 0
j += 1
What this does is, compares each iₜₕ to jₜₕ character and keeps on increasing the pointers by 1 and assigning the index+1 poistion to table on successful comparisons and on failed comparison, we start the pointer j from 0.

Now the final KMP code will look something like this in python.

s = input("First string: ")
p = input("Second string: ")
pref_suff = [0] * len(p)
i, j = 0, 1
while i < len(p) and j < len(p):
if p[i] == p[j]:
pref_suff[j] = i+1
i += 1
j += 1
else:
pref_suff[j] = 0
j += 1
exists = False
i, j = -1, 0
while i < len(s):
if s[i] == p[j]:
j += 1
else:
j = pref_suff[j]
if j == len(p):
exists = True
break
i += 1
print(exists)

First it creates a prefix suffix table which has the indices to start when failed comparisons are made. And the it starts the pattern matching.

This works in O(p+s) time complexity and space complexity is O(p).

It was a bit difficult for me too so explanation may lag somewhere. I will keep on updating it if I find something with a better explanation. Please leave your questions. Thanks!

--

--