Fujimoto Seiji
Version 2018.07.16

# Can we optimize the Wagner-Fischer algorithm?

Answer: Yes.

## 1. Overview

The purpose of this expository note is to explain several practical techniques for optimizing the Wagner-Fischer algorithm (the most common algorithm for computing Levenshtein distance). A simple benchmark was carried out to evaluate these optimization techniques. The results suggest that these techniques, when combined, can reduce the computation time by 20-40% in typical cases.

## 2. Memory Usage Reduction

### 2.1. Overview

A naive implementation of WF allocates a two-dimensional array of size (n + 1) x (m + 1), and hence requires O(nm) space. However, there is an easy fix that reduces the memory requirement to O(m).

1. Allocate a single array of length m+1.
2. Initialize the array buffer with a series of integers {0, 1, 2, ..., m}, which represents the first row of the WF matrix.
3. Compute the second row of the matrix, while updating the buffer incrementally.
4. Repeat step 3 until you reach the last row.

In short, we compute the matrix row by row, using a buffer that holds the contents of only a single row. On the surface, it might appear that two array buffers are required for this task, but careful examination reveals that the whole computation can be performed with a single array of size (m + 1).

### 2.2. Step-by-step Example

Here is how it works. Suppose we are trying to compute the Levenshtein distance between 'ab' and 'cd', and for this task, we have allocated an array of size 3. In this case, we need to fill the 3 x 3 matrix below:

```
    c d
  . . .   buf = [0,0,0]
a . . .
b . . .
```

Let's solve this matrix row by row. The first row is the easiest part because it's just an increasing integer sequence from 0 to m. So just fill the buffer with it:

```
    c d
  0 1 2   buf[0] := 0
a . . .   buf[1] := 1
b . . .   buf[2] := 2
```

The tricky part starts here. For the second row, the value of the leftmost cell is always 1, but we cannot simply assign it to buf[0] or we lose the information required to compute the cell to its right. So first we save buf[0] to a temporary variable and then update buf[0]:

```
    c d
  0 1 2   tmp := buf[0]
a 1 . .   buf[0] := 1
b . . .
```

Now we can trivially compute the middle cell of the second row using the following variables:

```
buf[0] = the cell to the left
buf[1] = the cell above
tmp    = the cell in the upper left
```

In other words:

```
<the value of the middle cell> = min(buf[0], buf[1], tmp) + 1 = 1
```

Of course, we must not assign the result to buf[1] right away; again, that would make the next cell impossible to compute. So we first save the value of buf[1] to tmp, and then update buf[1]:

```
    c d
  0 1 2   cost := min(buf[0], buf[1], tmp) + 1
a 1 1 .   tmp := buf[1]
b . . .   buf[1] := cost
```

Next, we can compute the last cell of the row. In this case, the value is

```
<the value of the right cell> = min(buf[1], buf[2], tmp) + 1 = 2
```

So update buf[2], and we have now finished the second row:

```
    c d
  0 1 2   cost := min(buf[1], buf[2], tmp) + 1
a 1 1 2   tmp := buf[2]
b . . .   buf[2] := cost
```

We can repeat the same procedure above to fill the third row, which leads to the following result:

```
    c d
  0 1 2   buf = [2,2,2]
a 1 1 2
b 2 2 2
```

Therefore, we can conclude that the Levenshtein distance between 'ab' and 'cd' is 2.
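
The row-by-row procedure above can be sketched directly in Python. The following tracing version (the function name and the print calls are mine, for illustration only) prints the buffer after each row, reproducing the states shown above:

```
def wagner_fischer_trace(s, t):
    m = len(t)
    buf = list(range(m + 1))       # first row: 0, 1, ..., m
    print(buf)
    for i in range(1, len(s) + 1):
        tmp, buf[0] = buf[0], i    # save the upper-left cell, set the leftmost cell
        for j in range(1, m + 1):
            if s[i - 1] == t[j - 1]:
                cost = tmp         # characters match: copy the diagonal value
            else:
                cost = min(buf[j], buf[j - 1], tmp) + 1
            tmp, buf[j] = buf[j], cost
        print(buf)
    return buf[m]

wagner_fischer_trace('ab', 'cd')   # prints [0, 1, 2], [1, 1, 2], [2, 2, 2]
```

Running it on 'ab' and 'cd' prints exactly the three buffer states walked through above and returns 2.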

### 2.3. Implementation in Python

Below is a straightforward implementation of this memory-efficient algorithm in Python:

```
def wagner_fischer_opt(s, t):
    n = len(s)
    m = len(t)
    buf = list(range(m + 1))

    for i in range(1, n + 1):
        tmp = buf[0]
        buf[0] = i

        for j in range(1, m + 1):
            if s[i - 1] == t[j - 1]:
                tmp, buf[j] = buf[j], tmp
            else:
                tmp, buf[j] = buf[j], min(buf[j], buf[j - 1], tmp) + 1
    return buf[m]
```

Please note that this technique not only makes the computation use less space, but also less time. In general, the memory-optimized implementation runs 5-10% faster than the naive O(nm)-space implementation. You can confirm this in the benchmark results below.
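
For reference, here is a sketch of what such a naive O(nm)-space implementation looks like (a generic textbook version, not the exact code used in the benchmark):

```
def wagner_fischer_naive(s, t):
    n, m = len(s), len(t)
    # Full (n + 1) x (m + 1) matrix: O(nm) space.
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s[i - 1] == t[j - 1]:
                d[i][j] = d[i - 1][j - 1]
            else:
                d[i][j] = min(d[i - 1][j - 1], d[i - 1][j], d[i][j - 1]) + 1
    return d[n][m]
```

It always returns the same distances as the memory-optimized version; only the allocation and memory access pattern differ.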

## 3. Branch Pruning

### 3.1. Overview

The fundamental observation of this section can be summarized as follows:

In general, we don't need to fill every cell in a Wagner-Fischer matrix to compute edit distance.

The rest of this section provides a theoretical basis for this observation, and shows how we can apply it to actual problems.

### 3.2. Upper bound Theorem

First, let me introduce a useful theorem. Suppose we have a pair of strings s and t whose lengths are n and m respectively, with n ≤ m. Then we can guarantee that the Levenshtein distance between them cannot exceed the length of the longer string t:

```
LevenshteinDistance(s, t) ≤ m
```

Here is a proof. Let's denote "i-th character of a string x" by "xi". Using this notation, we can represent the strings s and t as follows.

```
s = s1 s2 s3 ... sn-1 sn
t = t1 t2 t3 ... tm-1 tm
```

Since n ≤ m, we can rewrite the above expressions into:

```
s = s1 s2 s3 ... sn-1 sn
t = t1 t2 t3 ... tn-1 tn ... tm
```

Now consider the following edit operation:

- Substitute the sequence t1 ... tn with s1 ... sn, then
- delete the trailing sequence tn+1 ... tm.

With this operation, we can always convert t into s. Since this procedure consists of m edit steps (= n substitutions + (m - n) deletions), it cannot cost more than m. This proves the theorem. □

Note: From another angle, this m-step edit operation can be visually described as a "diagonal" path in the WF matrix. For example, if s = "abcd" and t = "pqrs", this m-step operation takes the path below:

```
    p q r s
  o . . . .
a . o . . .
b . . o . .
c . . . o .
d . . . . o
```

This theorem, trivial as it may seem, is in fact quite effective for cutting the cost of the Wagner-Fischer algorithm.
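
As a sanity check, the bound can be verified exhaustively for short strings. The sketch below (the helper `lev` is mine, implemented as a simple memoized recursion) confirms the theorem over a tiny alphabet:

```
from functools import lru_cache
from itertools import product

def lev(s, t):
    # Plain memoized recursion; adequate for the tiny strings below.
    @lru_cache(maxsize=None)
    def d(i, j):
        if i == 0:
            return j
        if j == 0:
            return i
        return min(d(i - 1, j) + 1,                           # deletion
                   d(i, j - 1) + 1,                           # insertion
                   d(i - 1, j - 1) + (s[i - 1] != t[j - 1]))  # substitution
    return d(len(s), len(t))

# Check LevenshteinDistance(s, t) <= max(len(s), len(t)) for every
# pair of strings of length <= 3 over the alphabet {'a', 'b'}.
words = [''.join(p) for k in range(4) for p in product('ab', repeat=k)]
assert all(lev(s, t) <= max(len(s), len(t)) for s in words for t in words)
```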

### 3.3. How to Reduce the Search Space

As a starter, let's consider the cell in the upper-right corner of a WF matrix. For example, using the last example in the previous section, this cell can be shown as below:

```
    p q r s
  . . . . *
a . . . . .
b . . . . .
c . . . . .
d . . . . .
```

From the basic property of a WF matrix, we can immediately see that

1. there is only one edit path that passes through this cell, and
2. this edit path always costs (n + m) (= n insertions + m deletions).

Here is what the edit path looks like:

```
    p q r s
  0 1 2 3 4
a . . . . 5
b . . . . 6
c . . . . 7
d . . . . 8
```

Recall from the previous section that we already have an edit path that can transform s into t at a cost of at most m. So it does not make sense to consider a path that is guaranteed to cost (n + m), which is no less than m, even in the best-case scenario. This means that we can safely ignore the corner cell when filling the matrix.

```
    p q r s
  A B B B B   A := the "m-step" path
a . A . . B   B := the path passing through the corner cell
b . . A . B
c . . . A B        cost(A) ≤ m ≤ cost(B)
d . . . . A
```

The same argument can be applied to the other cells in the matrix as well. For example, consider the cells adjacent to the corner cell:

```
    p q r s
  . . . * .
a . . . . *
b . . . . .
c . . . . .
d . . . . .
```

Although there are several edit paths that pass through one or both of these cells, all of them share the following edit sequence:

```
(n - 1) insertions + (m - 1) deletions
```

Using the argument above, it follows that there is no reason to compute these cells if m ≤ (m - 1) + (n - 1). By subtracting m from both sides, we can eliminate m from the inequality and reduce the condition to 2 ≤ n. In other words, unless the shorter string has fewer than 2 characters, we do not need to bother computing these cells.

Let's generalize this observation. Choose a cell in the WF matrix and let k denote the Manhattan distance from it to the upper-right corner cell. Since any path passing through the chosen cell requires at least (n - k) insertions + (m - k) deletions, we can ignore the cell if the condition m ≤ (n - k) + (m - k) holds. Simplifying this inequality, we conclude that cells satisfying 2k ≤ n can safely be left blank.

Of course, the same argument applies symmetrically to cells near the bottom-left corner.
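
The pruning condition can be checked mechanically. Here is a small illustrative helper (the function name is mine) that enumerates the cells of an (n + 1) x (m + 1) matrix satisfying 2k ≤ n for either corner, assuming n ≤ m:

```
def prunable_cells(n, m):
    # Cells (i, j) whose Manhattan distance k to the upper-right corner
    # (0, m) or to the bottom-left corner (n, 0) satisfies 2k <= n.
    cells = set()
    for i in range(n + 1):
        for j in range(m + 1):
            k_ur = i + (m - j)
            k_bl = (n - i) + j
            if 2 * k_ur <= n or 2 * k_bl <= n:
                cells.add((i, j))
    return cells

print(len(prunable_cells(4, 4)))   # → 12
```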

### 3.4. Actual Example

This section shows how to apply the observation in the previous section to an actual problem.

Back to the example where s = "abcd" and t = "pqrs". Here the length of the shorter string is 4 (so n = 4), and the following 12 cells satisfy the condition 2k ≤ n:

```
    p q r s
  . . * * *
a . . . * *
b * . . . *
c * * . . .
d * * * . .
```

So, in order to compute the edit distance, we can focus on filling the remaining 13 cells while leaving the marked cells blank. For example, to fill the cell marked "Z" below, we only need to consider the cell to its left (marked "X") and the cell to its upper left (marked "Y"):

```
    p q r s
  . Y * * *
a . X Z * *     Z = min(X, Y) + 1
b * . . . *
c * * . . .
d * * * . .
```

Below is the WF matrix after filling in the remaining cells. You can see that the edit distance between "abcd" and "pqrs" is computed correctly:

```
    p q r s
  0 1 * * *
a 1 1 2 * *
b * 2 2 3 *
c * * 3 3 4
d * * * 4 4
```

### 3.5. How Can We Implement This?

Although the gist of this algorithm is relatively easy to grasp, there are in fact a number of pitfalls when actually implementing it. If you are interested in the gory details, please look at `wagner_fischer_O2()` in the benchmark script below.
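
As a rough illustration of the idea (this is my own Python sketch, not the tuned C implementation from the benchmark), one straightforward approach is to skip every cell satisfying the pruning condition and treat it as unreachable:

```
def wagner_fischer_pruned(s, t):
    # Branch pruning sketch: a cell is skipped (left as INF) when its
    # Manhattan distance k to the upper-right corner (0, m) or to the
    # bottom-left corner (n, 0) satisfies 2k <= n.
    if len(s) > len(t):
        s, t = t, s                      # ensure n <= m
    n, m = len(s), len(t)
    if n == 0:
        return m
    INF = n + m + 1                      # larger than any possible distance
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if 2 * (i + m - j) <= n or 2 * (n - i + j) <= n:
                continue                 # pruned cell
            if i == 0:
                d[i][j] = j
            elif j == 0:
                d[i][j] = i
            elif s[i - 1] == t[j - 1]:
                d[i][j] = min(d[i - 1][j - 1],
                              d[i - 1][j] + 1,
                              d[i][j - 1] + 1)
            else:
                d[i][j] = min(d[i - 1][j - 1],
                              d[i - 1][j],
                              d[i][j - 1]) + 1
    return d[n][m]

wagner_fischer_pruned('abcd', 'pqrs')    # → 4, matching the matrix above
```

The sketch keeps the full three-way minimum even in the match case, so that INF values coming from pruned neighbors are handled uniformly.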

## 4. Benchmark

### 4.1. Method

For benchmarking purposes, I implemented three versions of the Levenshtein distance algorithm:

1. Normal Wagner-Fischer
2. Wagner-Fischer + Memory Reduction
3. Wagner-Fischer + Memory Reduction + Branch Pruning

See the benchmark script `benchmark.c` for the details of how I set up this performance test.

### 4.2. Result

The system used for this benchmark was:

- Intel Core i3-4010U (1.70GHz)
- Linux x86_64 (Debian 9.4)
- GCC 6.3.0

Here is the result:

|                                         | TIME [s] | SPEED [calls/s] |
|-----------------------------------------|---------:|----------------:|
| Normal Wagner-Fischer                   |    6.382 |         657,245 |
| + (Memory Reduction)                    |    5.903 |         710,527 |
| + (Memory Reduction) + (Branch Pruning) |    3.993 |       1,050,380 |

The result shows that the most optimized version runs (1 - 3.993 / 6.382) ≈ 37.4% faster than the naive implementation.

There is a Python module named `polyleven` which implements the optimized algorithm.

## 5. License

I hereby place this document in the public domain. All my copyrights, including related and neighbouring rights, to this document and the accompanying benchmark program are abandoned.

2018 Fujimoto Seiji <fujimoto@ceptord.net>