Wagner-Fischer vs Myers’ Algorithm

Author:

Fujimoto Seiji

Published:

2020-08-15

Updated:

2022-03-12

Copyright:

This document has been placed in the public domain.

1. Introduction

This article compares the two most common Levenshtein distance algorithms.

While algorithm selection is corner-stone for program efficiency, the empirical evidence on which algorithm performs better (and when) is often limited.

To fill the gap, I did a comparative survey by implementing both algorithms.

2. Methods

2.1. Implementation

I created a C implementation of two algorithms. Some notes on the implementation:

  • The Wagner-Fischer algorithm was optimized using the method described in this article. To my knowledge, this is the best generic implementation of this algorithm.

  • The Myers algorithm was optimized using a method to compact the pre-computation from \(O(Σ)\) to \(O(n)\) (\(Σ\) is a number of alphabets and \(n\) is the length of input strings). See this commit for details.

You can download the implementation from benchmark.c.

2.2. Test Data

Test data is random hex strings ranging from 2 characters to 64 characters.

The two algorithms were measured how much time they take to process 1,000,000 string pairs of each length.

3. Result

The following figure show the result on Intel Xeon(R) E5-2660 (2.60Ghz). The program was compiled using GCC 8.3.0.

As you can see, the Myers algorithm generally performs better than Wagner-Fischer algorithm.

_images/benchmark.svg

Comparison of Wagner-Fischer vs Myers Algorithm

The figure 2 below shows the same result using a log scale:

_images/benchmark-log.svg

Comparison of Wagner-Fischer vs Myers Algorithm (logscale)

The “break-even” point seems to be six characters. Longer than that, Myers’s algorithm works better than the Wagner-Fischer algorithm. Shorter than that, the converse holds true.