What is Levenshtein Distance?¶
Levenshtein distance is a string metric for measuring the difference between two sequences.
Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.
It is named after the Soviet mathematician Vladimir Levenshtein, who considered this distance in 1965.
Who uses it?¶
Programmers use Levenshtein distance for simple spell correction. For example, Git uses it to suggest the most similar command.
$ git checkou git: 'checkou' is not a git command. See 'git --help'. The most similar command is checkout
Biologists use Levenshtein distance to study mutation of DNA/RNA sequences. (…)
Linguists use Levenshtein distance to classify languages into a tree. See [Serva2008] for example.
Other usages include hand-writing recognition. You can check my toy implementation Charec.
How to compute it?¶
There is two general algorithms for Levenshtein distance:
Read Wagner-Fischer vs Myers’ Algorithm to learn which suites for your use case.
Note that Myers algorithm cannot be applied to variants of Leveshtein distance (e.g. edit costs are not fixed to 1).