You can install polyleven from PyPI:

$ pip install polyleven

Compute Levenshtein Distance

Import polyleven and pass two strings to levenshtein(), which returns the Levenshtein distance between them as an integer:

>>> import polyleven
>>> polyleven.levenshtein("aiueo", "abcde")
4
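
The Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. To make that definition concrete, here is a minimal pure-Python sketch of the textbook dynamic-programming algorithm (Wagner-Fischer). It is only meant for understanding and is not how polyleven computes the distance internally; the name levenshtein_dp is hypothetical.

```python
def levenshtein_dp(a: str, b: str) -> int:
    """Textbook Wagner-Fischer DP: O(len(a) * len(b)) time, O(len(b)) memory."""
    # prev[j] is the distance between the current prefix of a and b[:j];
    # for the empty prefix of a, that is j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # a[:i] -> empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca -> cb (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein_dp("aiueo", "abcde"))  # 4
```

For "aiueo" and "abcde" this also gives 4. In real code, use polyleven itself; this version only illustrates what the number means.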

Polyleven treats each character as one unit, so multi-byte characters such as kanji are handled correctly:

>>> polyleven.levenshtein("文字", "漢字")
1
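
Why "properly" matters: a Python str is a sequence of Unicode code points, while its UTF-8 encoding spreads each of these kanji over three bytes. The sketch below does not use polyleven; it uses a simple position-by-position mismatch count (Hamming distance, which for equal-length inputs is an upper bound on the Levenshtein distance) just to show that a byte-level comparison sees a different picture. The helper name mismatches is hypothetical.

```python
def mismatches(a, b) -> int:
    """Count positions where two equal-length sequences differ (Hamming distance)."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


s1, s2 = "文字", "漢字"
print(len(s1), len(s1.encode("utf-8")))                    # 2 6
print(mismatches(s1, s2))                                  # 1: one character differs
print(mismatches(s1.encode("utf-8"), s2.encode("utf-8")))  # 2: two bytes of 文/漢 differ
```

A Levenshtein distance computed over the raw UTF-8 bytes would likewise be 2 rather than 1, which is why comparing at the character level matters.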

That’s all! Now you can start using polyleven in your project.

Perform Fuzzy Search on Wikipedia

Here is a quick demo you can try at home: a fast fuzzy search over Wikipedia article titles.

First, download the list of article titles from Wikipedia (90MB gzip-compressed) and decompress it:

$ wget http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-all-titles-in-ns0.gz
$ gzip -d enwiki-latest-all-titles-in-ns0.gz
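
If wget or gzip is not available (for example on Windows), the same two steps can be done with the Python standard library. This is a sketch assuming the dump URL above is still current; the helper names gunzip and download_and_gunzip are hypothetical. Unlike gzip -d, it keeps the .gz file after decompressing.

```python
import gzip
import shutil
import urllib.request

# Same dump URL as the wget command above (assumed to still be valid).
URL = "http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-all-titles-in-ns0.gz"


def gunzip(src: str, dst: str) -> None:
    """Decompress the gzip file at src into dst, streaming in chunks."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def download_and_gunzip(url: str, gz_path: str, out_path: str) -> None:
    """Fetch url to gz_path (like wget), then decompress it to out_path."""
    urllib.request.urlretrieve(url, gz_path)
    gunzip(gz_path, out_path)


# Usage (downloads about 90MB, so it is left commented out here):
# download_and_gunzip(URL, "enwiki-latest-all-titles-in-ns0.gz",
#                     "enwiki-latest-all-titles-in-ns0")
```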

Now load the titles and find every one within an edit distance of 2 from "Mark_Twain":

>>> from polyleven import levenshtein
>>> titles = [t.strip() for t in open('enwiki-latest-all-titles-in-ns0', encoding='utf-8')]
>>> [t for t in titles if levenshtein(t, "Mark_Twain") < 3]
['Mark_Tedin', 'Mark_Twang', 'Marc_Train', 'Mark_Trail', 'Mark_Tuan', 'Mark_Twain', 'Marc_Twain', 'Mark_Krain', 'Mark_twain', 'Mack_Swain', 'Mark_Tobin', 'Mark_Brain', 'Mark_Turin', 'Mark_Tulin', 'Mark_Tan', 'Mark_Fain', 'Dark_Train', 'Mark_Spain']
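
The one-liner above runs a full distance computation on every title. One cheap refinement, sketched below as our own addition rather than anything polyleven does for you, is to skip titles whose length differs from the query by more than max_dist: each insertion or deletion changes the length by exactly one, so abs(len(a) - len(b)) is a lower bound on the Levenshtein distance. The function name fuzzy_search and its max_dist parameter are hypothetical; the sketch uses polyleven when installed and falls back to a slow pure-Python distance otherwise.

```python
from typing import Iterable, List

try:
    from polyleven import levenshtein  # use polyleven if it is installed
except ImportError:
    def levenshtein(a: str, b: str) -> int:
        """Slow pure-Python fallback (textbook dynamic programming)."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            curr = [i]
            for j, cb in enumerate(b, start=1):
                curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                                prev[j - 1] + (ca != cb)))
            prev = curr
        return prev[-1]


def fuzzy_search(titles: Iterable[str], query: str, max_dist: int = 2) -> List[str]:
    """Return the titles within max_dist edits of query, in their original order."""
    n = len(query)
    hits = []
    for t in titles:
        # A length gap larger than max_dist already rules the title out,
        # so the more expensive distance computation is skipped.
        if abs(len(t) - n) > max_dist:
            continue
        if levenshtein(t, query) <= max_dist:
            hits.append(t)
    return hits


print(fuzzy_search(["Mark_Twain", "Mark_Twang", "Bob", "Mark_Trail"], "Mark_Twain"))
# ['Mark_Twain', 'Mark_Twang', 'Mark_Trail']
```

With the default max_dist=2, fuzzy_search(titles, "Mark_Twain") applies the same "< 3" condition as the one-liner above.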

Easy, huh?

What’s next?

  • In this document, you learned how to install polyleven and compute Levenshtein distance.

  • The full source code of polyleven is available on GitHub.

  • If you have any feedback, please submit an issue.