Quickstart

Installation

You can install polyleven from PyPI:

$ pip install polyleven

Compute Levenshtein Distance

Import the module and call levenshtein() with two strings. It returns their edit distance as an integer:

>>> import polyleven
>>> polyleven.levenshtein("aiueo", "abcde")
4
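To make the number above concrete, here is a minimal pure-Python sketch of the textbook dynamic-programming definition of Levenshtein distance. The function name reference_levenshtein is made up for this illustration, and the code is not taken from polyleven, whose actual implementation may differ. It only shows what the returned value measures: the smallest number of single-character insertions, deletions, and substitutions that turn one string into the other.

```python
def reference_levenshtein(a: str, b: str) -> int:
    """Textbook edit distance; illustrative only, not polyleven's code."""
    # previous[j] is the distance between the already-processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a matching character
            ))
        previous = current
    return previous[-1]


print(reference_levenshtein("aiueo", "abcde"))  # 4, same as the example above
```

A sketch like this is handy for cross-checking results on short strings. For real workloads, call polyleven.levenshtein instead.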

Polyleven compares strings character by character rather than byte by byte, so multi-byte characters are handled properly:

>>> polyleven.levenshtein("文字", "漢字")
1
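The following sketch uses only the Python standard library to show why the distinction matters. It assumes UTF-8 as the byte encoding and simply counts differing positions, which is enough here because both strings have the same length. At the character level the two strings differ in one place, while their UTF-8 encodings differ in two bytes, so a byte-oriented comparison would report a larger difference.

```python
a, b = "文字", "漢字"

# Each string is 2 characters long but 6 bytes long in UTF-8.
print(len(a), len(a.encode("utf-8")))

# Count the positions where the strings differ, first by character ...
char_diff = sum(x != y for x, y in zip(a, b))

# ... then by UTF-8 byte.
byte_diff = sum(x != y for x, y in zip(a.encode("utf-8"), b.encode("utf-8")))

print(char_diff)  # 1
print(byte_diff)  # 2
```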

That’s all! Now you can start using polyleven in your project.

Quick Demo: Find similar Wikipedia articles

Let’s do something interesting with polyleven. First, download the full list of Wikipedia article titles from wikimedia.org:

$ wget http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-all-titles-in-ns0.gz  # 91MB
$ gzip -d enwiki-latest-all-titles-in-ns0.gz

Now you can perform a fuzzy search on Wikipedia titles:

>>> from polyleven import levenshtein
>>> titles = [t.strip() for t in open('enwiki-latest-all-titles-in-ns0', encoding='utf-8')]
>>> [t for t in titles if levenshtein(t, "Mark_Twain") < 3]
['Mark_Tedin', 'Mark_Twang', 'Marc_Train', 'Mark_Trail', 'Mark_Tuan', 'Mark_Twain', 'Marc_Twain', 'Mark_Krain', 'Mark_twain', 'Mack_Swain', 'Mark_Tobin', 'Mark_Brain', 'Mark_Turin', 'Mark_Tulin', 'Mark_Tan', 'Mark_Fain', 'Dark_Train', 'Mark_Spain']
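The search above returns matches in file order. If you would rather see the closest titles first, one possible approach is sketched below. The helper closest_titles and its parameters (distance, max_distance, limit) are hypothetical names introduced for this example and are not part of polyleven. The distance argument is any callable that takes two strings and returns an integer, such as polyleven.levenshtein.

```python
import heapq


def closest_titles(titles, query, distance, max_distance=2, limit=10):
    """Return up to `limit` (distance, title) pairs, nearest first.

    Hypothetical helper: `distance` is any two-argument callable
    returning an integer, for example polyleven.levenshtein.
    """
    # Score lazily so `titles` can be a generator over a large file.
    scored = ((distance(title, query), title) for title in titles)
    matches = (pair for pair in scored if pair[0] <= max_distance)
    # Pairs sort by distance first, then alphabetically by title.
    return heapq.nsmallest(limit, matches)


# Possible usage with the title dump from the demo above:
#
#   from polyleven import levenshtein
#   with open('enwiki-latest-all-titles-in-ns0', encoding='utf-8') as f:
#       stripped = (line.strip() for line in f)
#       print(closest_titles(stripped, "Mark_Twain", levenshtein))
```

Because the titles are consumed one at a time, this variant does not need to hold the whole file in memory, and the default max_distance=2 matches the `< 3` condition used above.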

What’s next?

  • In this document, you learned how to install polyleven and compute Levenshtein distance.

  • The full source code of polyleven is available on GitHub.

  • If you have any feedback, please submit an issue.