*Hyper fast Levenshtein distance library*

Polyleven is a very efficient Levenshtein distance library for Python. Intended to be used in performance critical situations.

The source code of the latest version is available on polyleven-0.3.tar.gz (requires Python ≥ 3.4).

$ pip install polyleven

Use `levenshtein()`

to compute the edit distance between
a pair of words.

>>> from polyleven import levenshtein >>> levenshtein('hazel', 'maze') 2

I've benchmarked several Levenshtein libraries I found on PyPI, modelling a classical spell check problem.

Here is how I set up the test: I choose an English dictionary with 100k words as a test data set. I picked up a random word, then compute the Levenshtein distance with every word in the dictionary. I repeated this procedure 10 times (so 1 million computations) for each library.

The system used for this benchmark was:

- Intel Core i3-4010U (1.70GHz)
- Linux x86-64 (Debian Stretch)
- Python 3.5.3 / GCC 6.3.0
- Debian's wamerican v7.1.1

All the benchmark resources are available in the test directory.

Test target | TIME[sec] | SPEED[calls/s] ------------------------------- | ----------- | ---------------- edlib.align | 4.763 | 208216 editdistance.eval | 1.943 | 510450 jellyfish.levenshtein_distance | 0.722 | 1374081 distance.levenshtein | 0.623 | 1591396 Levenshtein.distance | 0.500 | 1982764 polyleven.levenshtein | 0.431 | 2303420

This benchmark illustrates how each Levenshtein distance library behaves as the size of input strings increases. For this test, I generated 5000 pairs of random strings for each size 16, 32, 64, 128, 256, 512, 1024, and then, measured the time each library takes to process all the inputs.

Test target | N=16 | N=32 | N=64 | N=128 | N=256 | N=512 | N=1024 ------------------------------ | ----- | ----- | ----- | ----- | ----- | ----- | ------ edlib.align | 0.040 | 0.063 | 0.094 | 0.205 | 0.432 | 0.908 | 2.089 editdistance.eval | 0.027 | 0.049 | 0.086 | 0.178 | 0.336 | 0.740 | 58.139 jellyfish.levenshtein_distance | 0.009 | 0.032 | 0.118 | 0.470 | 1.874 | 8.877 | 42.848 distance.levenshtein | 0.007 | 0.029 | 0.109 | 0.431 | 1.726 | 6.950 | 27.998 Levenshtein.distance | 0.006 | 0.022 | 0.085 | 0.336 | 1.328 | 5.286 | 21.097 polyleven.levenshtein | 0.003 | 0.005 | 0.010 | 0.043 | 0.149 | 0.550 | 2.109

Below is the list of libraries benchmarked above.

edlib v1.2.1 | https://github.com/Martinsos/edlib |

editdistance v0.4 | https://github.com/aflc/editdistance |

jellyfish v0.5.6 | https://github.com/jamesturk/jellyfish |

distance v0.1.3 | https://github.com/doukremt/distance |

Python-Levenshtein v0.12 | https://github.com/ztane/python-Levenshtein |

polyleven v0.3 | https://github.com/fujimotos/polyleven |

I hereby place my work "polyleven" into the public domain. All my copyrights, including related and neighbouring rights, of this work are abandoned.

2018 Fujimoto Seiji <fujimoto@ceptord.net>