Charec – Handwriting Recognition with Levenshtein Distance


Fujimoto Seiji




This document has been placed in the public domain.

1. How it works

This software does the following to recognize the character you draw.

  1. Capture the mouse movement as an array of XY coordinates.

  2. Compress the gist of the array into a short string.

  3. Performs the nearest neighborhood search against the character models.

  4. Return the model with the best score.

For more details, please read the code and explanation below.

2. Download (2020-07-05, 4.8kb)
  • Add touch devices (e.g. smartphones) support. (2017-10-06, 47.9kb)
  • Migrate from TypeScript to JavaScript.

You can view the latest code on

3. Intersting, but is it actually useful?

Not much, but I think this software is very useful to illustrate how widely applicable the Levenshtein distance is.

More generally, I think there is two distinct views about the Levenshtein distance. Popular but narrow-in-scope one, and alternative but more interesting one. Each view provides a different vision of what the Levenshtein distance is about, and what it can accomplish.

3.2. Alternative View

There is nothing wrong about the view described above. Indeed, this is mostly how the Levenshtein distance is used in real world.

However, there is a more interesting (albeit less common) view of Levenshtein distance. To put it short, we can think of it as a similarity metric for everything.

That is, the Levenshtein Distance is a universal similarity metric that is applicable to a wide range of objects (e.g. images, graphs or handwritings). The catch is that you need to find some way to represent these objects as strings. If your string representation is reasonably good, you can use the Levenshtein distance to measure the similarity of these objects.

Here is how it typically goes like:

  1. Take an object X that you want to measure the similarity.

  2. Find some way to convert X into a sequence of characters.

  3. Measure the Levenshtein distance.

  4. Now you have similarity metrics for X.

This is essentially how charec works. It takes a set of strokes traced by a pointer, and compare their similarity by passing them to a Levenshtein distance function.

For encoding, I used the Schimke-Vielhauer-Dittman technique [Schimke2004] and it turned out to work surprisingly well.