This article demonstrates how you can use Levenshtein distance to recognize handwriting characters.
Draw a single digit number from 1 to 9 on the canvas. To clear the canvas, please click "Reset Canvas".
This software does the following to recognize the character you draw.
For more details, please read the code and explanation below.
charec-0.3.zip (2020-07-05, 4.8kb)
charec-0.2.zip (2017-10-06, 47.9kb)
You can view the latest code on cgit (also on GitHub)
Not much, but I think this software is very useful to illustrate how widely applicable the Levenshtein distance is.
More generally, I think there is two distinct views about the Levenshtein distance. Popular but narrow-in-scope one, and alternative but more interesting one. Each view provides a different vision of what the Levenshtein distance is about, and what it can accomplish.
The popular view is that the Levenshtein distance is a similarity metric for strings. If you input two strings, it will return a number that shows how alike they are -- the smaller, the more similar. This metric is, people say, useful for linguistic tasks.
For instance, suppose you're trying to find the correct spelling of a mistyped word. You feel the spelling is not correct, but not far off. In theory, you can filp through an English dictionary to find out the correct spelling, but it is rather tiresome to do so.
So instead, you teach computers to iterate over the dictionary. Since computers do not have an innate understanding of "similarity", you tell computers to use the Levenshtein distance and bring back the word with the best score.
The basic point here is that Levenshtein distance works as a rough approximation of the human perception of string similarity, in particular, in such a form that computers can understand.
There is nothing wrong about the view described above. Indeed, this is mostly how the Levenshtein distance is used in real world.
However, there is a more interesting (albeit less common) view of Levenshtein distance. To put it short, we can think of it as a similarity metric for everything.
That is, the Levenshtein Distance is a universal similarity metric that is applicable to a wide range of objects (e.g. images, graphs or handwritings). The catch is that you need to find some way to represent these objects as strings. If your string representation is reasonably good, you can use the Levenshtein distance to measure the similarity of these objects.
Here is how it typically goes like:
This is essentially how charec works. It takes a set of strokes traced by a pointer, and compare their similarity by passing them to a Levenshtein distance function.
For encoding, I used the Schimke-Vielhauer-Dittman technique and it turned out to work surprisingly well.
I hereby place my work "charec" into the public domain. All my copyrights, including related and neighbouring rights, of this work are abandoned.
2019 Fujimoto Seiji <email@example.com>