On 11/3/2016 1:49 PM, jlada...@itu.edu wrote:
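The Levenshtein distance is a very precise definition of dissimilarity between 
sequences.  It specifies the minimum number of single-element edits (insertions, 
deletions, or substitutions) you would need to change one sequence into another.  
You are right that it is fairly expensive to compute: the usual 
dynamic-programming approach takes time proportional to the product of the two 
sequence lengths.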
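
To make the definition concrete, here is a minimal sketch of the standard dynamic-programming computation. The function name `levenshtein` is my own choice, and this is an illustration under the usual unit-cost assumption (each insertion, deletion, or substitution costs 1), not a tuned implementation.

```python
def levenshtein(a, b):
    """Return the minimum number of single-element insertions,
    deletions, and substitutions needed to turn sequence a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the rows short

    # previous[j] is the distance between the processed prefix of a
    # and the first j elements of b.
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]  # turning i elements into an empty prefix: i deletions
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute, or free if equal
            ))
        previous = current
    return previous[-1]
```

The nested loops are where the cost comes from: every element of one sequence is compared against every element of the other, so comparing many long strings pairwise adds up quickly.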

But you asked for an algorithm that would determine whether groups of strings are 
"sort of similar".  How imprecise can you be?  An analysis of the frequency of 
each individual character in a string might be good enough for you.
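
As a rough sketch of what such a frequency analysis could look like, the following counts characters with `collections.Counter` and adds up the differences in the counts. The function name `char_frequency_distance` and the choice of summing absolute differences are assumptions made for illustration; other ways of comparing the counts would work too.

```python
from collections import Counter


def char_frequency_distance(s, t):
    """Sum, over every character in either string, of the absolute
    difference between its count in s and its count in t."""
    counts_s, counts_t = Counter(s), Counter(t)
    # A Counter returns 0 for characters it has not seen.
    return sum(
        abs(counts_s[ch] - counts_t[ch])
        for ch in counts_s.keys() | counts_t.keys()
    )
```

This takes time linear in the lengths of the strings, but it ignores character order entirely, so anagrams such as "listen" and "silent" come out as distance 0. Whether that is acceptable depends on how imprecise "sort of similar" is allowed to be.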

I once used a Levenshtein distance algorithm in Python (code snippet D0DE4716-B6E6-4161-9219-2903BF8F547F) to compare names of students. It worked, but turned out not to be what I needed. You may also be able to use something "off the shelf" from Python's difflib.
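
For the difflib route, a small sketch using two standard-library pieces, `difflib.SequenceMatcher` and `difflib.get_close_matches`. The wrapper names `similarity` and `closest_names` are hypothetical, chosen only for this example. Note that difflib scores similarity with its own matching-block ratio, which is not the Levenshtein distance.

```python
import difflib


def similarity(a, b):
    """Return difflib's similarity ratio for a and b, a float in [0, 1]."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def closest_names(name, candidates, n=3, cutoff=0.6):
    """Return up to n candidates whose ratio against name is at least
    cutoff, best match first."""
    return difflib.get_close_matches(name, candidates, n=n, cutoff=cutoff)
```

Calling `closest_names` with a misspelled name and a list of known names returns the best candidates first, and raising `cutoff` makes the matching stricter.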

--
Neil Cerutti

--
https://mail.python.org/mailman/listinfo/python-list
