Re: need some kind of "coherence index" for a group of strings

Neil D. Cerutti Thu, 03 Nov 2016 13:12:40 -0700

On 11/3/2016 1:49 PM, jlada...@itu.edu wrote:

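The Levenshtein distance is a very precise definition of dissimilarity between
sequences.  It specifies the minimum number of single-element edits
(insertions, deletions, or substitutions) you would need to change one
sequence into another.  You are right that it is fairly expensive to compute:
the standard algorithm takes time proportional to the product of the two
sequence lengths.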
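
For reference, here is a minimal sketch of the textbook dynamic-programming
version. The function name levenshtein is my own choice, and this is only one
of several ways to write it; it keeps two rows of the edit table at a time.

```python
def levenshtein(a, b):
    """Return the minimum number of single-element insertions, deletions,
    or substitutions needed to turn sequence a into sequence b."""
    # previous[j] holds the distance between the first i-1 elements of a
    # and the first j elements of b.
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]  # turning a[:i] into an empty sequence takes i deletions
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x from a
                current[j - 1] + 1,          # insert y into a
                previous[j - 1] + (x != y),  # substitute x with y (free if equal)
            ))
        previous = current
    return previous[-1]
```

The nested loops are where the cost comes from: every element of one sequence
is compared against every element of the other.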


But you asked for an algorithm that would determine whether groups of strings are 
"sort of similar".  How imprecise can you be?  An analysis of the frequency of 
each individual character in a string might be good enough for you.
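
A rough sketch of what I mean, assuming cosine similarity between
character-count vectors is an acceptable measure and that averaging over all
pairs is a reasonable "index" for a group. Both choices, and the names
char_similarity and group_coherence, are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def char_similarity(s, t):
    """Cosine similarity of the character-count vectors of s and t.

    Returns a value between 0.0 (no characters in common) and 1.0
    (same character proportions).  Character order is ignored entirely.
    """
    cs, ct = Counter(s), Counter(t)
    dot = sum(cs[ch] * ct[ch] for ch in cs)
    norm = (sqrt(sum(n * n for n in cs.values()))
            * sqrt(sum(n * n for n in ct.values())))
    return dot / norm if norm else 0.0


def group_coherence(strings):
    """Mean pairwise character similarity of a group of strings."""
    pairs = list(combinations(strings, 2))
    if not pairs:
        return 1.0  # assumption: a group of 0 or 1 strings counts as coherent
    return sum(char_similarity(s, t) for s, t in pairs) / len(pairs)
```

Note how imprecise this is: anagrams such as "listen" and "silent" score as
identical, because only the character counts are compared.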

I also once used a Levenshtein distance algorithm in Python (code snippet) to
compare names of students (it worked, but turned out not to be what I needed),
but you may also be able to use some items "off the shelf" from Python's
difflib.
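
As a small illustration of the difflib route, here is a sketch using
difflib.SequenceMatcher and difflib.get_close_matches from the standard
library. The wrapper name similarity and the sample words are made up for the
example.

```python
import difflib


def similarity(a, b):
    """Return difflib's similarity ratio for a and b, from 0.0 to 1.0."""
    # The first argument is the "junk" filter; None disables it.
    return difflib.SequenceMatcher(None, a, b).ratio()


# get_close_matches returns the candidates that resemble the word,
# best match first (by default at most 3, with a cutoff ratio of 0.6).
candidates = ["ape", "apple", "peach", "puppy"]
matches = difflib.get_close_matches("appel", candidates)
```

The ratio is not a Levenshtein distance; it is based on matching blocks, but
for a "sort of similar" test that may be all you need.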


--
Neil Cerutti

--
https://mail.python.org/mailman/listinfo/python-list
