I don't know much about these topics, but wouldn't Soundex do the job?

On Thursday, November 3, 2016 at 12:18:19 PM UTC-4, Fillmore wrote:
> Hi there, apologies for the generic question. Here is my problem: let's
> say that I have a list of lists of strings.
>
> list1: #strings are sort of similar to one another
>
> my_nice_string_blabla
> my_nice_string_blqbli
> my_nice_string_bl0bla
> my_nice_string_aru
>
>
> list2: #strings are mostly different from one another
>
> my_nice_string_blabla
> some_other_string
> yet_another_unrelated string
> wow_totally_different_from_others_too
>
>
> I would like an algorithm that can look at the strings and determine
> that strings in list1 are sort of similar to one another, while the
> strings in list2 are all different.
> Ideally, it would be nice to have some kind of 'coherence index' that I
> can exploit to separate lists given a certain threshold.
>
> I was about to concoct something using Levenshtein distance, but then I
> figured that it would be expensive to compute and I may be reinventing
> the wheel.
>
> Thanks in advance to python masters that may have suggestions...
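
For what it's worth, the 'coherence index' described in the question could be sketched with the standard library alone. The snippet below is only an illustration under some assumptions: it uses difflib.SequenceMatcher as the similarity measure (not Soundex and not Levenshtein distance), it takes the plain mean over all pairs, and the name coherence_index is made up for this example. Any threshold for separating the lists would have to be tuned on real data.

```python
from difflib import SequenceMatcher
from itertools import combinations


def coherence_index(strings):
    """Hypothetical helper: mean pairwise similarity, between 0.0 and 1.0.

    Higher values mean the strings look more alike. The similarity
    measure (SequenceMatcher.ratio) is an assumption of this sketch.
    """
    pairs = list(combinations(strings, 2))
    if not pairs:
        # Zero or one string: nothing to compare, treat as fully coherent.
        return 1.0
    total = sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs)
    return total / len(pairs)


# The two example lists from the question.
list1 = [
    "my_nice_string_blabla",
    "my_nice_string_blqbli",
    "my_nice_string_bl0bla",
    "my_nice_string_aru",
]
list2 = [
    "my_nice_string_blabla",
    "some_other_string",
    "yet_another_unrelated string",
    "wow_totally_different_from_others_too",
]

print("list1:", coherence_index(list1))
print("list2:", coherence_index(list2))
```

Comparing every pair is quadratic in the length of each list, so for very long lists it may be worth sampling pairs instead of scoring all of them.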
-- https://mail.python.org/mailman/listinfo/python-list