On Thu, Nov 3, 2016 at 9:18 AM, Fillmore <fillmore_rem...@hotmail.com> wrote:
> Hi there, apologies for the generic question. Here is my problem: let's say
> that I have a list of lists of strings.
>
> list1: #strings are sort of similar to one another
>
> my_nice_string_blabla
> my_nice_string_blqbli
> my_nice_string_bl0bla
> my_nice_string_aru
>
>
> list2: #strings are mostly different from one another
>
> my_nice_string_blabla
> some_other_string
> yet_another_unrelated string
> wow_totally_different_from_others_too
>
>
> I would like an algorithm that can look at the strings and determine that
> strings in list1 are sort of similar to one another, while the strings in
> list2 are all different.
> Ideally, it would be nice to have some kind of 'coherence index' that I
> can exploit to separate lists given a certain threshold.
>
> I was about to concoct something using Levenshtein distance, but then I
> figured that it would be expensive to compute and I may be reinventing the
> wheel.
>
> Thanks in advance to Python masters that may have suggestions...
>
> --
> https://mail.python.org/mailman/listinfo/python-list

When you say similar, do you mean similar in the amount of duplicate
words/letters? Or were you more interested in similar sentence structure?
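
If it is character-level similarity you are after, here is a minimal sketch of one possible 'coherence index', assuming the standard library's difflib is good enough for your data. It averages SequenceMatcher.ratio() over every pair of strings in a list. The function name coherence_index and the choice of a plain mean are my own assumptions, not an established measure, and the ratio is not Levenshtein distance.

```python
from difflib import SequenceMatcher
from itertools import combinations


def coherence_index(strings):
    """Mean pairwise similarity ratio in [0, 1]; higher means more alike.

    Hypothetical helper: averages difflib's ratio over all pairs.
    """
    pairs = list(combinations(strings, 2))
    if not pairs:
        return 1.0  # zero or one string: treat as trivially coherent
    total = sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs)
    return total / len(pairs)


# The two example lists from the original question.
list1 = [
    "my_nice_string_blabla",
    "my_nice_string_blqbli",
    "my_nice_string_bl0bla",
    "my_nice_string_aru",
]
list2 = [
    "my_nice_string_blabla",
    "some_other_string",
    "yet_another_unrelated string",
    "wow_totally_different_from_others_too",
]

if __name__ == "__main__":
    print("list1:", round(coherence_index(list1), 3))
    print("list2:", round(coherence_index(list2), 3))
```

list1 should score noticeably higher than list2, so a threshold between the two scores would separate them. Where to put that threshold depends on your real data. The number of comparisons grows quadratically with the list length, so for long lists you may want to compare only a random sample of pairs.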