Hi there, apologies for the generic question. Here is my problem: let's
say that I have a list of lists of strings.
list1: #strings are sort of similar to one another
my_nice_string_blabla
my_nice_string_blqbli
my_nice_string_bl0bla
my_nice_string_aru
list2: #strings are mostly different from one another
my_nice_string_blabla
some_other_string
yet_another_unrelated string
wow_totally_different_from_others_too
I would like an algorithm that can look at the strings and determine
that strings in list1 are sort of similar to one another, while the
strings in list2 are all different.
Ideally, it would be nice to have some kind of 'coherence index' that I
can exploit to separate lists given a certain threshold.
I was about to concoct something using Levenshtein distance, but then I
figured that it would be expensive to compute and that I may be
reinventing the wheel.
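
For concreteness, here is a rough sketch of the kind of 'coherence index'
I have in mind. It is only an illustration under my own assumptions: it
uses the ratio from the standard library's difflib.SequenceMatcher as a
stand-in for an edit distance, averages it over every pair of strings in
a list, and the name coherence_index is made up for this post.

```python
from difflib import SequenceMatcher
from itertools import combinations


def coherence_index(strings):
    """Mean pairwise similarity of the strings, between 0.0 and 1.0.

    Hypothetical helper: similarity is difflib's SequenceMatcher ratio,
    not a true Levenshtein distance.
    """
    pairs = list(combinations(strings, 2))
    if not pairs:
        # Zero or one string: nothing to compare, treat as fully coherent.
        return 1.0
    total = sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs)
    return total / len(pairs)


list1 = [
    "my_nice_string_blabla",
    "my_nice_string_blqbli",
    "my_nice_string_bl0bla",
    "my_nice_string_aru",
]
list2 = [
    "my_nice_string_blabla",
    "some_other_string",
    "yet_another_unrelated string",
    "wow_totally_different_from_others_too",
]

# A list would count as "coherent" when its index exceeds some threshold.
print(coherence_index(list1), coherence_index(list2))
```

The pairwise loop is quadratic in the number of strings per list, which
is part of why I worry about the cost on large inputs.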
Thanks in advance to any Python masters who may have suggestions...
--
https://mail.python.org/mailman/listinfo/python-list