Hi there, apologies for the generic question. Here is my problem: let's say that I have a list of lists of strings.

list1:    #strings are sort of similar to one another

  my_nice_string_blabla
  my_nice_string_blqbli
  my_nice_string_bl0bla
  my_nice_string_aru


list2:    #strings are mostly different from one another

  my_nice_string_blabla
  some_other_string
  yet_another_unrelated string
  wow_totally_different_from_others_too


I would like an algorithm that can look at the strings and determine that strings in list1 are sort of similar to one another, while the strings in list2 are all different. Ideally, it would be nice to have some kind of 'coherence index' that I can exploit to separate lists given a certain threshold.
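
To make the idea concrete, here is a rough sketch of the kind of index I have in mind. It assumes the standard library's difflib.SequenceMatcher.ratio() as the similarity measure (not Levenshtein), averaged over every pair of strings in a list. The names coherence_index and THRESHOLD, and the value 0.6, are made up for illustration only.

```python
from difflib import SequenceMatcher
from itertools import combinations


def coherence_index(strings):
    """Mean pairwise similarity in [0, 1]; higher means more alike."""
    pairs = list(combinations(strings, 2))
    if not pairs:
        # Zero or one string: call it fully coherent (an arbitrary convention).
        return 1.0
    total = sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs)
    return total / len(pairs)


list1 = [
    "my_nice_string_blabla",
    "my_nice_string_blqbli",
    "my_nice_string_bl0bla",
    "my_nice_string_aru",
]
list2 = [
    "my_nice_string_blabla",
    "some_other_string",
    "yet_another_unrelated string",
    "wow_totally_different_from_others_too",
]

THRESHOLD = 0.6  # hypothetical cut-off, would need tuning on real data

for name, strings in (("list1", list1), ("list2", list2)):
    score = coherence_index(strings)
    label = "similar" if score >= THRESHOLD else "different"
    print(f"{name}: {score:.2f} -> {label}")
```

On the two example lists above, list1 scores clearly higher than list2, which is the behaviour I am after. The number of comparisons grows quadratically with the length of each list, so this is only reasonable for short lists.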

I was about to concoct something using Levenshtein distance, but then I figured that it would be expensive to compute and that I might be reinventing the wheel.
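
For reference, this is roughly the Levenshtein-based version I was considering: a plain-Python sketch, not a tuned implementation. The function names are my own. Each comparison costs on the order of len(a) * len(b) operations, and every pair of strings in a list has to be compared, which is where my worry about the cost comes from.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the stored row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def mean_normalized_distance(strings):
    """Average pairwise distance scaled to [0, 1]; lower means more alike."""
    pairs = list(combinations(strings, 2))
    if not pairs:
        return 0.0
    scaled = (levenshtein(a, b) / max(len(a), len(b), 1) for a, b in pairs)
    return sum(scaled) / len(pairs)
```

Dividing by the length of the longer string keeps long and short strings comparable, so a single threshold could in principle be applied to every list.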

Thanks in advance to any Python masters who may have suggestions...



--
https://mail.python.org/mailman/listinfo/python-list
