manstey wrote: > Hi, > > Is there a clever way to see if two strings of the same length vary by > only one character, and what the character is in both strings. > > E.g. str1=yaqtil str2=yaqtel > > they differ at str1[4] and the difference is ('i','e') > > But if there was str1=yiqtol and str2=yaqtel, I am not interested. > > can anyone suggest a simple way to do this? > > My next problem is, I have a list of 300,000+ words and I want to find > every pair of such strings. I thought I would first sort on length of > string, but how do I iterate through the following: > > str1 > str2 > str3 > str4 > str5 > > so that I compare str1 & str2, str1 & str3, str 1 & str4, str1 & str5, > str2 & str3, str3 & str4, str3 & str5, str4 & str5.
If your strings are pretty short you can do it like this even without sorting by length first: def fuzzy_keys(s): for pos in range(len(s)): yield s[0:pos]+chr(0)+s[pos+1:] def fuzzy_insert(d, s): for fuzzy_key in fuzzy_keys(s): if fuzzy_key in d: strings = d[fuzzy_key] if type(strings) is list: strings += s else: d[fuzzy_key] = [strings, s] else: d[fuzzy_key] = s def gather_fuzzy_matches(d): for strings in d.itervalues(): if type(strings) is list: yield strings acc = {} fuzzy_insert(acc, "yaqtel") fuzzy_insert(acc, "yaqtil") fuzzy_insert(acc, "oaqtil") print list(gather_fuzzy_matches(acc)) prints [['yaqtil', 'oaqtil'], ['yaqtel', 'yaqtil']] -- http://mail.python.org/mailman/listinfo/python-list