"Paul McGuire" <[EMAIL PROTECTED]> writes:
> >>> def offByNoMoreThanOneCharacter(a,b):
> ...    return len(a)==len(b) and sum(map(lambda (x,y): x==y, zip(a,b))) >= len(a)-1
Yikes!  How about (untested):

    def offByNoMoreThanOneCharacter(a,b):
        return len(a)==len(b) and \
            len([i for i in xrange(len(a)) if a[i] != b[i]]) <= 1

which seems more direct.  I'm not sure it's correct, since I'd consider
"apple" and "apples" to be one character apart even though their lengths
are unequal.

But I think this type of function is useless for the problem at hand:
with around N=300,000 words, comparing every pair of them is O(N**2),
which is a bit painful.  I'd use a decorate-sort-undecorate approach
like:

    # generate all the variants of a word: the lowercased word itself,
    # plus every string formed by deleting a single character from it
    def generate_all_variants(word):
        wlower = word.lower()
        yield (wlower, word)
        for i in xrange(len(word)):
            yield (wlower[:i] + wlower[i+1:], word)

Use this to generate all the variants of all the words in the
dictionary, and write those out into a file, each line containing a
variant plus the original word.  Then use a sorting utility like the
unix "sort" program to sort the file.  Those programs work efficiently
even if the file is too large to fit in memory.  Then read the sorted
file to find equivalence classes on the variants.
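To make the whole pipeline concrete, here is a minimal in-memory sketch of that decorate-sort-undecorate idea (written for modern Python, so `range` instead of `xrange`; an in-memory sort and `itertools.groupby` stand in for the external "sort" pass, which is fine for a small word list; the function names are illustrative, not from the original post):

```python
from itertools import groupby

def generate_all_variants(word):
    # decorate: yield (variant, word) pairs -- the lowercased word
    # itself, plus every string formed by deleting one character
    wlower = word.lower()
    yield (wlower, word)
    for i in range(len(word)):
        yield (wlower[:i] + wlower[i + 1:], word)

def near_miss_classes(words):
    # collect all (variant, word) pairs for every word in the list
    pairs = [p for w in words for p in generate_all_variants(w)]
    # sort by variant; this is the step the external unix "sort"
    # would do for a file too large to fit in memory
    pairs.sort()
    # undecorate: group words that share a variant.  Classes with
    # more than one member are words at most one deletion or
    # substitution apart (substitutions at position i collide on the
    # delete-position-i variant).
    classes = []
    for variant, group in groupby(pairs, key=lambda p: p[0]):
        members = sorted({w for _, w in group})
        if len(members) > 1:
            classes.append((variant, members))
    return classes

classes = near_miss_classes(["apple", "apples", "Ample", "maple"])
```

Here "apple" and "apples" share the variant "apple" (one deletion apart), and "apple", "Ample", and "maple" all collide on the variant "aple", so each pair lands in a common equivalence class.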