Paul Rubin>Still terrible. Use a better algorithm!< I agree, it's O(n^2), but if you need to run this program just 1 time, and the program is in C, and you don't want to use much time to think and code a better algorithm (or you aren't able to do it) then maybe that naive solution can be enough, (if the running time is around 30-90 minutes). Total time to solve a one-shot problem is often the sum of thinking time + programming time + running time :-)
Paul Rubin>Use this to generate all the variants of all the words in the dictionary, and write those out into a file, each line containing a variant plus the original word. Then use a sorting utility like the unix "sort" program to sort the file. Those programs work efficiently even if the file is too large to fit in memory. Then read the sorted file to find equivalence classes on the variants.< If the words are all 5 character long, then I think you are creating: (len(lowercase)-1) * 5 * 30000 = 3750000 strings of len 5+5 (plus enter if you save it into a file), this means a file of about 37 MB. All those trings may even fit in memory if you have lot of it. Your solution looks nice :-) (It reminds me a little the word signature trick used to find anagrams). Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list