Paul Rubin wrote: > [EMAIL PROTECTED] writes: >>> I have suggested C because if the words are all of the same length then >>> you have 30000^2 = 90 000 000 000 pairs to test.
>> Sorry, you have (n*(n-1))/2 pairs to test (~ 45 000 000 000). > Still terrible. Use a better algorithm! To put all this in perspective, here's a very similar real-world problem: You have a customer database with 300,000 records. Some of them are duplicated, because there are differences in recording the customers' names, like a one-keystroke typo. The task is to locate pairs of rows which are likely to be duplicates. 45 billion comaprisons[1] may be ecxessive. And here's a clue for a better algorithm: Knuth's TAOCP vol 3 [1973 edition] chapter 6 (searching) third page "as if the words were spelled backwards". If you find yourself reading about soundex, you've definitely gone too far :-) [1]: "comapring" in the subject: is this a game of 'Spot the deliberate mistake"? Is the OP counting a transposition of adjacent characters as "varying by one character"? Cheers -- http://mail.python.org/mailman/listinfo/python-list