Re: newb: comapring two strings

John Machin Fri, 19 May 2006 14:18:04 -0700

Paul Rubin wrote:
> [EMAIL PROTECTED] writes:
>>> I have suggested C because if the words are all of the same length then
>>> you have 30000^2 = 90 000 000 000 pairs to test.


>> Sorry, you have (n*(n-1))/2  pairs to test (~ 45 000 000 000).

> Still terrible.  Use a better algorithm!

To put all this in perspective, here's a very similar real-world
problem:

You have a customer database with 300,000 records. Some of them are
duplicated, because there are differences in recording the customers'
names, like a one-keystroke typo. The task is to locate pairs of rows
which are likely to be duplicates. 45 billion comaprisons[1] may be
ecxessive.

And here's a clue for a better algorithm: Knuth's TAOCP vol 3 [1973
edition] chapter 6 (searching) third page "as if the words were spelled
backwards". If you find yourself reading about soundex, you've
definitely gone too far :-)

[1]: "comapring" in the subject: is this a game of 'Spot the deliberate
mistake"? Is the OP counting a transposition of adjacent characters as
"varying by one character"?

Cheers

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: newb: comapring two strings

Reply via email to