Finding Duplicates.

Paul Kraus Tue, 11 Feb 2003 07:10:29 -0800

I have to find duplicate customers in our customer file (around 60,000
customers).
The data has been exported to a pipe-delimited file with these fields:


CustCode|Ship2Code|Name|Addr1|Addr2|City|State|ZipCode|Phone|Fax|Country

Normally this task is done by printing the file and having someone go
through it manually to find the duplicates.

The problem is that the duplicates can be misspelled, so an exact-match
search won't find them all.
My thinking was a few passes: phone numbers, then addresses, then
address digits & city.

The first two should give me fairly reliable matches, and the third will
give some possibilities to review.
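
The three passes described above could be sketched roughly as follows.
This is only an illustration in Python, not a tested solution for the
real file: the helper names (digits, squash, parse, find_groups, PASSES)
are made up for the example, and the normalization rules are an
assumption about what "address digits & City" should mean.

```python
import re
from collections import defaultdict

# Column order taken from the field list in the post.
FIELDS = ["CustCode", "Ship2Code", "Name", "Addr1", "Addr2", "City",
          "State", "ZipCode", "Phone", "Fax", "Country"]

def digits(text):
    """Keep only the digits, so '(555) 123-4567' and '555-123-4567' match."""
    return re.sub(r"\D", "", text)

def squash(text):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

def address_digits_city(rec):
    """Key for the third pass; empty when the address has no digits."""
    number = digits(rec["Addr1"])
    return number + "|" + squash(rec["City"]) if number else ""

# One key function per pass (hypothetical names).
# An empty key means "skip this record for this pass".
PASSES = {
    "phone": lambda rec: digits(rec["Phone"]),
    "address": lambda rec: squash(rec["Addr1"]),
    "digits+city": address_digits_city,
}

def parse(line):
    """Turn one pipe-delimited line into a dict keyed by field name."""
    return dict(zip(FIELDS, line.rstrip("\n").split("|")))

def find_groups(records, key_func):
    """Return lists of records that share the same non-empty key."""
    groups = defaultdict(list)
    for rec in records:
        key = key_func(rec)
        if key:
            groups[key].append(rec)
    return [group for group in groups.values() if len(group) > 1]
```

Calling find_groups(records, PASSES["phone"]) returns the candidate
groups for the phone pass; the other two passes work the same way with a
different key. The "address" key here only catches differences in case,
spacing and punctuation, so "St." versus "Street" would still need the
looser third pass or a manual review.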

I would like the script to process the file and then dump the suspected
duplicate lines to another file.
I can't figure out how to lay out the script or what data structures to
use. I guess I would almost have to set it up like an old-school bubble
sort routine.
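
As one possible layout, a hash (a dict in Python) keyed on a normalized
field avoids comparing every line against every other line the way a
bubble sort would. The sketch below is an assumption-laden example, not
a finished script: dump_duplicates and phone_key are hypothetical names,
and it assumes the file has no header row and that Phone is the ninth
column, as in the field list above.

```python
from collections import defaultdict

def phone_key(fields):
    """Digits of the Phone column (index 8); empty if the column is missing."""
    phone = fields[8] if len(fields) > 8 else ""
    return "".join(ch for ch in phone if ch.isdigit())

def dump_duplicates(in_path, out_path, key_func):
    """Copy every line that shares a key with another line to out_path.

    Lines are written in groups separated by a blank line.
    Returns the number of lines written.
    """
    groups = defaultdict(list)
    with open(in_path) as src:
        for line in src:
            line = line.rstrip("\n")
            if not line:
                continue
            key = key_func(line.split("|"))
            if key:  # skip records with nothing to compare on
                groups[key].append(line)

    written = 0
    with open(out_path, "w") as dst:
        for key in sorted(groups):
            lines = groups[key]
            if len(lines) > 1:
                dst.write("\n".join(lines) + "\n\n")
                written += len(lines)
    return written
```

Each pass reads the file once and does one dictionary lookup per line,
so 60,000 customers should not be a problem. Running it again with a
different key function gives the next pass.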

Any suggestions or ideas would be greatly appreciated.


