I have to find duplicate customers in our customer file (around 60,000 customers). The file has been exported into a pipe-delimited file.
CustCode|Ship2Code|Name|Addr1|Addr2|City|State|ZipCode|Phone|Fax|Country

Normally this task is done by printing the file and having someone go through it manually to find the duplicates. The problem is that duplicates can be misspelled, so an exact-match search won't work. My thinking was to make a couple of passes: phone numbers, then addresses, then address digits plus city. The first two should give me fairly reliable matches, and the third will give some possibilities.

I would like the script to process the file and then dump the matching lines to another file. I can't figure out how to lay out the script or what data structures to use. I guess I would almost have to set it up like an old-school bubble sort routine.

Any suggestions or ideas would be greatly appreciated.
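
To make the multi-pass idea concrete, here is a rough sketch in Python. It is only one possible layout, not a tested solution for the real file. It assumes the export has no header line, that every record is on one line, and that the fields appear in the order listed above. The function names (`phone_key`, `address_key`, `digits_city_key`, `find_duplicates`, `read_records`, `write_report`) are made up for illustration. Each pass reduces a record to a normalized key and collects records with the same key in a dictionary, so no record-by-record comparison loop is needed.

```python
import re
from collections import defaultdict

# Field order taken from the export layout in the post.
FIELDS = ["CustCode", "Ship2Code", "Name", "Addr1", "Addr2", "City",
          "State", "ZipCode", "Phone", "Fax", "Country"]

def phone_key(rec):
    """Pass 1: phone number reduced to its digits (assumed normalization)."""
    digits = re.sub(r"\D", "", rec["Phone"])
    return digits or None

def address_key(rec):
    """Pass 2: address lines lowercased with punctuation and spaces removed."""
    text = re.sub(r"[^a-z0-9]", "", (rec["Addr1"] + rec["Addr2"]).lower())
    return text or None

def digits_city_key(rec):
    """Pass 3: digits from the first address line plus the city name."""
    digits = re.sub(r"\D", "", rec["Addr1"])
    city = re.sub(r"[^a-z]", "", rec["City"].lower())
    return (digits, city) if digits and city else None

# Passes run in this order; the last one is the loosest.
PASSES = [("phone", phone_key),
          ("address", address_key),
          ("digits+city", digits_city_key)]

def find_duplicates(records):
    """Return (pass_label, [records]) for every key shared by 2+ records."""
    results = []
    for label, key_func in PASSES:
        groups = defaultdict(list)
        for rec in records:
            key = key_func(rec)
            if key is not None:          # skip records with an empty key
                groups[key].append(rec)
        for members in groups.values():
            if len(members) > 1:
                results.append((label, members))
    return results

def read_records(path):
    """Read the pipe-delimited export into a list of dicts."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            values = line.rstrip("\n").split("|")
            values += [""] * (len(FIELDS) - len(values))  # pad short lines
            records.append(dict(zip(FIELDS, values)))
    return records

def write_report(results, path):
    """Dump each group of possible duplicates to another file."""
    with open(path, "w", encoding="utf-8") as out:
        for label, members in results:
            out.write(f"# possible duplicates ({label})\n")
            for rec in members:
                out.write("|".join(rec[name] for name in FIELDS) + "\n")
            out.write("\n")
```

Grouping by key in a dictionary means each pass reads the 60,000 records once, where comparing every record against every other would take roughly 1.8 billion comparisons. It only catches misspellings that the normalization happens to erase (punctuation, spacing, case), so the third pass is there to surface looser candidates for a person to review.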