Hi,
   I want to find duplicate records in a large file.
It contains around 22 million records.

Basically, the following is my file structure:

 C1     C2      C3     C4
--------------------------
12345  efghij  klmno  pqrs
34567  abnerv  oiuuy  uyrv
...

It has 22 million records, and each record has
4 columns (C1, C2, C3, and C4).

C1 is the primary key.

Here I want to do some validation. My validations are:

1. Validate the record length
2. Check if the first column is NULL
3. Separate out duplicate records (see the sketch after this list)
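For reference, a minimal sketch of the straightforward single-pass
approach in Python. The file names (input.dat, clean.dat, dups.dat,
bad.dat) are hypothetical, I am assuming whitespace-separated fields,
and I read "record length" as field count; a set of ~22 million keys
can take a few GB of RAM:

# Minimal single-pass sketch (Python). Hypothetical file names;
# assumes whitespace-separated fields. A set holding ~22 million
# short keys may need a few GB of RAM.

EXPECTED_COLS = 4  # C1, C2, C3, C4

seen = set()
with open("input.dat") as src, \
     open("clean.dat", "w") as clean, \
     open("dups.dat", "w") as dups, \
     open("bad.dat", "w") as bad:
    for line in src:
        fields = line.split()
        # 1. Validate record length (interpreted as field count here).
        if len(fields) != EXPECTED_COLS:
            bad.write(line)
            continue
        key = fields[0]
        # 2. Check whether the first column (C1) is NULL/empty.
        if not key or key.upper() == "NULL":
            bad.write(line)
            continue
        # 3. Rows whose C1 was already seen go to the duplicates file.
        if key in seen:
            dups.write(line)
        else:
            seen.add(key)
            clean.write(line)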

How do I separate duplicate records in such a huge
file? Duplicate here means the primary key column only:
if column 1 (C1) is a duplicate, then that row counts
as a duplicate row and needs to be written into another file.

Does anybody have an efficient algorithm for finding
duplicate records in a large file like this?
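If holding all the keys in memory turns out to be a problem, one
standard memory-light technique (just a common approach, nothing
specific to this file) is to sort the file on C1 first, so duplicate
keys land on adjacent lines, and then scan once comparing each key to
the previous one. A sketch, assuming the file was pre-sorted with the
Unix command sort -k1,1 input.dat > sorted.dat:

# Sort-then-scan sketch (Python): after sorting on C1, duplicates
# are adjacent, so only the previous key is kept in memory.
# Assumes the input was pre-sorted, e.g.:
#   sort -k1,1 input.dat > sorted.dat

prev_key = None
with open("sorted.dat") as src, \
     open("clean.dat", "w") as clean, \
     open("dups.dat", "w") as dups:
    for line in src:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        key = fields[0]
        if key == prev_key:
            dups.write(line)  # same C1 as the previous row
        else:
            clean.write(line)
            prev_key = key

One caveat: sorting loses the original record order, so if order
matters, the in-memory approach above would be the safer choice.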

I appreciate your help.
-Madhu

