Hi, I want to find duplicate records in a large file; it contains around 22 million records.
Basically, the file structure looks like this:

    C1      C2      C3      C4
    ------------------------------
    12345   efghij  klmno   pqrs
    34567   abnerv  oiuuy   uyrv
    ...     ...     ...     ...

It has 22 million records, and each record has 4 columns (C1, C2, C3, and C4). C1 is the primary key.

I want to do some validation on each record:

1. Validate the record length.
2. Check whether the first column (C1) is NULL.
3. Separate the duplicate records.

How do I separate duplicate records in such a huge file? "Duplicate" here does not mean the complete row is duplicated -- only the primary key column: if column 1 (C1) is duplicated, then that whole row counts as a duplicate and needs to be written to another file. Does anybody have an efficient algorithm for finding duplicate records in a file this large?
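To make the three validations concrete, here is a rough Python sketch of what I have in mind. The file names, the expected record length (40 is just a placeholder), and the whitespace-separated column layout are guesses on my part, not facts about the real data:

# Rough sketch, untested on the real data. Assumptions: columns are
# whitespace-separated, a valid record is exactly 40 characters long
# (placeholder value), and the input/output file names are made up.
EXPECTED_LENGTH = 40

seen = set()  # C1 values seen so far

with open("records.txt") as infile, \
     open("unique.txt", "w") as unique, \
     open("duplicates.txt", "w") as duplicates, \
     open("rejects.txt", "w") as rejects:
    for line in infile:
        record = line.rstrip("\n")
        fields = record.split()

        # 1. Validate the record length.
        if len(record) != EXPECTED_LENGTH:
            rejects.write(line)
            continue

        # 2. Check that C1 is present (split() drops a blank or missing
        #    first column, so fewer than 4 fields means C1 is NULL).
        if len(fields) != 4:
            rejects.write(line)
            continue

        # 3. Separate duplicates on the primary key C1 only.
        if fields[0] in seen:
            duplicates.write(line)
        else:
            seen.add(fields[0])
            unique.write(line)

My worry is whether a set of 22 million keys will fit in memory. If it won't, I suppose I could sort the file on C1 first and then make a single pass, flagging any line whose key matches the previous one. Is there a better way?

I appreciate your help.

-Madhu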