Madhu Reddy wrote:
> C1    C2     C3    C4
> ------------------------
> 12345 efghij klmno pqrs
> 34567 abnerv oiuuy uyrv
> 94567 abnerv gtuuy hyrv
> 12345 aswrfr rtyyt erer
> 94567 abnerv gtuuy hyrv
>
> Here row1 and row4 are duplicates...those needs
> to be removed or moved to another file
db is the way to go, but if you need to clean the data before letting it
enter the db, you could do a little trick:

#!/usr/bin/perl -w
use strict;

my $pre  = undef;
my $data = "/tmp/data.file";

# sort the file in place, numerically on the first column,
# so rows with the same key end up next to each other
system("sort -n -k 1,1 -o $data $data") && die $?;

open(DATA, $data) || die $!;
while (<DATA>) {
    /^(\d+)/;
    # print a row only the first time its key (column 1) is seen
    print if (!defined($pre) || $pre != $1);
    $pre = $1;
}
close(DATA);

__END__

this should remove the dups and clean the data for you. the script assumes
that you are running *nix and have the sort utility, and it assumes that for
all dups the first entry is kept and the rest are discarded. i decided not to
use Perl's sort function mainly because for 22m rows it could be slow. it
also assumes the columns in your data file are separated by a single space.

david
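p.s. you also mentioned the dups could be moved to another file instead of
just dropped. here is a minimal, untested variation of the same idea (the
/tmp/clean.file and /tmp/dups.file paths are just made up for the example)
that writes the first occurrence of each key to one file and every later
occurrence to another:

#!/usr/bin/perl -w
use strict;

my $pre   = undef;
my $data  = "/tmp/data.file";
my $clean = "/tmp/clean.file";   # unique rows end up here
my $dups  = "/tmp/dups.file";    # repeated rows end up here

# same trick as before: let the system sort put duplicate keys side by side
system("sort -n -k 1,1 -o $data $data") && die $?;

open(IN,    $data)     || die $!;
open(CLEAN, ">$clean") || die $!;
open(DUPS,  ">$dups")  || die $!;
while (<IN>) {
    /^(\d+)/;
    if (!defined($pre) || $pre != $1) {
        print CLEAN $_;   # first time this key shows up
    } else {
        print DUPS $_;    # key already seen, park the row elsewhere
    }
    $pre = $1;
}
close(IN);
close(CLEAN);
close(DUPS);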