Madhu Reddy wrote:
> Hi,
> I want to sort a file and write the result
> to the same file....
> I want to sort based on the 3rd column..
>
> Following is my file format:
>
> C1   C2   C3    C4
> 1234 guhr 89890 uierfer
> 1324 guii 60977 hiofver
> 5467 frwf 56576 errtttt
>
> I want to sort the above file based on column 3 (C3)
> and write the sorted result to the same file....
>
> After sorting, my file should be:
>
> 5467 frwf 56576 errtttt
> 1324 guii 60977 hiofver
> 1234 guhr 89890 uierfer
>
> How do I do this?
> The file may have around 20 million rows......
If you are using a *nix OS, you should try the sort utility first; see the P.S. at the end of this message for the command I have in mind. If you are not on *nix and don't have a sort utility, you will have to rely on Perl's sort function. With 20 million rows you probably don't want to hold everything in memory and sort it in one go; what you have to do is sort the data file segment by segment and then merge the segments back together. The merging is the really tricky part.

The following script (which I wrote for someone a while ago) will do that for you. It breaks the file into chunks of 100,000 lines, sorts each chunk into a tmp file on disk, and then merges all the chunks back together. While sorting each chunk, I record its smallest column-3 value and use that number to order the chunks for the merge, so you don't have to compare all the tmp files at once.

#!/usr/bin/perl -w
use strict;

my @buffer  = ();
my @tmps    = ();
my %bounds  = ();
my $counter = 0;

# read the input in chunks of 100,000 lines and sort each chunk to disk
open(FILE,"file.txt") || die $!;
while(<FILE>){
    push(@buffer,$_);
    if(@buffer > 100000){
        my $tmp = "tmp" . $counter++ . ".txt";
        push(@tmps,$tmp);
        sort_it(\@buffer,$tmp);
        @buffer = ();
    }
}
close(FILE);

# don't forget the last, partial chunk
if(@buffer){
    my $tmp = "tmp" . $counter++ . ".txt";
    push(@tmps,$tmp);
    sort_it(\@buffer,$tmp);
}

merge_it(\%bounds);
unlink(@tmps);

#-- DONE --#

# sort one chunk numerically on the 3rd column and write it to a tmp file;
# also record the chunk's smallest column-3 value in %bounds
sub sort_it{
    my $ref   = shift;
    my $tmp   = shift;
    my $first = 1;
    open(TMP,">$tmp") || die $!;
    for(sort {my @fields1 = split(/\s+/,$a);
              my @fields2 = split(/\s+/,$b);
              $fields1[2] <=> $fields2[2]} @{$ref}){
        if($first){
            $bounds{$tmp} = (split(/\s+/))[2];
            $first = 0;
        }
        print TMP $_;
    }
    close(TMP);
}

# merge the sorted chunks pairwise, in order of their smallest column-3 value
sub merge_it{
    my $ref = shift;
    my @files = sort {$ref->{$a} <=> $ref->{$b}} keys %{$ref};
    my $merged_to = $files[0];
    for(my $i=1; $i<@files; $i++){
        open(FIRST,$merged_to)  || die $!;
        open(SECOND,$files[$i]) || die $!;
        my $merged_tmp = "merged_tmp$i.txt";
        open(MERGED,">$merged_tmp") || die $!;
        my $line1 = <FIRST>;
        my $line2 = <SECOND>;
        while(1){
            # if one file runs out, copy the rest of the other straight through
            if(!defined($line1) && defined($line2)){
                print MERGED $line2;
                print MERGED while(<SECOND>);
                last;
            }
            if(!defined($line2) && defined($line1)){
                print MERGED $line1;
                print MERGED while(<FIRST>);
                last;
            }
            last if(!defined($line1) && !defined($line2));
            my $value1 = (split(/\s+/,$line1))[2];
            my $value2 = (split(/\s+/,$line2))[2];
            if($value1 == $value2){
                print MERGED $line1;
                print MERGED $line2;
                $line1 = <FIRST>;
                $line2 = <SECOND>;
            }elsif($value1 > $value2){
                while($value1 > $value2){
                    print MERGED $line2;
                    $line2 = <SECOND>;
                    last unless(defined $line2);
                    $value2 = (split(/\s+/,$line2))[2];
                }
            }else{
                while($value1 < $value2){
                    print MERGED $line1;
                    $line1 = <FIRST>;
                    last unless(defined $line1);
                    $value1 = (split(/\s+/,$line1))[2];
                }
            }
        }
        close(FIRST);
        close(SECOND);
        close(MERGED);
        $merged_to = $merged_tmp;
    }
}

__END__

After the script finishes, you will notice some files named merged_tmp<number>.txt. If you look at the merged_tmp<largest number>.txt, you should see your original file sorted in it. I decided not to delete those merged_tmp files so you can see exactly how the chunks are merged one by one, which is great for debugging. I omitted a lot of error checks which you should add if you decide to use the script. It can sort extremely large files without using a lot of memory, but it does use up disk space and it isn't very fast. Finally, if you find the script doesn't work, please let me know so I can fix it.

david
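P.S. Here is the sort-utility route I meant, as a rough sketch (assuming GNU sort and assuming your file is named file.txt with a numeric third column):

sort -k3,3n -o file.txt file.txt

-k3,3n sorts numerically on the third whitespace-separated field, and -o writes the result back to a named file; GNU sort explicitly allows the output file to be the same as the input file. It also does an external merge sort with its own temporary files, so it won't try to hold all 20 million rows in memory.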