On Tue, 11 Feb 2003 16:35:28 -0800, David wrote:

> Madhu Reddy wrote:
>
>> I want to sort a file and write the result back to the
>> same file.
>> I want to sort based on the 3rd column.
>> ...
>> The file may have around 20 million rows.
>> 
> 
> If you are on a *nix OS, you should try the sort utility. If you are not
> on *nix and don't have the sort utility, you will have to rely on Perl's
> sort function. With 20 million rows, you probably don't want to load
> everything into memory and sort it there. Instead, sort the data file
> segment by segment and then merge the segments back together. Merging is
> the tricky part. The following script (which I wrote for someone a while
> ago) will do that for you. It breaks the file into chunks of 100000
> lines, sorts each chunk into a tmp file on disk, and then merges all the
> chunks back together. When I sort the file, I keep the smallest boundary
> of each chunk and use that number when sorting the file, so you don't
> have to compare all the tmp files.

Absolutely right, although I wouldn't write it myself.
A merge sort is already implemented as a command-line tool in the
Perl Power Tools, available on CPAN as PPT.
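
For anyone who has the standard *nix tools, the chunk-and-merge idea
described above can be sketched roughly as follows. This is only an
illustration, not David's script: the function name
sort_by_third_column, the optional chunk-size argument, the
whitespace-separated columns, and the numeric comparison (-n) are all
my assumptions.

```shell
# sort_by_third_column FILE [LINES_PER_CHUNK]
# Hypothetical helper: sorts FILE in place on its 3rd column by
# splitting it into chunks, sorting each chunk, and merging them.
sort_by_third_column() {
    file=$1
    lines=${2:-100000}            # chunk size; 100000 as in the quoted mail
    [ -s "$file" ] || return 0    # nothing to do for an empty file
    workdir=$(mktemp -d) || return 1
    status=0

    # 1. Break the file into chunks of $lines lines each.
    split -l "$lines" "$file" "$workdir/chunk." || status=1

    # 2. Sort each chunk in place on column 3 (numeric is an assumption).
    for chunk in "$workdir"/chunk.*; do
        [ "$status" -eq 0 ] || break
        sort -n -k3,3 -o "$chunk" "$chunk" || status=1
    done

    # 3. Merge the already-sorted chunks and overwrite the original file.
    if [ "$status" -eq 0 ]; then
        sort -m -n -k3,3 -o "$file" "$workdir"/chunk.* || status=1
    fi

    rm -rf "$workdir"
    return "$status"
}
```

It would be called as: sort_by_third_column data.txt (data.txt being a
placeholder name). In practice a plain "sort -n -k3,3 -o data.txt data.txt"
is probably enough, since GNU sort already spills to temporary files and
merges them when the input does not fit in memory; the function above
only makes those steps visible.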


Greetings,
Janek
