> Is there a ready to use (free, best Open Source) tool able to sort lines
> (each line appr. 20 bytes long) of a XXX GByte large text file (i.e. in
> place) taking full advantage of available memory to speed up the process
> as much as possible?
Sounds like an occasion to use a merge sort.  The pseudo-code would
be:

  break the file up into bite-sized chunks (maybe a couple of megs
    each)
  sort each chunk linewise
  write each sorted chunk out to its own intermediate file

  open all of the intermediate files and read the first line of each
  [here]
  find the "earliest" of those lines according to your sort order
  write it to your output file
  read the next line from that particular file
  return to [here]

(when a file runs dry, drop it from the pool; you're done when all of
them are exhausted)

There are some optimizations to be had on this as well: you can find
the "earliest" *and* the "next earliest" of those lines/files, and
just read from the "earliest" file until its current line passes
"next earliest"...lather, rinse, repeat, shifting "next earliest" to
be the "earliest" and then finding the new "next earliest".

I don't know if the GNU "sort" utility does exactly this, but I've
thrown some rather large files at it and haven't choked it yet.

-tkc
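
P.S.  A rough, untested sketch of the above in Python -- heapq.merge
does the "find the earliest line" bookkeeping for you.  The chunk
size, function name, and temp-file handling here are just assumptions
to illustrate the idea:

import heapq
import os
import tempfile

def external_sort(in_path, out_path, chunk_bytes=50 * 1024 * 1024):
    # Pass 1: read roughly chunk_bytes worth of lines at a time, sort
    # the batch in memory, and write each sorted run to its own
    # temporary file.
    run_paths = []
    with open(in_path) as src:
        while True:
            lines = src.readlines(chunk_bytes)  # stops near the hint
            if not lines:
                break
            lines.sort()
            fd, path = tempfile.mkstemp(text=True)
            with os.fdopen(fd, "w") as run:
                run.writelines(lines)
            run_paths.append(path)

    # Pass 2: k-way merge.  heapq.merge keeps one line from each run
    # on a heap and repeatedly yields the "earliest" of them.
    # (assumes every line, including the last, ends with a newline)
    runs = [open(p) for p in run_paths]
    try:
        with open(out_path, "w") as out:
            out.writelines(heapq.merge(*runs))
    finally:
        for r in runs:
            r.close()
        for p in run_paths:
            os.remove(p)

Note that this makes two full passes over the data and needs scratch
space about the size of the input, so it isn't in-place -- but memory
use is bounded by the chunk size rather than the file size.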