[EMAIL PROTECTED] wrote:
> Thanks to all who replied. It's very appreciated.
>
> Yes, I had to double check line counts and the number of lines is
> ~16 million (instead of the stated 1.6B).
OK, that's not bad at all.  You have a few options:

- Get enough memory to do the sort in memory, with UNIX "sort" or
  Python's built-in "sort".  (A rough sketch is in the P.S. below.)

- Thrash: run an in-memory sort anyway with too little RAM.  In-memory
  sorts do very badly with virtual memory, but they eventually finish.
  Might take many hours.

- Get a serious disk-to-disk sort program.  (See
  "http://www.ordinal.com/".  There's a free 30-day trial.  It can
  probably sort your data in about a minute.)

- Load the data into a database like MySQL and let it do the work.
  This is slow if done wrong, but OK if done right.

- Write a distribution sort yourself: fan the incoming file out into
  one file per first letter, sort each subfile, then concatenate the
  results in order.  (Sketched in the P.S. below.)

With DRAM at $64 for 4GB, I'd suggest just getting more memory and
using a standard in-memory sort.

			John Nagle
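P.S.  Here is a rough sketch of the in-memory approach, assuming one
record per line and that plain lexicographic line order is what you
want; the file names are made up:

    # Read everything, sort with Python's built-in sort, write it back.
    # All ~16 million lines have to fit in RAM at once for this to work.
    with open("input.txt") as infile:
        lines = infile.readlines()

    lines.sort()

    with open("sorted.txt", "w") as outfile:
        outfile.writelines(lines)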
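And a sketch of the do-it-yourself distribution sort: fan the input out
into one temporary file per first character, sort each small file in
memory, and write the buckets back out in key order.  Every line in a
lower bucket compares below every line in a higher one, so the
concatenation comes out fully sorted.  Again, file names and details
are assumptions, not tested code:

    import os

    IN = "input.txt"
    OUT = "sorted.txt"
    TMP = "bucket_%04x.tmp"        # one temp file per first character

    # Pass 1: fan out by the first character of each line.
    buckets = {}
    with open(IN) as infile:
        for line in infile:
            key = ord(line[0])
            f = buckets.get(key)
            if f is None:
                f = buckets[key] = open(TMP % key, "w")
            f.write(line)
    for f in buckets.values():
        f.close()

    # Pass 2: sort each (small) bucket in memory, append in key order.
    with open(OUT, "w") as outfile:
        for key in sorted(buckets):
            name = TMP % key
            with open(name) as f:
                chunk = f.readlines()
            chunk.sort()
            outfile.writelines(chunk)
            os.remove(name)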