Hi all,

Apologies, I'm sure this has been asked many times, but I'm trying to figure out the most efficient way to do a complex sort on very large files.
I've read the recipe at [1] and understand that the way to sort a large file is to break it into chunks, sort each chunk, write the sorted chunks to disk, and then use heapq.merge to combine the chunks as you read them back.

What I'm having trouble figuring out is what to do when I want to sort by one key ascending and then by another key descending (a "complex sort"). I understand that sorts are stable, so I could just repeat the whole sort process once for each key in turn, but that would mean a round trip to disk for each step in the sort, and I'm wondering if there is a better way.

I also thought of applying the complex sort to each chunk before writing it to disk, so that each chunk is completely sorted, but then heapq.merge wouldn't work properly because, as far as I know, you can only give it one key.

Any help much appreciated (I may well be missing something glaringly obvious).

Cheers,
Alistair

[1] http://code.activestate.com/recipes/576755-sorting-big-files-the-python-26-way/
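To make the question concrete, here is a minimal sketch of the chunk-and-merge approach with a single composite key, offered as one possibility rather than a confirmed answer. It assumes Python 3.5 or later, where heapq.merge accepts a key argument (the Python 2.6 version used in the recipe does not). The "name,score" record layout, the Desc wrapper, and the sort_key and external_sort names are all hypothetical, made up for illustration. The idea is that one key function returning a tuple can encode "ascending on one field, descending on another", so the same key can be used both for sorting each chunk and for the merge.

```python
import heapq
import tempfile
from itertools import islice


class Desc:
    """Hypothetical helper: wraps a value so that it compares in reverse order."""
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        # Reversed on purpose: the larger wrapped value sorts first.
        return other.value < self.value

    def __eq__(self, other):
        return self.value == other.value


def sort_key(line):
    # Assumed record layout: "name,score" on each line.
    name, score = line.rstrip("\n").split(",")
    # name ascending, score descending, expressed as one tuple key.
    return (name, Desc(int(score)))


def external_sort(lines, chunk_size=100000, key=sort_key):
    """Yield newline-terminated lines in sorted order via sorted temp-file chunks."""
    chunks = []
    it = iter(lines)
    try:
        while True:
            # Sort one chunk in memory with the full composite key.
            chunk = sorted(islice(it, chunk_size), key=key)
            if not chunk:
                break
            f = tempfile.TemporaryFile("w+")
            f.writelines(chunk)
            f.seek(0)
            chunks.append(f)
        # Merge the sorted chunks lazily, using the same composite key.
        yield from heapq.merge(*chunks, key=key)
    finally:
        for f in chunks:
            f.close()
```

For example, list(external_sort(open("big.txt"), chunk_size=100000)) would read a hypothetical big.txt once and touch the disk once per chunk. For a numeric descending field, negating the value in the tuple, as in (name, -int(score)), should work without the wrapper class; the wrapper is only needed for fields such as strings that cannot be negated.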