Re: Complex sort on big files

2011-08-09 John Nagle
On 8/6/2011 10:53 AM, sturlamolden wrote:
> On Aug 1, 5:33 pm, aliman wrote:
>> I've read the recipe at [1] and understand that the way to sort a large file is to break it into chunks, sort each chunk and write sorted chunks to disk, then use heapq.merge to combine the chunks as you read them.
> Or

Re: Complex sort on big files

2011-08-06 sturlamolden
On Aug 1, 5:33 pm, aliman wrote:
> I've read the recipe at [1] and understand that the way to sort a
> large file is to break it into chunks, sort each chunk and write
> sorted chunks to disk, then use heapq.merge to combine the chunks as
> you read them.

Or just memory map the file (mmap.mmap)
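The snippet is cut off right after the mmap.mmap suggestion, so exactly how it was meant to be used isn't shown. One possible reading, sketched here as an assumption: map the file read-only, record the byte offsets of each line, sort those offsets by a key computed from the mapped bytes, and write the lines out in that order. The names `sort_lines_mmap` and `line_key` are hypothetical. Memory still grows with one offset pair and one key per line, so this helps mainly when keys are much smaller than whole lines.

```python
import mmap
import os

def sort_lines_mmap(in_path, out_path, line_key):
    """Write the lines of in_path to out_path, sorted by line_key.

    line_key (hypothetical) receives each line as bytes, without its
    newline. Only (start, end) offsets are kept in a Python list; the
    line bytes are read from the memory map when needed.
    """
    if os.path.getsize(in_path) == 0:
        # mmap cannot map an empty file, so handle that case directly.
        open(out_path, "wb").close()
        return
    with open(in_path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        spans = []
        start, size = 0, len(mm)
        while start < size:
            end = mm.find(b"\n", start)
            if end == -1:          # last line has no trailing newline
                end = size
            spans.append((start, end))
            start = end + 1
        # list.sort calls the key once per span; the resulting keys are
        # held in memory for the duration of the sort.
        spans.sort(key=lambda span: line_key(mm[span[0]:span[1]]))
        with open(out_path, "wb") as out:
            for s, e in spans:
                out.write(mm[s:e] + b"\n")
```

For example, `sort_lines_mmap("in.txt", "out.txt", lambda line: line.split(b"\t")[0])` sorts a tab-separated file by its first field.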

Re: Complex sort on big files

2011-08-05 Steven D'Aprano
Roy Smith wrote:
> Wow.
>
> I was going to suggest using the unix command-line sort utility via
> popen() or subprocess. My arguments were that it's written in C, has 30
> years of optimizing in it, etc, etc, etc. It almost certainly has to be
> faster than anything you could do in Python.

Re: Complex sort on big files

2011-08-05 Dan Stromberg
Yup. Timsort is described as "supernatural", and I'm inclined to believe it.

On Fri, Aug 5, 2011 at 7:54 PM, Roy Smith wrote:
> Wow.
>
> Python took just about half the time. Certainly knocked my socks off.
> Hard to believe, actually.

Re: Complex sort on big files

2011-08-05 Roy Smith
Wow.

I was going to suggest using the unix command-line sort utility via popen() or subprocess. My arguments were that it's written in C, has 30 years of optimizing in it, etc, etc, etc. It almost certainly has to be faster than anything you could do in Python.

Then I tried the experiment.
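For reference, driving sort(1) from Python with per-field keys might look like the sketch below. It assumes tab-separated fields and a POSIX-style sort that accepts `-t`, `-k`, and `-o` (GNU coreutils does); `sort_command` and `run_sort` are hypothetical helper names, and `subprocess.run` needs Python 3.5+. It makes no claim about which approach is faster.

```python
import os
import subprocess

def sort_command(in_path, out_path, keys, sep="\t"):
    """Build an argv list for sort(1).

    keys: list of (field, numeric, reverse) tuples, with 1-based field
    numbers as sort(1) expects. Keys are applied in the order given.
    """
    cmd = ["sort", "-t", sep]
    for field, numeric, reverse in keys:
        # "-k F,F" limits the key to field F alone; the trailing "n" and
        # "r" modifiers apply numeric order and reversal to this key only.
        spec = "%d,%d" % (field, field)
        if numeric:
            spec += "n"
        if reverse:
            spec += "r"
        cmd += ["-k", spec]
    cmd += ["-o", out_path, in_path]
    return cmd

def run_sort(in_path, out_path, keys, sep="\t"):
    # LC_ALL=C requests plain byte-wise comparison instead of
    # locale-aware collation (assumption: that ordering is acceptable).
    env = dict(os.environ, LC_ALL="C")
    subprocess.run(sort_command(in_path, out_path, keys, sep),
                   check=True, env=env)
```

For example, `sort_command("in.tsv", "out.tsv", [(2, False, False), (3, True, True)])` produces `sort -t <tab> -k 2,2 -k 3,3nr -o out.tsv in.tsv`. Passing an argument list (no shell) means the literal tab character needs no quoting.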

Re: Complex sort on big files

2011-08-05 sturlamolden
On Aug 1, 5:33 pm, aliman wrote:
> I understand that sorts are stable, so I could just repeat the whole
> sort process once for each key in turn, but that would involve going
> to and from disk once for each step in the sort, and I'm wondering if
> there is a better way.

I would consider using m
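The multi-pass idea in the quoted question works because Python's sort is stable: sort by the least significant key first, then by each more significant key in turn. An in-memory sketch with hypothetical `(group, label, value)` records, ordered by group ascending and then label descending:

```python
from operator import itemgetter

# Hypothetical records: (group, label, value).
rows = [("b", "x", 1), ("a", "y", 2), ("b", "z", 3), ("a", "w", 4)]

# Pass 1 sorts by the least significant key (label, reversed);
# pass 2 sorts by the most significant key (group). Because sorted()
# is stable, rows with the same group keep their pass-1 label order.
by_label = sorted(rows, key=itemgetter(1), reverse=True)
result = sorted(by_label, key=itemgetter(0))
```

`reverse=True` still preserves the original order of equal elements, so a reversed pass does not break stability. On disk, though, each pass is a full external sort, which is exactly the cost the question is trying to avoid.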

Re: Complex sort on big files

2011-08-02 Dan Stromberg
On Tue, Aug 2, 2011 at 3:25 AM, Alistair Miles wrote:
> Hi Dan,
>
> Thanks for the reply.
>
> On Mon, Aug 1, 2011 at 5:45 PM, Dan Stromberg wrote:
> >
> > Python 2.x, or Python 3.x?
>
> Currently Python 2.x.

So it sounds like you may want to move this code to 3.x in the future.

> > What are

Re: Complex sort on big files

2011-08-02 Alistair Miles
Hi Dan,

Thanks for the reply.

On Mon, Aug 1, 2011 at 5:45 PM, Dan Stromberg wrote:
>
> Python 2.x, or Python 3.x?

Currently Python 2.x.

> What are the types of your sort keys?

Both numbers and strings.

> If you're on 3.x and the key you need reversed is numeric, you can negate
> the key.

I
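Negating a numeric key inside a composite key tuple gives a descending numeric key and an ascending string key in a single sort. A sketch assuming a hypothetical two-field, tab-separated line layout (`name<TAB>count`); `record_key` is an illustrative name:

```python
def record_key(line):
    # Hypothetical layout: field 0 is a string sorted ascending,
    # field 1 an integer sorted descending.
    name, count = line.rstrip("\n").split("\t")
    # int() matters: compared as text, "10" would sort before "7".
    return (name, -int(count))

lines = ["b\t7\n", "a\t3\n", "b\t10\n", "a\t12\n"]
lines.sort(key=record_key)
```

This trick only covers numeric keys; a string key that must be reversed needs another approach, such as separate stable passes or a comparison function.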

Re: Complex sort on big files

2011-08-01 Peter Otten
aliman wrote:
> Apologies I'm sure this has been asked many times, but I'm trying to
> figure out the most efficient way to do a complex sort on very large
> files.
>
> I've read the recipe at [1] and understand that the way to sort a
> large file is to break it into chunks, sort each chunk and w

Re: Complex sort on big files

2011-08-01 Dan Stromberg
Python 2.x, or Python 3.x?

What are the types of your sort keys?

If you're on 3.x and the key you need reversed is numeric, you can negate the key.

If you're on 2.x, you can use an object with a __cmp__ method to compare objects however you require.

You probably should timsort the chunks (which
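Python 3 dropped `__cmp__` and the `cmp=` argument to `sort()`, but `functools.cmp_to_key` (available in 2.7 and 3.2+) wraps a comparison function into a key, which also handles a reversed string key. A sketch with hypothetical two-field records, ordered by field 0 ascending and then field 1 descending; `compare` is an illustrative name:

```python
from functools import cmp_to_key

def compare(a, b):
    """Return <0, 0, >0: field 0 ascending, then field 1 descending."""
    if a[0] != b[0]:
        return -1 if a[0] < b[0] else 1
    if a[1] != b[1]:
        # Reversed: the larger string sorts first.
        return 1 if a[1] < b[1] else -1
    return 0

rows = [("b", "x"), ("a", "y"), ("b", "z"), ("a", "w")]
rows.sort(key=cmp_to_key(compare))
```

Since Python 3.5, `heapq.merge` also accepts `key=`, so the same wrapped key can be reused when merging sorted chunks. A comparison function is called once per comparison rather than once per element, so it is usually slower than a plain key.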

Complex sort on big files

2011-08-01 aliman
Hi all,

Apologies, I'm sure this has been asked many times, but I'm trying to figure out the most efficient way to do a complex sort on very large files.

I've read the recipe at [1] and understand that the way to sort a large file is to break it into chunks, sort each chunk and write sorted chunks
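The chunk-sort-merge approach described in the question can carry a composite key through both stages, so the file is read and written only a couple of times regardless of how many keys there are. This is a minimal sketch assuming Python 3.5+ (for `heapq.merge(key=...)`), a text file with one record per line, and a caller-supplied `key`; `external_sort` and `chunk_lines` are hypothetical names. On 2.x, `heapq.merge` has no `key` argument, so lines would need to be decorated as `(key, line)` tuples instead.

```python
import heapq
import itertools
import os
import tempfile

def external_sort(in_path, out_path, key, chunk_lines=100_000):
    """Sort a text file too large for memory, one chunk at a time.

    key maps a line (str, with its newline) to a sort key; the same
    key is used for the in-memory chunk sorts and for the merge.
    """
    chunk_paths = []
    try:
        with open(in_path) as src:
            while True:
                chunk = list(itertools.islice(src, chunk_lines))
                if not chunk:
                    break
                # Ensure every line ends with a newline so that merged
                # lines never run together.
                chunk = [ln if ln.endswith("\n") else ln + "\n"
                         for ln in chunk]
                chunk.sort(key=key)          # timsort each chunk in memory
                fd, path = tempfile.mkstemp(suffix=".chunk")
                with os.fdopen(fd, "w") as tmp:
                    tmp.writelines(chunk)
                chunk_paths.append(path)
        files = [open(p) for p in chunk_paths]
        try:
            with open(out_path, "w") as out:
                # heapq.merge lazily combines the already-sorted chunks.
                out.writelines(heapq.merge(*files, key=key))
        finally:
            for f in files:
                f.close()
    finally:
        for p in chunk_paths:
            os.remove(p)
```

For example, `external_sort("big.tsv", "sorted.tsv", key=lambda line: (line.split("\t")[0], -int(line.split("\t")[1])))` would sort by the first field ascending and the second (numeric) field descending in one pass.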