On 8/6/2011 10:53 AM, sturlamolden wrote:
> On Aug 1, 5:33 pm, aliman wrote:
> > I've read the recipe at [1] and understand that the way to sort a
> > large file is to break it into chunks, sort each chunk and write
> > sorted chunks to disk, then use heapq.merge to combine the chunks as
> > you read them.
> Or
On Aug 1, 5:33 pm, aliman wrote:
> I've read the recipe at [1] and understand that the way to sort a
> large file is to break it into chunks, sort each chunk and write
> sorted chunks to disk, then use heapq.merge to combine the chunks as
> you read them.
Or just memory map the file (mmap.mmap)
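
A minimal sketch of what the memory-mapping suggestion might look like in Python 3, assuming a non-empty, newline-delimited file that fits in the process's address space. The helper name `iter_sorted_lines` is hypothetical and does not come from this thread. It sorts (start, end) offsets of lines rather than a list of the lines themselves and leaves paging to the operating system. Note that `list.sort` still builds every key once, so this only saves memory when the key is a small part of each line; the whole line is used as the key here purely to keep the example short.

```python
import mmap


def iter_sorted_lines(path):
    """Yield the lines of `path` (as bytes, without newlines) in sorted order."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        size = len(mm)
        offsets = []
        start = 0
        # Record where each line starts and ends instead of copying it.
        while start < size:
            end = mm.find(b"\n", start)
            if end == -1:          # last line has no trailing newline
                end = size
            offsets.append((start, end))
            start = end + 1
        # Assumption: the whole line is the sort key.
        offsets.sort(key=lambda span: mm[span[0]:span[1]])
        for s, e in offsets:
            yield mm[s:e]
```

Mapping an empty file raises `ValueError`, so a caller would need to handle that case separately.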
Roy Smith wrote:
> Wow.
>
> I was going to suggest using the unix command-line sort utility via
> popen() or subprocess. My arguments were that it's written in C, has 30
> years of optimizing in it, etc, etc, etc. It almost certainly has to be
> faster than anything you could do in Python.
>
>
Yup. Timsort is described as "supernatural", and I'm inclined to believe
it.
On Fri, Aug 5, 2011 at 7:54 PM, Roy Smith wrote:
> Wow.
>
> Python took just about half the time. Certainly knocked my socks off.
> Hard to believe, actually.
Wow.
I was going to suggest using the unix command-line sort utility via
popen() or subprocess. My arguments were that it's written in C, has 30
years of optimizing in it, etc, etc, etc. It almost certainly has to be
faster than anything you could do in Python.
Then I tried the experiment.
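
The details of that experiment are not shown in this snippet, so the following is only a sketch of how such a comparison could be set up in Python 3. The function names are hypothetical. It assumes a POSIX `sort` on the PATH and a file whose last line ends with a newline, and it sets `LC_ALL=C` so that `sort` compares bytes the same way Python does.

```python
import os
import subprocess
import time


def python_sort(path):
    """Read the whole file and sort its lines in memory."""
    with open(path, "rb") as f:
        return sorted(f.readlines())


def unix_sort(path):
    """Sort the file with the external `sort` utility."""
    env = dict(os.environ, LC_ALL="C")   # byte-wise collation
    result = subprocess.run(["sort", path], stdout=subprocess.PIPE,
                            env=env, check=True)
    return result.stdout.splitlines(keepends=True)


def timed(func, path):
    """Return the wall-clock seconds taken by func(path)."""
    start = time.perf_counter()
    func(path)
    return time.perf_counter() - start
```

Calling `timed(python_sort, path)` and `timed(unix_sort, path)` on the same file gives a rough comparison; the numbers will depend heavily on file size, locale, and disk caching.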
On Aug 1, 5:33 pm, aliman wrote:
> I understand that sorts are stable, so I could just repeat the whole
> sort process once for each key in turn, but that would involve going
> to and from disk once for each step in the sort, and I'm wondering if
> there is a better way.
I would consider using m
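
For data that fits in memory, the repeated stable sort described in the quoted text can be sketched as below (Python 3). The record layout, a (name, score) tuple, is invented for illustration. The least significant key is sorted first, and each later pass keeps the earlier order among equal elements, including when `reverse=True` is used.

```python
from operator import itemgetter


def multi_pass_sort(rows):
    """Sort by score descending, breaking ties by name ascending."""
    rows = sorted(rows, key=itemgetter(0))        # secondary key: name
    rows.sort(key=itemgetter(1), reverse=True)    # primary key: score
    return rows


records = [("bob", 3), ("alice", 3), ("carol", 5), ("alice", 1)]
print(multi_pass_sort(records))
# [('carol', 5), ('alice', 3), ('bob', 3), ('alice', 1)]
```

On disk, each of these passes would be a full read and write of the data, which is the cost the question is trying to avoid.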
On Tue, Aug 2, 2011 at 3:25 AM, Alistair Miles wrote:
> Hi Dan,
>
> Thanks for the reply.
>
> On Mon, Aug 1, 2011 at 5:45 PM, Dan Stromberg wrote:
> >
> > Python 2.x, or Python 3.x?
>
> Currently Python 2.x.
>
So it sounds like you may want to move this code to 3.x in the future.
> > What are
Hi Dan,
Thanks for the reply.
On Mon, Aug 1, 2011 at 5:45 PM, Dan Stromberg wrote:
>
> Python 2.x, or Python 3.x?
Currently Python 2.x.
> What are the types of your sort keys?
Both numbers and strings.
> If you're on 3.x and the key you need reversed is numeric, you can negate
> the key.
I
aliman wrote:
> Apologies, I'm sure this has been asked many times, but I'm trying to
> figure out the most efficient way to do a complex sort on very large
> files.
>
> I've read the recipe at [1] and understand that the way to sort a
> large file is to break it into chunks, sort each chunk and w
Python 2.x, or Python 3.x?
What are the types of your sort keys?
If you're on 3.x and the key you need reversed is numeric, you can negate
the key.
If you're on 2.x, you can use an object with a __cmp__ method to compare
objects however you require.
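
A short sketch of both ideas in Python 3, using made-up (name, score) tuples. Negating the numeric key reverses it within a single pass. Python 3 has no `__cmp__`, so the second half swaps in `functools.cmp_to_key`, which wraps an old-style comparison function and is useful when the key to reverse is a string and cannot be negated.

```python
from functools import cmp_to_key

rows = [("bob", 3), ("alice", 3), ("carol", 5), ("alice", 1)]

# Score descending (negated), then name ascending, in a single pass.
by_negation = sorted(rows, key=lambda r: (-r[1], r[0]))


def compare(a, b):
    """Name descending, then score ascending."""
    if a[0] != b[0]:
        return -1 if a[0] > b[0] else 1
    return (a[1] > b[1]) - (a[1] < b[1])


by_comparison = sorted(rows, key=cmp_to_key(compare))

print(by_negation)
# [('carol', 5), ('alice', 3), ('bob', 3), ('alice', 1)]
print(by_comparison)
# [('carol', 5), ('bob', 3), ('alice', 1), ('alice', 3)]
```

A comparison function is called many more times than a key function, so the negation form is likely to be faster where it applies.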
You probably should timsort the chunks (which
Hi all,
Apologies, I'm sure this has been asked many times, but I'm trying to
figure out the most efficient way to do a complex sort on very large
files.
I've read the recipe at [1] and understand that the way to sort a
large file is to break it into chunks, sort each chunk and write
sorted chunks
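
The chunk-and-merge approach discussed in this thread can be sketched roughly as follows in Python 3.5 or later. This is not the recipe at [1]; the function name `external_sort` and the default `chunk_size` are assumptions made for illustration, and every input line is assumed to end with a newline.

```python
import heapq
import itertools
import tempfile


def external_sort(lines, chunk_size=100000, key=None):
    """Yield `lines` in sorted order using sorted temporary chunk files."""
    chunks = []
    it = iter(lines)
    while True:
        # Sort one chunk in memory (Timsort), then spill it to disk.
        chunk = sorted(itertools.islice(it, chunk_size), key=key)
        if not chunk:
            break
        tmp = tempfile.TemporaryFile("w+", encoding="utf-8")
        tmp.writelines(chunk)
        tmp.seek(0)
        chunks.append(tmp)
    try:
        # heapq.merge reads the sorted chunk files lazily, line by line.
        yield from heapq.merge(*chunks, key=key)
    finally:
        for tmp in chunks:
            tmp.close()
```

Passing a `key` that returns a tuple lets several sort keys be handled in one pass over the data. Each chunk keeps a file handle open during the merge, so a very small `chunk_size` on a very large input can run into the operating system's open-file limit.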