Re: Sorting in huge files

2004-12-09 Thread sjmachin
FWIW, the algorithms in early editions [haven't looked at recent ones] are designed for magnetic tapes, not disk. They do still work on disk (treat each tape drive as a file on disk). I had to implement a robust production-quality sort on MS-DOS about 20 years ago, and did it straight out of Knuth'
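
A minimal sketch of the "each tape is a file on disk" idea mentioned above, assuming line-oriented text records. It is not the MS-DOS implementation referred to in the post; the function name `external_sort` and the `run_size` parameter are hypothetical. Sorted runs are written to temporary files and then merged with `heapq.merge`.

```python
import heapq
import os
import tempfile


def external_sort(lines, key=None, run_size=100_000):
    """Yield the given text lines in sorted order using bounded memory.

    Assumption: each record is one line of text. At most `run_size`
    lines are held in memory at a time while the runs are built.
    """
    run_paths = []
    run = []

    def flush():
        # Sort the in-memory run and write it to its own temporary "tape".
        run.sort(key=key)
        fd, path = tempfile.mkstemp(suffix=".run", text=True)
        with os.fdopen(fd, "w") as f:
            f.writelines(run)
        run_paths.append(path)
        run.clear()

    for line in lines:
        if not line.endswith("\n"):
            line += "\n"  # keep exactly one record per line in the run files
        run.append(line)
        if len(run) >= run_size:
            flush()
    if run:
        flush()

    # Merge phase: read all runs in parallel, always emitting the smallest head.
    files = [open(path) for path in run_paths]
    try:
        yield from heapq.merge(*files, key=key)
    finally:
        for f in files:
            f.close()
        for path in run_paths:
            os.remove(path)
```

With very many runs this single-pass merge can hit the operating system's limit on open files; a multi-pass merge, as in the tape algorithms, avoids that at the cost of extra I/O.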

Re: Sorting in huge files

2004-12-09 Thread François Pinard
[Paul]
> Thanks! I definitely didn't want to go into any elaborate programming
> for this, and the Unix sort is perfect for this. It sorted a tenth of
> my data in about 8 min, which is entirely satisfactory to me (assuming
> it will take ~ 20 times more to do the whole thing). Your answer
> gre

Re: Sorting in huge files

2004-12-09 Thread Paul
Thanks! I definitely didn't want to go into any elaborate programming for this, and the Unix sort is perfect for this. It sorted a tenth of my data in about 8 min, which is entirely satisfactory to me (assuming it will take ~ 20 times more to do the whole thing). Your answer greatly helped! Paul -

Re: Sorting in huge files

2004-12-09 Thread Paul Rubin
"Paul" <[EMAIL PROTECTED]> writes:
> If you really want to know, my entries are elliptic curves and my
> hashing function is an attempt at mapping them to their Serre residual
> representation modulo a given prime p.
>
> Now, for you to tell me something relevant about the data that I don't
> alrea

Re: Sorting in huge files

2004-12-09 Thread Paul
The reason I am not telling you much about the data is not because I am afraid anyone would steal my ideas, or because I have a non-disclosure agreement or that I don't want to end up pumping gas. It is just that it is pretty freaking damn hard to even explain what is going on. Probably a bit harde

Re: Sorting in huge files

2004-12-08 Thread Adam DePrince
On Tue, 2004-12-07 at 16:47, Paul wrote:
> I really do need to sort. It is complicated and I haven't said why, but
> it will help in finding similar keys later on. Sorry I can't be more
> precise, this has to do with my research.
Precision is precisely what we require to give you an answer more me

Re: Sorting in huge files

2004-12-08 Thread Jeremy Sanders
On Tue, 07 Dec 2004 12:27:33 -0800, Paul wrote:
> I have a large database of 15GB, consisting of 10^8 entries of
> approximately 100 bytes each. I devised a relatively simple key map on
> my database, and I would like to order the database with respect to the
> key.
You won't be able to load this

Re: Sorting in huge files

2004-12-07 Thread Larry Bates
Paul, I can pretty much promise you that if you really have 10^8 records, you should put them into a database and let the database do the sorting by creating indexes on the fields that you want. Something like MySQL should do nicely and is free. http://www.mysql.org Python has a good interface to mysql
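
A rough sketch of the "let the database sort via an index" suggestion. The post recommends MySQL; this example swaps in Python's standard-library `sqlite3` module purely so that it runs without a server. The table name `entries`, its columns, and the function `load_and_sort` are hypothetical.

```python
import sqlite3


def load_and_sort(records):
    """Insert (key, payload) pairs, index the key, and return rows ordered by key."""
    # Assumption: an in-memory database for illustration only; a data set of
    # 10^8 rows would need a database file on disk instead.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE entries (key INTEGER, payload TEXT)")
        conn.executemany("INSERT INTO entries VALUES (?, ?)", records)
        # The index lets the database return rows in key order without
        # re-sorting the table on every query.
        conn.execute("CREATE INDEX idx_entries_key ON entries (key)")
        conn.commit()
        return conn.execute(
            "SELECT key, payload FROM entries ORDER BY key"
        ).fetchall()
    finally:
        conn.close()
```

Creating the index after the bulk insert, rather than before, is a common choice because the index is then built once instead of being updated on every inserted row.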

Re: Sorting in huge files

2004-12-07 Thread Scott David Daniels
Paul wrote: I have a large database of 15GB, consisting of 10^8 entries of approximately 100 bytes each. or 10 gigabytes of data. A few thoughts on this: - Space is not going to be an issue. I have a Tb available. I presume this is disk space, not memory. If you do have a Tb of RAM and you are us

Re: Sorting in huge files

2004-12-07 Thread Steven Bethard
Paul wrote: Is this reasonable to do on 10^8 elements with repeats in the keys? I guess I should just try and see for myself. Yeah, that's usually the right solution. I didn't comment on space/speed issues because they're so data dependent in a situation like this, and without actually looking

Re: Sorting in huge files

2004-12-07 Thread Paul
I really do need to sort. It is complicated and I haven't said why, but it will help in finding similar keys later on. Sorry I can't be more precise, this has to do with my research. Your two other suggestions with itertools and operator are more useful, but I was mostly wondering about performanc

Re: Sorting in huge files

2004-12-07 Thread Steven Bethard
Paul wrote: I expect a few repeats for most of the keys, and that's actually part of what I want to figure out in the end. (Said loosely, I want to group all the data entries having "similar" keys. For this I need to sort the keys first (data entries having _same_ key), and then figure out which ke
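
A small sketch of the first grouping step described in the quoted text (collecting entries that share the _same_ key once the data is sorted), using `itertools.groupby` and `operator.itemgetter`. The `(key, entry)` pair layout and the name `group_by_key` are assumptions for illustration; finding "similar" keys is not covered.

```python
from itertools import groupby
from operator import itemgetter


def group_by_key(sorted_pairs):
    """Yield (key, [entries]) from (key, entry) pairs already sorted by key.

    groupby only merges *adjacent* items with equal keys, which is why the
    input has to be sorted by key first.
    """
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield key, [entry for _, entry in group]
```

Because `groupby` is lazy, this works on a stream of sorted records and only holds one group in memory at a time.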

Sorting in huge files

2004-12-07 Thread Paul
Hi all, I have a sorting problem, but my experience with Python is rather limited (3 days), so I am running this by the list first. I have a large database of 15GB, consisting of 10^8 entries of approximately 100 bytes each. I devised a relatively simple key map on my database, and I would like to