FWIW, the algorithms in early editions [haven't looked at recent ones]
are designed for magnetic tapes, not disk. They do still work on disk
(treat each tape drive as a file on disk). I had to implement a robust
production-quality sort on MS-DOS about 20 years ago, and did it
straight out of Knuth'
[Paul]
> Thanks! I definitely didn't want to go into any elaborate programming
> for this, and the Unix sort is perfect for this. It sorted a tenth of
> my data in about 8 min, which is entirely satisfactory to me (assuming
> it will take ~ 20 times more to do the whole thing). Your answer
> gre
Thanks! I definitely didn't want to go into any elaborate programming
for this, and the Unix sort is perfect for this.
It sorted a tenth of my data in about 8 min, which is entirely
satisfactory to me (assuming it will take ~ 20 times more to do the
whole thing).
Your answer greatly helped!
Paul
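
For anyone who wants to see the idea in Python rather than shell out to
sort: below is a minimal sketch of an external merge sort, the same
general approach as the tape algorithms mentioned above (each sorted
run lives in its own temporary file, then the runs are merged). The
function name external_sort and the chunk_lines parameter are made up
for illustration, and the sketch assumes a text file with one record
per line, every line ending in a newline. It is not what Unix sort
does internally in any detail.

```python
import heapq
import itertools
import os
import tempfile


def external_sort(in_path, out_path, key=None, chunk_lines=100_000):
    """Sort the lines of in_path into out_path using bounded memory.

    Hypothetical helper: assumes one record per line, each line
    terminated by a newline.
    """
    run_paths = []
    try:
        # Phase 1: read the input in chunks that fit in memory, sort each
        # chunk, and write it out as a sorted "run" in its own temp file.
        with open(in_path) as src:
            while True:
                chunk = list(itertools.islice(src, chunk_lines))
                if not chunk:
                    break
                chunk.sort(key=key)
                fd, path = tempfile.mkstemp(suffix=".run")
                with os.fdopen(fd, "w") as run:
                    run.writelines(chunk)
                run_paths.append(path)

        # Phase 2: k-way merge of all runs. heapq.merge is lazy, so only
        # one pending line per run is held in memory at a time.
        runs = [open(p) for p in run_paths]
        try:
            with open(out_path, "w") as dst:
                dst.writelines(heapq.merge(*runs, key=key))
        finally:
            for r in runs:
                r.close()
    finally:
        # Clean up the temporary run files.
        for p in run_paths:
            os.remove(p)
```

Note that the merge phase opens every run at once, so with very many
runs you may hit the per-process open-file limit; a larger chunk_lines
(or merging in several passes) avoids that.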
"Paul" <[EMAIL PROTECTED]> writes:
> If you really want to know, my entries are elliptic curves and my
> hashing function is an attempt at mapping them to their Serre residual
> representation modulo a given prime p.
>
> Now, for you to tell me something relevant about the data that I don't
> alrea
The reason I am not telling you much about the data is not because I am
afraid anyone would steal my ideas, or because I have a non-disclosure
agreement or that I don't want to end up pumping gas.
It is just that it is pretty freaking damn hard to even explain what is
going on. Probably a bit harde
On Tue, 2004-12-07 at 16:47, Paul wrote:
> I really do need to sort. It is complicated and I haven't said why, but
> it will help in finding similar keys later on. Sorry I can't be more
> precise, this has to do with my research.
Precision is precisely what we require to give you an answer more
me
On Tue, 07 Dec 2004 12:27:33 -0800, Paul wrote:
> I have a large database of 15GB, consisting of 10^8 entries of
> approximately 100 bytes each. I devised a relatively simple key map on
> my database, and I would like to order the database with respect to the
> key.
You won't be able to load this
Paul,
I can pretty much promise you that if you really have 10^8
records they should be put into a database and let the database
do the sorting by creating indexes on the fields that you want.
Something like MySQL should do nicely and is free.
http://www.mysql.org
Python has a good interface to MySQL
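
A rough sketch of the "let the database sort via an index" idea. To
keep it self-contained it uses the sqlite3 module from the standard
library instead of MySQL, so treat it as an illustration of the
approach rather than of the MySQL interface. The table name entries,
the columns key and payload, and both function names are made up for
the example.

```python
import sqlite3


def build_indexed_table(rows, db_path=":memory:"):
    """Load (key, payload) pairs into a table and index the key column.

    Hypothetical schema, for illustration only.
    """
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE entries (key TEXT, payload TEXT)")
    conn.executemany("INSERT INTO entries VALUES (?, ?)", rows)
    # Create the index after the bulk load; the database can then use it
    # to return rows in key order.
    conn.execute("CREATE INDEX idx_entries_key ON entries (key)")
    conn.commit()
    return conn


def iter_sorted(conn):
    """Iterate over all rows ordered by key."""
    return conn.execute("SELECT key, payload FROM entries ORDER BY key")
```

Usage would be something like
conn = build_indexed_table(pairs, "entries.db") followed by
for key, payload in iter_sorted(conn): ...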
Paul wrote:
I have a large database of 15GB, consisting of 10^8 entries of
approximately 100 bytes each.
or 10 gigabytes of data.
A few thoughts on this:
- Space is not going to be an issue. I have a Tb available.
I presume this is disk space, not memory. If you do have a Tb of RAM
and you are us
Paul wrote:
Is this reasonable to do on 10^8 elements with repeats in the keys? I
guess I should just try and see for myself.
Yeah, that's usually the right solution. I didn't comment on
space/speed issues because they're so data dependent in a situation like
this, and without actually looking
I really do need to sort. It is complicated and I haven't said why, but
it will help in finding similar keys later on. Sorry I can't be more
precise, this has to do with my research.
Your two other suggestions with itertools and operator are more useful,
but I was mostly wondering about performanc
Paul wrote:
I expect a few repeats for most of the keys, and that's actually part
of what I want to figure out in the end. (Said loosely, I want to group
all the data entries having "similar" keys. For this I need to sort the
keys first (data entries having _same_ key), and then figure out which
ke
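
Once the entries are sorted by key, collecting the entries that share
the _same_ key is a single pass with itertools.groupby. A minimal
sketch, assuming each entry is a (key, payload) pair and the input is
already sorted by key; group_by_key is a made-up name. Deciding which
distinct keys count as "similar" is a separate step that this does not
attempt.

```python
from itertools import groupby
from operator import itemgetter


def group_by_key(sorted_entries):
    """Yield (key, [payloads]) for consecutive entries with equal keys.

    Assumes sorted_entries is an iterable of (key, payload) pairs that
    is already sorted by key; groupby only merges adjacent equal keys.
    """
    for key, group in groupby(sorted_entries, key=itemgetter(0)):
        yield key, [payload for _, payload in group]
```

Because groupby is lazy, this works on a stream of sorted records and
only holds one group in memory at a time.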
Hi all
I have a sorting problem, but my experience with Python is rather
limited (3 days), so I am running this by the list first.
I have a large database of 15GB, consisting of 10^8 entries of
approximately 100 bytes each. I devised a relatively simple key map on
my database, and I would like to