Hi all,

I have a sorting problem, but my experience with Python is rather limited (3 days), so I am running this by the list first.

I have a large database of 15 GB, consisting of 10^8 entries of approximately 100 bytes each. I devised a relatively simple key map on my database, and I would like to order the database with respect to the key. I expect a few repeats for most of the keys, and that's actually part of what I want to figure out in the end. (Loosely speaking, I want to group all the data entries having "similar" keys. For this I need to sort the keys first, which brings together the data entries having the _same_ key, and then figure out which keys are "similar".)

A few thoughts on this:
- Space is not going to be an issue. I have a TB available.
- The Python sort() on a list should be good enough, if I can load the whole database into a list/dict.
- Each data entry is relatively small, so I shouldn't use pointers.
- Keys could be strings, integers with the usual order, whatever is handy; it doesn't matter to me. The choice will probably depend on what sort() prefers.
- Also, I will be happy with any key space size. So I guess 100 * the size of the database will do.

Any comments? How long should I expect this sort to take? It may sound weird, but I actually have 12 different key maps and I want to sort the database with respect to each map, so I will have to sort 12 times.

Paul
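
P.S. To make the question concrete, here is a rough sketch of what I mean by sorting on a key map and then collecting the entries that share a key. Everything in it is made up for illustration: key_map is a hypothetical stand-in for one of my 12 maps, group_by_key is just a name I picked, and the four sample records stand in for the real data. It also assumes everything fits in memory, which is exactly the part I am unsure about at this size.

```python
from itertools import groupby
from operator import itemgetter

def key_map(record):
    # Hypothetical key map: the first three characters, lowercased.
    return record[:3].lower()

def group_by_key(records, key):
    # Compute each key once, sort the (key, record) pairs by key only,
    # then collect runs of equal keys. sorted() is stable, so records
    # with the same key keep their original relative order.
    decorated = sorted(((key(r), r) for r in records), key=itemgetter(0))
    return [(k, [r for _, r in run])
            for k, run in groupby(decorated, key=itemgetter(0))]

# Made-up sample records standing in for the real database entries.
records = ["abc-1", "xyz-9", "ABC-2", "abd-3"]
groups = group_by_key(records, key_map)
```

With the real data I would call this once per key map and then inspect the sizes of the groups to see how many repeats each key has.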