On Tue, 2004-12-07 at 16:47, Paul wrote:
> I really do need to sort. It is complicated and I haven't said why, but
> it will help in finding similar keys later on. Sorry I can't be more
> precise, this has to do with my research.

Precision is precisely what we require to give you an answer more meaningful than "write a script to load it into your favorite database and type 'select * from table order by column;'".

Now, unless you have an NDA with an employer or are working on something classified (in which case you have already given us too much information and should start looking for another job and a lawyer), I would venture a guess that you have more to gain than to lose from giving us more information. Decisions are hard sometimes ... is the help worth the risk that somebody in this forum will look at your question, say "hey, that is a neat idea," duplicate all of your research, and publish before you, shaming you into a life of asking "do you want fries with that" and pumping gas?

>
> Your two other suggestions with itertools and operator are more useful,
> but I was mostly wondering about performance issue.

What performance issue? Nowadays any decent laptop should be able to handle this dataset (from disk) without too much trouble.

    # Placeholder: substitute your database module's connect() call.
    conn = connect_to_my_favorite_database()
    c = conn.cursor()
    f = open("mydata")
    for line in f:
        # One whitespace-separated record per line.
        c.execute("insert into table( fields) values (%s,%s ... )", line.split())
    conn.commit()  # commit belongs to the connection, not the cursor
    print("I'm done loading, feel free to hit control+C if you get tired")
    c.execute("select * from table order by field")
    while 1:
        row = c.fetchone()
        if row is None:  # no more rows
            break
        print(row)

Then, from your shell:

    python myloadscript.py | gzip -9 > results.txt.gz

Start it up Friday night and take the weekend off. Just make sure you plug your laptop into the wall before you go home.

>
> Is this reasonable to do on 10^8 elements with repeats in the keys? I
> guess I should just try and see for myself.

Repeats in the keys don't matter.

Adam DePrince

-- 
http://mail.python.org/mailman/listinfo/python-list
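
A minimal runnable sketch of the load-then-sort approach described above, under several assumptions that are not in the original post: SQLite via Python's standard sqlite3 module stands in for "your favorite database", each input line holds exactly two whitespace-separated fields, and the table name `records` and function name `load_and_sort` are made up for illustration. It also shows that repeated keys need no special handling; the order of rows that share a key is simply unspecified.

```python
import sqlite3


def load_and_sort(lines):
    """Load "key value" lines into SQLite and return the rows ordered by key."""
    # ":memory:" keeps the demo self-contained; pass a file path instead
    # when the data set is too large to hold in RAM.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (key TEXT, value TEXT)")
    # sqlite3 uses '?' placeholders; other DB-API drivers may expect '%s'.
    conn.executemany(
        "INSERT INTO records (key, value) VALUES (?, ?)",
        (line.split() for line in lines),
    )
    conn.commit()
    rows = conn.execute(
        "SELECT key, value FROM records ORDER BY key"
    ).fetchall()
    conn.close()
    return rows


# Hypothetical sample data with repeated keys.
sample = ["b 1", "a 2", "b 3", "a 4"]
sorted_rows = load_and_sort(sample)
for row in sorted_rows:
    print(row)
```

For a real 10^8-row file you would pass the open file object as `lines` and iterate over the cursor rather than calling `fetchall()`, so that the sorted result is never held in memory all at once.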