> I'm using Python scripts to organize some rather large datasets
> describing DNA variation. Information is read, processed and written
> to a file in a sequential order, like this:
> 1+
> 1-
> 2+
> 2-
>
> etc. The files that I created contain positional information
> (nucleotide position) and some other info, like this:
>
> file 1+:
> --------------------------------------------
> 1 73 0 1 0 0
> 1 76 1 0 0 0
> 1 77 0 1 0 0
> --------------------------------------------
> file 1-:
> --------------------------------------------
> 1 74 0 0 6 0
> 1 78 0 0 4 0
> 1 89 0 0 0 2
>
> Now the trick is that I want this:
>
> File 1+ AND File 1-
> --------------------------------------------
> 1 73 0 1 0 0
> 1 74 0 0 6 0
> 1 76 1 0 0 0
> 1 77 0 1 0 0
> 1 78 0 0 4 0
> 1 89 0 0 0 2
> --------------------------------------------
>
> So the information should be sorted by position. Right now I've
> written some very complicated scripts that read a number of lines from
> file 1- and 1+ and then combine this output. The problem is of course
> that the running number of file 1- can be lower than that of 1+,
> resulting in an incorrect order. Since both files are too large to
> load into a dictionary at once (both are 100 MB+), I need some sort of
> alternative that can quickly sort everything without crashing my PC.

Have you considered using a lightweight database solution? SQLite is a
really simple, zero-configuration, serverless database, and a Python
binding for it (the sqlite3 module) comes with Python itself. I'd give
it a try; it will simplify tasks like these a great deal.

http://docs.python.org/library/sqlite3.html

Cheers,
Daniel
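
To make the suggestion concrete, here is a minimal sketch of how the
sqlite3 route might look. It is only an illustration under a few
assumptions: every line holds exactly six whitespace-separated integers,
the first column identifies the sequence and the second is the
nucleotide position, and the names merge_sorted, variation, seq, pos and
c1..c4 are made up for this example. The database lives on disk, so the
data does not have to fit into memory at once.

```python
import sqlite3


def merge_sorted(in_paths, out_path, db_path="variation.db"):
    """Load rows from several files into SQLite, write them back sorted.

    Assumes six whitespace-separated integer fields per line; the table
    and column names are hypothetical.
    """
    conn = sqlite3.connect(db_path)  # on-disk database file
    try:
        conn.execute("DROP TABLE IF EXISTS variation")
        conn.execute(
            "CREATE TABLE variation ("
            "seq INTEGER, pos INTEGER, "
            "c1 INTEGER, c2 INTEGER, c3 INTEGER, c4 INTEGER)"
        )
        for path in in_paths:
            with open(path) as handle:
                # Generator: lines are streamed, not read all at once.
                rows = (line.split() for line in handle if line.strip())
                conn.executemany(
                    "INSERT INTO variation VALUES (?, ?, ?, ?, ?, ?)", rows
                )
        conn.commit()

        with open(out_path, "w") as out:
            # INTEGER columns make this a numeric sort, not a text sort.
            query = "SELECT * FROM variation ORDER BY seq, pos"
            for row in conn.execute(query):
                out.write(" ".join(str(value) for value in row) + "\n")
    finally:
        conn.close()
```

With the two example files saved as, say, 1plus.txt and 1minus.txt
(hypothetical names), a call like
merge_sorted(["1plus.txt", "1minus.txt"], "1merged.txt") should write
the combined rows in position order. A line with a different number of
fields will make the insert fail, so malformed input would need to be
filtered first.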