Re: Organize large DNA txt files

Daniel Fetchinson Fri, 20 Mar 2009 10:00:38 -0700

> I'm using Python scripts to organize some rather large datasets
> describing DNA variation. Information is read, processed and written
> to a file in sequential order, like this:
> 1+
> 1-
> 2+
> 2-
>
> etc. The files that I created contain positional information
> (nucleotide position) and some other info, like this:
>
> file 1+:
> --------------------------------------------
> 1     73      0       1       0       0
> 1     76      1       0       0       0
> 1     77      0       1       0       0
> --------------------------------------------
> file 1-:
> --------------------------------------------
> 1     74      0       0       6       0
> 1     78      0       0       4       0
> 1     89      0       0       0       2
>
> Now the trick is that I want this:
>
> File 1+ AND File 1-
> --------------------------------------------
> 1     73      0       1       0       0
> 1     74      0       0       6       0
> 1     76      1       0       0       0
> 1     77      0       1       0       0
> 1     78      0       0       4       0
> 1     89      0       0       0       2
> -------------------------------------------
>
> So the information should be sorted by position. Right now I've
> written some very complicated scripts that read a number of lines from
> file 1- and 1+ and then combine this output. The problem is of course
> that the running number of file 1- can be lower than that of 1+,
> resulting in an incorrect order. Since both files are too large to load
> into a dictionary at once (both are 100 MB+), I need some sort of
> alternative that can quickly sort everything without crashing my PC.


Have you considered using a lightweight database solution? SQLite is a
really simple, zero-configuration, serverless database, and a Python
binding for it ships with Python itself. I'd give it a try; it will
simplify tasks like these a great deal.

http://docs.python.org/library/sqlite3.html
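
To make the suggestion concrete, here is a rough sketch of how it could
look with the sqlite3 module. It is only an illustration under some
assumptions of mine: the function name merge_sorted, the table name
variants and the column names (chrom, pos, c1..c4) are made up, and I
assume each file holds nothing but whitespace-separated rows of six
integers like the ones you posted. The rows are loaded into an on-disk
database and read back with ORDER BY, so the sorting is left to SQLite
rather than to a Python dictionary.

```python
import sqlite3


def merge_sorted(in_paths, out_path, db_path="variants.db"):
    """Load rows from all input files into SQLite, then write them sorted by position.

    Assumes every non-blank line holds exactly six whitespace-separated integers.
    Table and column names are hypothetical, chosen only for this sketch.
    """
    conn = sqlite3.connect(db_path)
    # Start from a clean table so repeated runs do not accumulate rows.
    conn.execute("DROP TABLE IF EXISTS variants")
    conn.execute(
        "CREATE TABLE variants ("
        "chrom INTEGER, pos INTEGER, "
        "c1 INTEGER, c2 INTEGER, c3 INTEGER, c4 INTEGER)"
    )
    for path in in_paths:
        with open(path) as handle:
            # Generator: rows are streamed to SQLite one line at a time.
            rows = (
                [int(field) for field in line.split()]
                for line in handle
                if line.strip()
            )
            conn.executemany(
                "INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?)", rows
            )
    conn.commit()

    with open(out_path, "w") as out:
        # SQLite performs the sort; results are written out row by row.
        query = "SELECT * FROM variants ORDER BY chrom, pos"
        for row in conn.execute(query):
            out.write("\t".join(str(value) for value in row) + "\n")
    conn.close()
```

With that in place, something like merge_sorted(["1+", "1-"],
"1_merged") should produce the combined file. Passing a real file name
as db_path keeps the data on disk, which seems the safer choice for
inputs of 100 MB or more.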

Cheers,
Daniel


-- 
Psss, psss, put it down! - http://www.cafepress.com/putitdown
