thomasvang...@gmail.com wrote:
Dear Fellow programmers,

I'm using Python scripts to organize some rather large datasets
describing DNA variation. Information is read, processed and written
to a file in sequential order, like this:
1+
1-
2+
2-

etc. The files that I created contain positional information
(nucleotide position) and some other info, like this:

file 1+:
--------------------------------------------
1       73      0       1       0       0
1       76      1       0       0       0
1       77      0       1       0       0
--------------------------------------------
file 1-:
--------------------------------------------
1       74      0       0       6       0
1       78      0       0       4       0
1       89      0       0       0       2

Now the trick is that I want this:

File 1+ AND File 1-
--------------------------------------------
1       73      0       1       0       0
1       74      0       0       6       0
1       76      1       0       0       0
1       77      0       1       0       0
1       78      0       0       4       0
1       89      0       0       0       2
-------------------------------------------

So the information should be sorted by position. Right now I've
written some very complicated scripts that read a number of lines from
files 1- and 1+ and then combine the output. The problem is of course
that the running number (position) in file 1- can be lower than in
file 1+, resulting in an incorrect order. Since both files are too
large to load into a dictionary at once (both are 100 MB+), I need some
sort of alternative that can quickly sort everything without crashing
my PC.

Here's my attempt:

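# input_1, input_2 and output are assumed to be already-open file objects,
# and each input file is assumed to be sorted by position (column 2).
line_1 = input_1.readline()
line_2 = input_2.readline()
# Merge step: while both files have lines left, write the lower position.
while line_1 and line_2:
    pos_1 = int(line_1.split(None, 2)[1])
    pos_2 = int(line_2.split(None, 2)[1])
    if pos_1 < pos_2:
        output.write(line_1)
        line_1 = input_1.readline()
    else:
        output.write(line_2)
        line_2 = input_2.readline()
# One file is exhausted; copy whatever remains of the other.
while line_1:
    output.write(line_1)
    line_1 = input_1.readline()
while line_2:
    output.write(line_2)
    line_2 = input_2.readline()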
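
For comparison, the same two-way merge could probably be written with the
standard library's heapq.merge, which reads its inputs lazily and so never
holds a whole file in memory. This is only a sketch under some assumptions:
Python 3.5 or later (for the key argument), every line is a data line with
the position in the second column (no blank lines or dashed separators),
every line ends with a newline, and each input file is already sorted by
position. The names position and merge_sorted_files are made up for
illustration.

```python
import heapq


def position(line):
    # Assumption: the nucleotide position is the second whitespace-separated column.
    return int(line.split(None, 2)[1])


def merge_sorted_files(paths, out_path):
    """Merge files that are each already sorted by position into out_path."""
    files = [open(p) for p in paths]
    try:
        with open(out_path, "w") as out:
            # heapq.merge keeps only one pending line per input file in memory.
            out.writelines(heapq.merge(*files, key=position))
    finally:
        for f in files:
            f.close()
```

Calling merge_sorted_files(["1+", "1-"], "1_merged") (hypothetical file
names) would then write the combined, position-ordered lines to 1_merged.
The same call should also work for more than two input files.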

--
http://mail.python.org/mailman/listinfo/python-list
