Need help with a program

2010-01-28 Thread evilweasel
Hi folks,

I am a newbie to python, and I would be grateful if someone could
point out the mistake in my program. Basically, I have a huge text
file similar to the format below:

AGACTCGAGTGCGCGGA   0
AGATAAGCTAATTAAGCTACTGG 0
AGATAAGCTAATTAAGCTACTGGGTT   1
AGCTCACAATAT 1
AGGTCGCCTGACGGCTGC  0

The text is nothing but DNA sequences, and there is a number next to
it. What I will have to do is, ignore those lines that have 0 in it,
and print all other lines (excluding the number) in a new text file
(in a particular format called as FASTA format). This is the program I
wrote for that:

seq1 = []
list1 = []
lister = []
listers = []
listers1 = []
a = []
d = []
i = 0
j = 0
num = 0

file1 = open(sys.argv[1], 'r')
for line in file1:
if not line.startswith('\n'):
seq1 = line.split()
if len(seq1) == 0:
continue

a = seq1[0]
list1.append(a)

d = seq1[1]
lister.append(d)


b = len(lister)
for j in range(0, b):
if lister[j] == 0:
listers.append(j)
else:
listers1.append(j)


print listers1
resultsfile = open("sequences1.txt", 'w')
for i in listers1:
resultsfile.write('\n>seq' + str(i) + '\n' + list1[i] + '\n')

But this isn't working. I am not able to find the bug in this. I would
be thankful if someone could point it out. Thanks in advance!

Cheers!
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Need help with a program

2010-01-28 Thread evilweasel
I will make my question a little more clearer. I have close to 60,000
lines of the data similar to the one I posted. There are various
numbers next to the sequence (this is basically the number of times
the sequence has been found in a particular sample). So, I would need
to ignore the ones containing '0' and write all other sequences
(excluding the number, since it is trivial) in a new text file, in the
following format:

>seq59902
TTTATTATATAGT

>seq59903
TTTATTTCTTGGCGTTGT

>seq59904
TTTGGTTGCCCTGCGTGG

>seq59905
TTTGTTTATGGG

The number next to 'seq' is the line number of the sequence. When I
run the above program, what I expect is an output file that is similar
to the above output but with the ones containing '0' ignored. But, I
am getting all the sequences printed in the file.

Kindly excuse the 'newbieness' of the program. :) I am hoping to
improve in the next few months. Thanks to all those who replied. I
really appreciate it. :)
-- 
http://mail.python.org/mailman/listinfo/python-list