nn <prueba...@latinmail.com> writes:

> On Jan 28, 10:50 am, evilweasel <karthikramaswam...@gmail.com> wrote:
>> I will make my question a little clearer. I have close to 60,000
>> lines of the data similar to the one I posted. There are various
>> numbers next to the sequence (this is basically the number of times
>> the sequence has been found in a particular sample). So, I would need
>> to ignore the ones containing '0' and write all other sequences
>> (excluding the number, since it is trivial) in a new text file, in the
>> following format:
>>
>> >seq59902
>>
>> TTTTTTTATAAAATATATAGT
>>
>> >seq59903
>>
>> TTTTTTTATTTCTTGGCGTTGT
>>
>> >seq59904
>>
>> TTTTTTTGGTTGCCCTGCGTGG
>>
>> >seq59905
>>
>> TTTTTTTGTTTATTTTTGGG
>>
>> The number next to 'seq' is the line number of the sequence. When I
>> run the above program, what I expect is an output file that is similar
>> to the above output but with the ones containing '0' ignored. But, I
>> am getting all the sequences printed in the file.
>>
>> Kindly excuse the 'newbieness' of the program. :) I am hoping to
>> improve in the next few months. Thanks to all those who replied. I
>> really appreciate it. :)
>
> People have already given you some pointers to your problem. In the
> end you will have to "tweak the details" because only you have access
> to the data, not us.
>
> Just as an example, here is another way to do what you are doing:
>
> with open('dnain.dat') as infile, open('dnaout.dat','w') as outfile:
>     partgen=(line.split() for line in infile)
>     dnagen=(str(i+1)+'\n'+part[0]+'\n'
>             for i,part in enumerate(partgen)
>             if len(part)>1 and part[1]!='0')
>     outfile.writelines(dnagen)

I think that generator expressions are overrated :) What's wrong with:

with open('dnain.dat') as infile, open('dnaout.dat','w') as outfile:
    for i, line in enumerate(infile):
        parts = line.split()
        if len(parts) > 1 and parts[1] != '0':
            outfile.write(">seq%s\n%s\n" % (i+1, parts[0]))

(untested)

-- 
Arnaud
-- 
http://mail.python.org/mailman/listinfo/python-list
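
P.S. For anyone who wants to try the loop without the real data files, here is a minimal sketch of the same idea wrapped in a function and run on in-memory text. The function name write_nonzero_seqs and the sample lines are made up for illustration, and it assumes each input line is "sequence count" separated by whitespace, which may not match the actual file.

```python
import io


def write_nonzero_seqs(infile, outfile):
    """Write every sequence whose count is not '0' as '>seqN' plus the sequence.

    N is the 1-based line number in the input (hypothetical helper, not from
    the original thread).
    """
    for i, line in enumerate(infile, start=1):
        parts = line.split()
        # Skip blank or one-column lines, and lines whose count is exactly '0'.
        if len(parts) > 1 and parts[1] != '0':
            outfile.write(">seq%d\n%s\n" % (i, parts[0]))


# Made-up sample data: the second line has a count of 0 and should be dropped.
sample = io.StringIO(
    "TTTTTTTATAAAATATATAGT 5\n"
    "TTTTTTTATTTCTTGGCGTTGT 0\n"
    "TTTTTTTGGTTGCCCTGCGTGG 12\n"
)
out = io.StringIO()
write_nonzero_seqs(sample, out)
print(out.getvalue(), end="")
```

Because the comparison is on the string '0', a count such as '10' is kept. If a line carries several count columns, the test would need to look at all of them instead of just parts[1].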