Arnaud Delobelle wrote:
> nn <prueba...@latinmail.com> writes:
>
> > On Jan 28, 10:50 am, evilweasel <karthikramaswam...@gmail.com> wrote:
> >> I will make my question a little more clearer. I have close to 60,000
> >> lines of the data similar to the one I posted. There are various
> >> numbers next to the sequence (this is basically the number of times
> >> the sequence has been found in a particular sample). So, I would need
> >> to ignore the ones containing '0' and write all other sequences
> >> (excluding the number, since it is trivial) in a new text file, in the
> >> following format:
> >>
> >> >seq59902
> >>
> >> TTTTTTTATAAAATATATAGT
> >>
> >> >seq59903
> >>
> >> TTTTTTTATTTCTTGGCGTTGT
> >>
> >> >seq59904
> >>
> >> TTTTTTTGGTTGCCCTGCGTGG
> >>
> >> >seq59905
> >>
> >> TTTTTTTGTTTATTTTTGGG
> >>
> >> The number next to 'seq' is the line number of the sequence. When I
> >> run the above program, what I expect is an output file that is similar
> >> to the above output but with the ones containing '0' ignored. But, I
> >> am getting all the sequences printed in the file.
> >>
> >> Kindly excuse the 'newbieness' of the program. :) I am hoping to
> >> improve in the next few months. Thanks to all those who replied. I
> >> really appreciate it. :)
> >
> > People have already given you some pointers to your problem. In the
> > end you will have to "tweak the details" because only you have access
> > to the data not us.
> >
> > Just as example here is another way to do what you are doing:
> >
> > with open('dnain.dat') as infile, open('dnaout.dat','w') as outfile:
> >     partgen=(line.split() for line in infile)
> >     dnagen=(str(i+1)+'\n'+part[0]+'\n'
> >             for i,part in enumerate(partgen)
> >             if len(part)>1 and part[1]!='0')
> >     outfile.writelines(dnagen)
>
> I think that generator expressions are overrated :) What's wrong with:
>
> with open('dnain.dat') as infile, open('dnaout.dat','w') as outfile:
>     for i, line in enumerate(infile):
>         parts = line.split()
>         if len(parts) > 1 and parts[1] != '0':
>             outfile.write(">seq%s\n%s\n" % (i+1, parts[0]))
>
> (untested)
>
> --
> Arnaud

Nothing really. After posting I was thinking I should have posted a
more straightforward version like the one you wrote. Now there is one!
It probably is more efficient too. I just have a tendency to think in
terms of pipes: "pipe this junk in here, then in here, get output".
Probably damage from too much Unix scripting.

Since I can't resist the urge to post crazy code, here goes the bonus
round (don't do this at work; among other things, neither file is ever
explicitly closed):

open('dnaout.dat','w').writelines(
    '>seq%s\n%s\n'%(i+1,part[0])
    for i,part in enumerate(line.split() for line in open('dnain.dat'))
    if len(part)>1 and part[1]!='0')

--
http://mail.python.org/mailman/listinfo/python-list
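
P.S. Since both file-based versions in this thread are marked untested,
here is a small sketch that pulls the filter into a generator so it can
be tried on an in-memory list first. The function name fasta_records is
made up for this example, and the counts in the sample are invented;
only the sequences come from the thread. It assumes each input line is
"sequence, whitespace, count", which is a guess at the data layout, so
tweak as needed.

```python
def fasta_records(lines):
    """Yield '>seqN' plus the sequence for each line whose count is not '0'.

    N is the 1-based line number in the input, so skipped lines
    still advance the numbering.
    """
    for lineno, line in enumerate(lines, start=1):
        parts = line.split()
        # Blank or one-column lines are skipped, as are zero counts.
        if len(parts) > 1 and parts[1] != '0':
            yield ">seq%s\n%s\n" % (lineno, parts[0])


# Hypothetical sample: the counts (3, 0, 12) are invented for illustration.
sample = [
    "TTTTTTTATAAAATATATAGT 3\n",
    "TTTTTTTATTTCTTGGCGTTGT 0\n",
    "\n",
    "TTTTTTTGTTTATTTTTGGG 12\n",
]

result = "".join(fasta_records(sample))
print(result, end="")
```

To use it on the real files, the generator can be handed straight to
writelines inside the with block from Arnaud's version:
outfile.writelines(fasta_records(infile)). Note that the test only
looks at the second column; if a line carries several counts, the
condition would need adjusting.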