On Jan 28, 12:28 pm, Steven Howe <howe.ste...@gmail.com> wrote:
> On 01/28/2010 09:49 AM, Jean-Michel Pichavant wrote:
>
> > evilweasel wrote:
> >> I will make my question a little more clearer. I have close to 60,000
> >> lines of the data similar to the one I posted. There are various
> >> numbers next to the sequence (this is basically the number of times
> >> the sequence has been found in a particular sample). So, I would need
> >> to ignore the ones containing '0' and write all other sequences
> >> (excluding the number, since it is trivial) in a new text file, in the
> >> following format:
>
> >>> seq59902
> >> TTTTTTTATAAAATATATAGT
>
> >>> seq59903
> >> TTTTTTTATTTCTTGGCGTTGT
>
> >>> seq59904
> >> TTTTTTTGGTTGCCCTGCGTGG
>
> >>> seq59905
> >> TTTTTTTGTTTATTTTTGGG
>
> >> The number next to 'seq' is the line number of the sequence. When I
> >> run the above program, what I expect is an output file that is similar
> >> to the above output but with the ones containing '0' ignored. But, I
> >> am getting all the sequences printed in the file.
>
> >> Kindly excuse the 'newbieness' of the program. :) I am hoping to
> >> improve in the next few months. Thanks to all those who replied. I
> >> really appreciate it. :)
> > Using regexp may increase readability (if you are familiar with it).
> > What about
>
> > import re
>
> > output = open("sequences1.txt", 'w')
>
> > for index, line in enumerate(open(sys.argv[1], 'r')):
> >     match = re.match('(?P<sequence>[GATC]+)\s+1')
> >     if match:
> >         output.write('seq%s\n%s\n' % (index, match.group('sequence')))
>
> > Jean-Michel
>
> Finally!
>
> After ready 8 or 9 messages about find a line ending with '1', someone
> suggests Regex.
> It was my first thought.

And as a first thought, it is, of course, wrong. You don't want lines
ending in '1'; you want ANY non-'0' amount. Likewise, you don't want to
exclude lines ending in '0', because you'll end up excluding counts of
10, 20, 30, etc. You need a regex that extracts ALL the numeric
characters at the end of the line, and you then exclude only the lines
where that number evaluates to 0.

>
> Steven

--
http://mail.python.org/mailman/listinfo/python-list
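
For what it's worth, here is a rough sketch of the kind of regex I mean. It assumes each line is a run of G/A/T/C, some whitespace, and then a single count, which is what the quoted pattern implies but which I haven't checked against the real data. The names LINE_RE and nonzero_sequences are made up for illustration, and numbering lines from 1 is also a guess. Note that the line has to be passed to the match call.

```python
import re

# Assumed line layout: sequence, whitespace, one trailing count.
# The count group grabs ALL the trailing digits, not just the last one.
LINE_RE = re.compile(r'(?P<sequence>[GATC]+)\s+(?P<count>\d+)\s*$')


def nonzero_sequences(lines):
    """Yield (line_number, sequence) for lines whose count is not zero."""
    for index, line in enumerate(lines, start=1):
        match = LINE_RE.match(line)
        # Compare the count as an integer, so 10, 20, 30 are kept
        # and only a true zero (0, 00, ...) is dropped.
        if match and int(match.group('count')) != 0:
            yield index, match.group('sequence')


# Made-up sample lines, only to exercise the function.
sample = [
    "TTTTTTTATAAAATATATAGT 0\n",
    "TTTTTTTATTTCTTGGCGTTGT 10\n",
    "TTTTTTTGGTTGCCCTGCGTGG 1\n",
    "TTTTTTTGTTTATTTTTGGG 00\n",
]

for number, sequence in nonzero_sequences(sample):
    print('seq%s\n%s' % (number, sequence))
```

Converting the captured digits with int() keeps the test simple: the regex only has to find the number, and the comparison decides whether it is zero.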