On Jul 16, 4:14 pm, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> Beema shafreen wrote:
> > How do I write a regular expression for this kind of sequences
>
> > >gi|158028609|gb|ABW08583.1| CG8385-PF, isoform F [Drosophila melanogaster]
> > MGNVFANLFKGLFGKKEMRILMVGLDAAGKTTILYKLKLGEIVTTIPTIGFNVETVE
>
> line.split("|") ?
>
> it's a bit hard to come up with a working RE with only a single sample;
> what are the constraints for the different fields? is the last part
> free form text or something else, etc.
>
> have you googled for existing implementations of the format you're using?

That's a FASTA file, so for the header line this is enough:

    [part.strip() for part in line.split("|")]

But it is better to use the Biopython libraries, which already handle this kind of parsing.

Bye,
bearophile
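
For reference, here is a minimal sketch of the split-based approach applied to the header from the quoted message. The function name parse_fasta_header is hypothetical, and the sketch assumes an NCBI-style header whose fields are separated by "|"; it also drops the leading ">" marker, which the one-liner above leaves attached to the first field.

```python
def parse_fasta_header(line):
    """Split a '|'-delimited FASTA header line into stripped fields.

    Hypothetical helper for illustration; assumes '|' is the only
    field separator in the header.
    """
    # Remove surrounding whitespace and the leading '>' header marker.
    body = line.strip().lstrip(">")
    # Split on '|' and trim whitespace around each field.
    return [part.strip() for part in body.split("|")]


header = ">gi|158028609|gb|ABW08583.1| CG8385-PF, isoform F [Drosophila melanogaster]"
fields = parse_fasta_header(header)
print(fields)
# ['gi', '158028609', 'gb', 'ABW08583.1', 'CG8385-PF, isoform F [Drosophila melanogaster]']
```

Note that a description containing "|" would be split into extra fields, which is one more reason to prefer an existing parser for real data.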