Re: Regular expression

bearophileHUGS Wed, 16 Jul 2008 09:17:32 -0700

On Jul 16, 4:14 pm, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> Beema shafreen wrote:
> > How do I write a regular expression for this kind of sequences
>
> >  >gi|158028609|gb|ABW08583.1| CG8385-PF, isoform F [Drosophila melanogaster]
> > MGNVFANLFKGLFGKKEMRILMVGLDAAGKTTILYKLKLGEIVTTIPTIGFNVETVE
>
> line.split("|") ?
>
> it's a bit hard to come up with a working RE with only a single sample;
> what are the constraints for the different fields?  is the last part
> free form text or something else, etc.
>
> have you googled for existing implementations of the format you're using?


That's a FASTA file, so for the header line this is enough:
[part.strip() for part in line.split("|")]
But it's better to use the Biopython libs, which already handle all
such things.
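
A minimal sketch of the split approach, assuming an NCBI-style header
like the one quoted above (fields separated by "|", free-form
description last). The function name parse_fasta_header is only for
illustration, and unlike the one-liner it also drops the leading ">":

```python
def parse_fasta_header(line):
    """Split a '|'-separated FASTA header line into stripped fields."""
    line = line.strip()
    if not line.startswith(">"):
        # Only header lines start with ">"; sequence lines do not.
        raise ValueError("not a FASTA header line: %r" % line)
    # Drop the ">" marker, then split on "|" and trim each field.
    return [part.strip() for part in line[1:].split("|")]


header = (">gi|158028609|gb|ABW08583.1| CG8385-PF, isoform F "
          "[Drosophila melanogaster]")
fields = parse_fasta_header(header)
gi_number = fields[1]      # '158028609'
accession = fields[3]      # 'ABW08583.1'
description = fields[4]    # 'CG8385-PF, isoform F [Drosophila melanogaster]'
```

This only works while the description itself contains no "|"; for
anything beyond a quick script a real FASTA parser is the safer choice.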

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
