On 13 May 2005 [EMAIL PROTECTED] wrote:

> Your help has made me realise the problem is more complex than I first
> thought...I've included a small sample of an actual file I need to
> process. The structure is the same as in the full versions though; some
> lowercase, some uppercase, then some more lowercase. One is that I need
> to remove the lines of asterisks.


Hello Chris,

Since this does look like a biologically driven example, you may want to
make sure that no one else has already done this work for you.  Biopython
is a collection of programs written for bioinformatics work in Python:

    http://biopython.org/

and you may want to just double check to see if someone has already done
the work in parsing that data.


The input that you've shown us suggests that you're dealing with ClustalW
sequence alignment data.  If so, you should be aware that a Biopython
parser does exist in the 'Bio.Clustalw' module:

    http://biopython.org/docs/api/public/Bio.Clustalw-module.html

And if you are willing to look at Biopython, then take a look at section
11.6.2 of:

    http://www.pasteur.fr/recherche/unites/sis/formation/python/ch11s06.html

for an example of parsing a CLUSTALW file with Biopython.


If you're doing this from scratch to learn Python better, that's great.
But if you're in a hurry, take advantage of the stuff that's out there.
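
If you do go the from-scratch route, here is a rough sketch of one way to
drop the asterisk lines and stitch each sequence back together.  It
assumes the usual ClustalW layout: a header line starting with "CLUSTAL",
then blocks of "name sequence" lines, each followed by a conservation line
made up only of '*', ':', '.' and spaces.  The function name
parse_clustal is just something I made up for illustration; it is not
part of Biopython, and your real files may need more care than this.

```python
def parse_clustal(text):
    """Return a dict mapping each sequence name to its full sequence.

    Assumes ClustalW-style text; the case of the letters is preserved.
    """
    conservation_chars = set("*:. ")
    sequences = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and the "CLUSTAL ..." header line.
        if not stripped or stripped.startswith("CLUSTAL"):
            continue
        # Skip conservation lines (the "lines of asterisks").
        if set(stripped) <= conservation_chars:
            continue
        fields = stripped.split()
        if len(fields) < 2:
            continue
        # First field is the name, second is this block's piece of the
        # sequence; any trailing residue count is ignored.
        name, chunk = fields[0], fields[1]
        sequences[name] = sequences.get(name, "") + chunk
    return sequences
```

You would call it on the contents of the file, for example
parse_clustal(open(filename).read()), where filename is whatever your
alignment file is called.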


By the way, you may find the tutorial at:

    http://www.pasteur.fr/recherche/unites/sis/formation/python/index.html

to be useful; it's a Python tutorial with a biological focus.



Best of wishes!

_______________________________________________
Tutor maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/tutor