Re: Renumbering

Philipp Pagel Wed, 03 Sep 2008 03:46:35 -0700

Francesco Pietra <[EMAIL PROTECTED]> wrote:
> ATOM   3424  N   LEU B 428     143.814  87.271  77.726  1.00115.20       2SG3426
> ATOM   3425  CA  LEU B 428     142.918  87.524  78.875  1.00115.20       2SG3427
[...]


> As you can see, the number of lines for a particular value in column 6
> changes from situation to situation, and may even be different for the
> same name in column 4. For example, LEU can have a different number of
> lines depending on the position of this amino acid (leucine).

Others have already given good hints, but I would like to add a bit of
advice.

The data you show appears to be a PDB protein structure file. It is
important to realize that these are fixed-width files and columns can be
empty, so splitting on tabs or whitespace will often fail. It is also
important to know that the residue numbering (cols 23-26) is not
necessarily contiguous and is not even unique without taking into
account the 'insertion code' in column 27, which happens to be empty in
your example. I would recommend using a full-blown PDB parser to read
the data and then iterating over the residues to do whatever you would
like to accomplish. Biopython has such a parser:

www.biopython.org
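
To illustrate the fixed-width point, here is a minimal sketch in plain
Python that slices the two quoted ATOM records by column position and
groups atoms by (chain, residue number, insertion code). The helper
names parse_atom_line and group_by_residue are made up for this
example, and the code assumes well-formed, full-length ATOM/HETATM
records. It is not a substitute for a real parser. Note that in your
data the occupancy (1.00) and B-factor (115.20) run together, so a
whitespace split returns them as the single token "1.00115.20".

```python
# The two ATOM records quoted above, with the wrapped tail rejoined.
SAMPLE = [
    "ATOM   3424  N   LEU B 428     143.814  87.271  77.726  1.00115.20       2SG3426",
    "ATOM   3425  CA  LEU B 428     142.918  87.524  78.875  1.00115.20       2SG3427",
]


def parse_atom_line(line):
    """Slice one ATOM/HETATM record by fixed columns (hypothetical helper).

    Comments give the 1-based PDB columns; Python slices are 0-based.
    Assumes the line is at least 66 characters long.
    """
    return {
        "serial": int(line[6:11]),         # cols 7-11
        "name": line[12:16].strip(),       # cols 13-16
        "resname": line[17:20].strip(),    # cols 18-20
        "chain": line[21],                 # col 22
        "resseq": int(line[22:26]),        # cols 23-26
        "icode": line[26].strip(),         # col 27, often blank
        "x": float(line[30:38]),           # cols 31-38
        "y": float(line[38:46]),           # cols 39-46
        "z": float(line[46:54]),           # cols 47-54
        "occupancy": float(line[54:60]),   # cols 55-60
        "bfactor": float(line[60:66]),     # cols 61-66
    }


def group_by_residue(lines):
    """Group atoms by (chain, resseq, icode), keeping file order."""
    residues = {}
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip everything that is not a coordinate record
        atom = parse_atom_line(line)
        key = (atom["chain"], atom["resseq"], atom["icode"])
        residues.setdefault(key, []).append(atom)
    return residues


residues = group_by_residue(SAMPLE)
for key, atoms in residues.items():
    print(key, [a["name"] for a in atoms])
```

Keying on the full (chain, number, insertion code) triple rather than
on the residue number alone is what keeps residues apart when the
numbering repeats or has gaps.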

cu
        Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl f. Genomorientierte Bioinformatik
Technische Universität München
http://mips.gsf.de/staff/pagel
--
http://mail.python.org/mailman/listinfo/python-list
