Francesco Pietra <[EMAIL PROTECTED]> wrote:
> ATOM 3424 N LEU B 428 143.814 87.271 77.726 1.00115.20
> 2SG3426
> ATOM 3425 CA LEU B 428 142.918 87.524 78.875 1.00115.20
> 2SG3427
[...]
> As you can see, the number of lines for a particular value in column 6
> changes from situation to situation, and may even be different for the
> same name in column 4. For example, LEU can have a different number of
> lines depending on the position of this amino acid (leucine).

Others have already given good hints, but I would like to add a bit of
advice.

The data you show appears to be a PDB protein structure file. It is
important to realize that these are fixed-width files and that columns
can be empty, so splitting on tabs or whitespace will often fail.

It is also important to know that the residue numbering (cols 23-26) is
not necessarily contiguous and is not even unique without taking into
account the 'insertion code' in column 27, which happens to be empty in
your example.

I would recommend using a full-blown PDB parser to read the data, then
iterating over the residues and doing whatever you would like to
accomplish that way. Biopython has such a parser:

www.biopython.org

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl f. Genomorientierte Bioinformatik
Technische Universität München
http://mips.gsf.de/staff/pagel
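
P.S. To illustrate the fixed-width point, here is a minimal sketch in
plain Python that slices ATOM records by column position and groups the
atoms by (chain, residue number, insertion code). It is not a substitute
for a real PDB parser. The function names parse_atom and
group_by_residue are made up for this example, the first two sample
lines are your records with the column spacing restored by me, and the
third line is invented to show an insertion code.

```python
# Sample records in fixed-width PDB layout. Lines 1-2 follow the quoted
# data (spacing restored, an assumption); line 3 is invented and carries
# insertion code "A" in column 27.
SAMPLE = [
    "ATOM   3424  N   LEU B 428     143.814  87.271  77.726  1.00115.20",
    "ATOM   3425  CA  LEU B 428     142.918  87.524  78.875  1.00115.20",
    "ATOM   3426  N   GLY B 428A    141.000  86.000  79.000  1.00 50.00",
]


def parse_atom(line):
    """Slice one ATOM/HETATM record by column (hypothetical helper).

    PDB columns are 1-based, Python slices are 0-based, so columns
    23-26 (residue number) become line[22:26], and so on.
    """
    return {
        "serial": int(line[6:11]),        # cols 7-11
        "name": line[12:16].strip(),      # cols 13-16
        "resname": line[17:20].strip(),   # cols 18-20
        "chain": line[21],                # col 22
        "resseq": int(line[22:26]),       # cols 23-26
        "icode": line[26].strip(),        # col 27, often blank
        "x": float(line[30:38]),          # cols 31-38
        "y": float(line[38:46]),          # cols 39-46
        "z": float(line[46:54]),          # cols 47-54
    }


def group_by_residue(lines):
    """Group atoms by (chain, resseq, icode), keeping file order."""
    residues = {}
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        atom = parse_atom(line)
        key = (atom["chain"], atom["resseq"], atom["icode"])
        residues.setdefault(key, []).append(atom)
    return residues


if __name__ == "__main__":
    for key, atoms in group_by_residue(SAMPLE).items():
        print(key, [a["name"] for a in atoms])
    # Whitespace splitting fuses occupancy and B-factor in line 1:
    print(SAMPLE[0].split()[-1])
```

Note that the number of atom lines per residue never has to be known in
advance; the key simply changes when a new residue starts. Including
the insertion code in the key is what keeps 428 and 428A apart.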