Because of insertion codes and non-sequential residue numbering, I believe there is no way to find the gaps without aligning the residues in the ATOM records against the sequence in SEQRES. I don't know of a program that does this. The structure validation server at RCSB ADIT2 makes this alignment for the depositor to look at, but it would not be easy to include in a script.
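For what it is worth, here is a rough Python sketch of that alignment, not a tested tool. It assumes standard fixed-column PDB records, reads only ATOM records from the first model (so modified residues stored as HETATM, such as MSE, will show up as gaps), and uses the standard library's `difflib.SequenceMatcher` as a stand-in for a proper sequence alignment. The function names are made up for illustration. With repeated or very short fragments the placement can be ambiguous, so check the output by eye.

```python
from difflib import SequenceMatcher

# Standard amino acids; anything else (including UNK) becomes "X".
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_sequence(pdb_lines, chain):
    """One-letter sequence from the SEQRES records of one chain."""
    seq = []
    for line in pdb_lines:
        # Chain ID is column 12; residue names start at column 20.
        if line.startswith("SEQRES") and len(line) > 19 and line[11] == chain:
            seq.extend(THREE_TO_ONE.get(name, "X") for name in line[19:70].split())
    return "".join(seq)


def observed_sequence(pdb_lines, chain):
    """One-letter sequence of residues that have ATOM records (first model)."""
    seq = []
    last_key = None
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # assumption: only the first model is of interest
        if line.startswith("ATOM") and len(line) > 26 and line[21] == chain:
            # Residue number plus insertion code, so 52 and 52A stay distinct.
            key = line[22:27]
            if key != last_key:
                seq.append(THREE_TO_ONE.get(line[17:20], "X"))
                last_key = key
    return "".join(seq)


def gapped_sequence(seqres, observed):
    """Lay the observed residues onto SEQRES, with '-' for missing ones."""
    gapped = ["-"] * len(seqres)
    matcher = SequenceMatcher(None, seqres, observed, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            gapped[block.a + k] = observed[block.b + k]
    return "".join(gapped)
```

To use it, read the file with `lines = open(path).read().splitlines()` and call `gapped_sequence(seqres_sequence(lines, "A"), observed_sequence(lines, "A"))`. The residue numbers are deliberately ignored when placing the fragments, since they cannot be trusted to be sequential.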
If all you want is an accurate FASTA sequence for the protein, there are programs that convert the SEQRES records to a string of one-letter codes. The SEQRES records in principle hold the sequence of what is present in the crystal, regardless of whether it is visible in the structure or not. However, there cannot be conflicts between the SEQRES and the ATOM records, so if the structure contains unknown ('UNK') residues, they have to be UNK in the SEQRES as well, even if the sequence is known. And if a string of UNK residues is disconnected on both ends, i.e. in the middle of a gap of missing residues, then it is pretty arbitrary which residues in SEQRES get replaced with UNK.

Kelvin Luther wrote:
> Hello,
>
> I am using PyMOL 0.99rc6. I am wondering if there is a means to obtain
> a FASTA sequence from a loaded pdb file that maintains the gaps due to
> missing portions of the structure? I found one program that will strip
> the sequence from pdb files, but it simply reads out the amino acids
> that are present linearly. Gaps in the sequence are not maintained. I
> can't imagine why that would be useful. I have a large number of pdb
> files to deal with and would like to avoid having to align each pdb
> sequence to the gene just to recover the gaps.
>
> Thanks for your time,

PyMOL-users mailing list (PyMOL-users@lists.sourceforge.net)
Info Page: https://lists.sourceforge.net/lists/listinfo/pymol-users
Archives: http://www.mail-archive.com/pymol-users@lists.sourceforge.net