Because of insertion codes and non-sequential residue numbering,
I believe the only way to find gaps is to align the residues in the
ATOM records with the sequence in SEQRES. I don't know of a program
that does this.
The structure validation server at RCSB ADIT2 makes this alignment for the
depositor to look at, but it would not be easy to include in a script.
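
As a rough illustration of what such an alignment involves, here is a
minimal Python sketch. It assumes both the SEQRES sequence and the residues
actually present in the ATOM records have already been reduced to one-letter
strings, in file order. The function name gapped_sequence is hypothetical,
and difflib.SequenceMatcher from the standard library stands in for a real
sequence-alignment program: it only matches identical runs, so short or
repeated fragments may be placed ambiguously.

```python
from difflib import SequenceMatcher


def gapped_sequence(seqres: str, observed: str) -> str:
    """Return seqres with '-' wherever no observed residue was matched.

    seqres   -- full sequence from the SEQRES records (one-letter codes)
    observed -- residues found in the ATOM records, in file order
    """
    # autojunk=False turns off difflib's popularity heuristic, which would
    # otherwise ignore very common letters in long sequences.
    matcher = SequenceMatcher(None, seqres, observed, autojunk=False)

    gapped = ["-"] * len(seqres)
    for block in matcher.get_matching_blocks():
        # block.a is the start in seqres, block.size the length of the run.
        for i in range(block.a, block.a + block.size):
            gapped[i] = seqres[i]
    return "".join(gapped)


if __name__ == "__main__":
    # Made-up example: six residues in the middle were not modelled.
    print(gapped_sequence("MKTAYIAKQRQISFVKSHFSRQ", "MKTAYIAFVKSHFSRQ"))
    # prints MKTAYIA------FVKSHFSRQ
```

Working from residue identities rather than residue numbers sidesteps
insertion codes and odd numbering, at the cost of the ambiguity noted above.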

If all you want is an "accurate FASTA sequence" for the protein,
there are programs to convert the SEQRES records to a string of
one-letter codes. The SEQRES record in principle has the sequence
of what is present in the crystal, regardless of whether it is
visible in the structure or not. However, there cannot be conflicts
between the SEQRES and the ATOM records, so if the structure contains
unknown ('UNK') residues, they have to be UNK in the SEQRES also,
even if the sequence is known. And if a string of UNK residues is
disconnected on both ends, i.e. in the middle of a gap of missing
residues, then it is pretty arbitrary which residues in SEQRES get
replaced with UNK.
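
For what the SEQRES conversion amounts to, here is a minimal Python sketch,
not any particular program's implementation. It assumes the fixed-column PDB
layout (chain ID in column 12, residue names from column 20 onward) and
handles only the 20 standard amino acids; the names THREE_TO_ONE and
seqres_to_fasta are hypothetical, and writing UNK or any other unrecognized
residue as 'X' is a choice made here.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_to_fasta(pdb_lines, unknown="X"):
    """Return {chain_id: one-letter sequence} built from SEQRES records.

    pdb_lines -- any iterable of PDB-format lines, e.g. an open file
    unknown   -- letter used for UNK and other unrecognized residue names
    """
    chains = {}
    for line in pdb_lines:
        if not line.startswith("SEQRES"):
            continue
        chain_id = line[11]            # column 12
        names = line[19:70].split()    # residue names start at column 20
        letters = chains.setdefault(chain_id, [])
        letters.extend(THREE_TO_ONE.get(name, unknown) for name in names)
    return {chain: "".join(letters) for chain, letters in chains.items()}


if __name__ == "__main__":
    import sys

    # Usage: python seqres_to_fasta.py structure.pdb
    with open(sys.argv[1]) as handle:
        for chain, seq in seqres_to_fasta(handle).items():
            print(">chain_%s\n%s" % (chain, seq))
```

Note that this reports what SEQRES says, so UNK stretches come out as X
even where the true sequence is known.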

Kelvin Luther wrote:
> Hello,
>
> I am using PyMOL 0.99rc6.  I am wondering if there is a means to obtain
> a FASTA sequence from a loaded pdb file that maintains the gaps due to
> missing portions of the structure?  I found one program that will strip
> the sequence from pdb files, but it simply reads out the amino acids
> that are present linearly.  Gaps in the sequence are not maintained.  I
> can't imagine why that would be useful.  I have a large number of pdb
> files to deal with and would like to avoid having to align each pdb
> sequence to the gene just to recover the gaps.
>
> Thanks for your time,
>

------------------------------------------------------------------------------
_______________________________________________
PyMOL-users mailing list (PyMOL-users@lists.sourceforge.net)
Info Page: https://lists.sourceforge.net/lists/listinfo/pymol-users
Archives: http://www.mail-archive.com/pymol-users@lists.sourceforge.net
