Re: [ccp4bb] PDB format survey?

George M. Sheldrick Fri, 10 Aug 2007 05:13:15 -0700

Dear Ralf,

I like your hybrid_36 scheme and will implement it and the two-character 
chain IDs (PDB file columns 21 and 22, right justified) when I next update 
SHELXL. Of course I will need to do some programming because of the SHELX 
'zero dependency' philosophy, but it seems to me to be straightforward. In 
fact SHELXL treats the residue numbers internally as strings anyway. Since 
I am not in the habit of releasing new versions in a hurry, that will give 
time to make the necessary changes to Coot and MMDB. The great advantage 
of these changes is that for the large majority of PDB files, nothing will 
change.
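
As a rough illustration of the two-character chain ID convention described 
above (columns 21 and 22, right justified), here is a minimal Python sketch. 
The function names are hypothetical and are not taken from SHELXL, Coot, 
MMDB or cctbx; the only assumption is the fixed-column ATOM/HETATM layout.

```python
def read_chain_id(line):
    """Return the chain ID from columns 21-22 (1-based) of an ATOM/HETATM line.

    A classic one-character ID sits in column 22 with column 21 blank, so
    stripping the two-column field yields the same ID as before.
    """
    return line[20:22].strip()


def write_chain_id(line, chain_id):
    """Return a copy of the line with chain_id right-justified in columns 21-22."""
    if not 1 <= len(chain_id) <= 2:
        raise ValueError("chain ID must be one or two characters")
    line = line.ljust(22)  # make sure columns 21-22 exist
    return line[:20] + chain_id.rjust(2) + line[22:]


# Build an example record field by field so the columns are explicit.
example = (
    "ATOM  "    # columns 1-6: record name
    + "    1"   # columns 7-11: atom serial number
    + " "       # column 12: blank
    + " N  "    # columns 13-16: atom name
    + " "       # column 17: alternate location
    + "ALA"     # columns 18-20: residue name
    + " "       # column 21: blank in the classic format
    + "A"       # column 22: chain ID
    + "   1"    # columns 23-26: residue number
)
print(read_chain_id(example))
print(read_chain_id(write_chain_id(example, "AB")))
```

A reader written this way returns the familiar one-character ID for existing 
files, which is why the large majority of PDB files would be unaffected.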

Although PDB files will be with us for many years to come, there is the 
separate problem of depositing reflection data, for which MMCIF formats on 
the lines indicated by Kim would be a good solution. At the moment people 
who have refined a non-merohedrally twinned structure with SHELXL deposit 
the .fcf file (CIF, not MMCIF!). The data in this file have been 
'detwinned' with the help of the refined structure and so the file can be 
read directly into Coot or used to calculate structure factors as if the 
crystal had not been twinned. Although this will still be needed, we 
should also be depositing the data as measured, so that the twinned 
refinement can be repeated or verified with another program. A similar 
situation arises with time-of-flight neutron diffraction data, e.g. from 
one of the new spallation sources. Incidentally, the small molecule CIF 
format also has no answer yet to this problem; maybe the same solution 
could be found for both MMCIF and CIF?!

Best wishes, George

Prof. George M. Sheldrick FRS
Dept. Structural Chemistry, 
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-2582

On Wed, 8 Aug 2007, Ralf W. Grosse-Kunstleve wrote:

> Hi George,

> It seems to me that column 21 of an ATOM or HETATM instruction
> is always blank, and column 22 is the chain ID. So if we put a
> two-character chain ID right justified in columns 21 and 22, for the
> vast majority of structures there would be no change, and it would
> be relatively easy to change MMDB, Coot etc. to accommodate the
> increase in the possible number of chains from 26 to (say) 36^2 =
> 1296 (if digits are allowed too).

I really like this idea and I will make phenix/cctbx work this way.

> The next problem is of course the 5-digit atom serial number in columns
> 7 to 11, which limits the total number of atoms in the structure to a
> paltry 99999. Many programs ignore this number, but it is used by the
> CONECT, SSBOND and CISPEP records in the PDB. However, column 12 also
> appears to be blank; I think that using it (e.g. with A for atoms
> 100000 to 199999, B for atoms 200000 to 299999 etc.) would enable the
> sequence numbers to be recycled and would again require no change for
> the vast majority of PDB files. I personally think that this is a much
> better solution than what the PDB currently does for more than 99999
> atoms (they spread the structure over several PDB files with different
> PDB-IDs!).
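
For comparison, the column 12 proposal quoted above could be sketched in 
Python roughly as follows. This is only one reading of the suggestion: the 
function names are hypothetical, and it is an assumption that the recycled 
five-digit number stays in columns 7 to 11 with the block letter placed in 
column 12.

```python
BLOCK = 100000  # atoms per letter block: A = 100000-199999, B = 200000-299999, ...


def encode_serial(serial):
    """Return the 6-character field for columns 7-12 of an ATOM/HETATM line."""
    block, number = divmod(serial, BLOCK)
    if serial < 0 or block > 26:
        raise ValueError("serial number out of range for this scheme")
    # Blank column 12 for the first 100000 atoms, so existing files are unchanged.
    letter = " " if block == 0 else chr(ord("A") + block - 1)
    return "%5d%s" % (number, letter)


def decode_serial(field):
    """Inverse of encode_serial for a 6-character field (columns 7-12)."""
    number = int(field[:5])
    letter = field[5:6].strip()
    if not letter:
        return number
    return (ord(letter) - ord("A") + 1) * BLOCK + number
```

Under these assumptions the scheme covers serial numbers up to 2699999 with 
the letters A to Z.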

The solution to this problem is to simply treat the serial numbers and
residue numbers as strings. X-PLOR/CNS has been doing this forever,
maybe other programs, too.
Implementations to generate intuitive, maximally backward compatible
numbers can be found here:

  http://cci.lbl.gov/hybrid_36/

This includes a Fortran-77 implementation without any external
dependencies, heavily tested with a large variety of compilers.
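
To give a feel for the string-based numbering, here is a minimal Python 
sketch of hybrid-36 encoding and decoding as I understand the scheme: plain 
decimal while the number fits the field, then upper-case base 36 starting at 
A0000, then lower-case base 36 starting at a0000. It handles non-negative 
values only, and the function names are illustrative; the implementations 
at the URL above are the reference.

```python
UPPER = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = UPPER.lower()


def _base36(digits, value):
    """Encode a positive integer in base 36 using the given digit set."""
    out = []
    while value:
        value, rest = divmod(value, 36)
        out.append(digits[rest])
    return "".join(reversed(out))


def hy36_encode(width, value):
    """Encode a non-negative integer into a field of the given width."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    if value < 10 ** width:
        return str(value).rjust(width)  # ordinary decimal, unchanged
    value -= 10 ** width
    span = 26 * 36 ** (width - 1)       # count of strings starting with a letter
    offset = 10 * 36 ** (width - 1)     # numeric value of "A000..." in base 36
    if value < span:
        return _base36(UPPER, value + offset)
    value -= span
    if value < span:
        return _base36(LOWER, value + offset)
    raise ValueError("value out of range for this width")


def hy36_decode(width, field):
    """Decode a field produced by hy36_encode."""
    if len(field) != width:
        raise ValueError("field has the wrong width")
    first = field[0]
    if first == " " or first.isdigit():
        return int(field)
    if field.isalnum() and field.isupper():
        return int(field, 36) - 10 * 36 ** (width - 1) + 10 ** width
    if field.isalnum() and field.islower():
        return int(field, 36) + 16 * 36 ** (width - 1) + 10 ** width
    raise ValueError("invalid hybrid-36 field: %r" % field)


print(hy36_encode(5, 99999), hy36_encode(5, 100000))
```

With width 5 (atom serial numbers) this extends the range from 99999 to 
87440031, and with width 4 (residue numbers) from 9999 to 2436111, while 
every existing decimal field keeps its meaning.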

I think these simple tricks will be sufficient until we are all
retired!

Cheers,
        Ralf
