Ethan Merritt wrote:
> Examples include:
> - very large structures, for which the current 80 column PDB format
>  runs out of space for atom numbers (4 columns -> max 9999)
>   or for chain ids (1 column -> single char A-Z 0-9)
>   [don't ask me why they don't want lower case]
> - new classes of experiment (SAXS, EM)
> - new classes of model (TLS or normal-mode displacements,
>   ensemble models, envelope representations)
It would be trivial to update the PDB format to handle large structures.
In fact, such extensions are already being planned. Atom numbers can
simply be truncated: records in a PDB file are read in serial order, so
the atom number field is redundant.
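
A minimal sketch of the truncation idea, in Python. The function names
and the wrap-around rule are assumptions made for illustration, not part
of any PDB specification; the field width is a parameter because the
point does not depend on it. The writer wraps the number to fit the
field, and the reader ignores the stored value and numbers atoms by
their position in the file.

```python
def format_serial(index, width=5):
    """Return the atom number as a right-justified field of `width` columns.

    Numbers too large for the field wrap around (keep the low digits only).
    """
    return f"{index % 10**width:>{width}d}"


def recover_serials(fields):
    """Ignore the stored (possibly wrapped) values; number atoms by position."""
    return list(range(1, len(fields) + 1))


# Atoms on either side of the point where a 5-column field overflows.
fields = [format_serial(i) for i in (99999, 100000, 100001)]
positions = recover_serials(fields)
```

The wrapped fields are ambiguous on their own, but the positions are
not, which is why the stored number carries no extra information for a
reader that processes records in order.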

As for other experiments, like SAXS or EM, I think the PDB format should
continue to be used only for atomic coordinates. Using PDB files as a
complete data reference has never worked well.

...
> Currently-maintained programs should move to mmCIF or XML, whichever
> is convenient.  These formats are intrinsically open-ended, and can
> handle the problematic structures mentioned above so long as the
> corresponding mmCIF dictionaries are updated to define the relevant
> entities.
Being intrinsically open-ended is an advantage for parsing, but it still
takes a lot of work to actually make use of new data: the software has
to be updated to handle each new item. Formats like mmCIF and XML solve
only part of the 'file format' problem. mmCIF can also be too
open-ended, depending on how the schema is managed.
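
To illustrate the gap between parsing data and using it, here is a rough
Python sketch. It handles only one-line 'tag value' items (no loops, no
quoting), so it is not a real mmCIF parser; _example.new_item is a
made-up tag, and KNOWN_TAGS stands in for whatever a given program
actually understands.

```python
def parse_items(text):
    """Collect one-line 'tag value' pairs whose tag starts with '_'."""
    items = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0].startswith("_"):
            items[parts[0]] = parts[1].strip()
    return items


# Tags this hypothetical program knows how to interpret.
KNOWN_TAGS = {"_cell.length_a", "_cell.length_b"}


def unused_items(items, known=KNOWN_TAGS):
    """Return the tags that parsed fine but that the program cannot use."""
    return sorted(tag for tag in items if tag not in known)


sample = """\
_cell.length_a     58.39
_cell.length_b     86.70
_example.new_item  42
"""
items = parse_items(sample)
ignored = unused_items(items)
```

The new item parses without any change to the parser, but it ends up in
the ignored list until someone writes code that knows what it means.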

I would be much more willing to work toward switching to mmCIF if RCSB
showed more interest in collaborating with the user community. If we
can't even get involvement in something as simple as the PDB format, why
should we think working with mmCIF will be any better?

Joe Krahn
