Hi Joe,

As both a contributor to the data and a user of the PDB files, I think you
are being a bit harsh in generalizing from your frustration with the PDB.
I have communicated extensively with the PDB folks, both at Rutgers and at
EBI, about many of the shortcomings in deposition and data mining that you
describe. I have been annoyed at times too, but many things have in fact
been fixed.

But there is, as you note, a very diverse user community, and its members
all have different opinions about what is important to them and what is
not. The PDB has been listening to user comments: some can be addressed
readily, while others would clash with the views of many other users.

On top of that, you have to deal with a very sensitive community of
submitters (imagine the outcry if the PDB did some really strict
validation and added to the PDB file the caustic remarks that some
structures deserve). One result would be a lot fewer Nature papers,
btw....

The developers are an equally important and opinion-rich group, and by
changing anything format related, you generally break quite a few things
for them. As they are key for the science (no programs -> no structures ->
no nagging bioinformaticists), they need to be part of any view of future
developments.

A general rethinking of the representation of structural information,
beyond the format discussion and on an abstract object level, is certainly
advisable for the future. Extensions such as mesh or grid objects can
capture much, from e-density to EM masks to SAXS obloids and shapes, and go
beyond the beloved flat-file atom representation.

Anyhow, if I understand correctly, your suggestion of polling the user
community is, on a grander scale, already on the minds of the PDB folks.
I am sure your voice will be heard.

Best regards, BR

-----------------------------------------------------------------
Bernhard Rupp
ACA Data Standards and Computing Committee
001 (925) 209-7429
+43 (676) 571-0536
[EMAIL PROTECTED]
[EMAIL PROTECTED] 
http://www.ruppweb.org/                 
-----------------------------------------------------------------
People can be divided in three classes:
The few who make things happen
The many who watch things happen
And the overwhelming majority 
who have no idea what is happening.
-----------------------------------------------------------------


-----Original Message-----
From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of Joe Krahn
Sent: Wednesday, August 01, 2007 5:17 PM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] PDB format survey?

Ethan Merritt wrote:
> Examples include:
> - very large structures, for which the current 80 column PDB format  
> runs out of space for atom numbers (4 columns -> max 9999)
>   or for chain ids (1 column -> single char A-Z 0-9)
>   [don't ask me why they don't want lower case]
> - new classes of experiment (SAXS, EM)
> - new classes of model (TLS or normal-mode displacements,
>   ensemble models, envelope representations)
It would be trivial to update the PDB format to handle large structures.
In fact, such extensions are already being planned. Atom numbers can simply
be handled by truncating them; the serial design of PDB files makes them
redundant.
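
As a rough illustration of the truncation idea, here is a minimal Python
sketch. It is not how any existing program does it: the function names are
made up, the record is heavily simplified, the field width is a parameter
(defaulting to 5 here as an assumption), and it assumes the atoms are
numbered 1, 2, 3, ... in file order with nothing else consuming serial
numbers.

```python
def truncated_serial(serial: int, width: int = 5) -> str:
    """Keep only the low-order digits that fit the fixed-width field."""
    return str(serial % 10**width).rjust(width)


def atom_line(serial: int, name: str = "CA", width: int = 5) -> str:
    # Hypothetical, heavily simplified record:
    # record name + serial field + one space + atom name.
    return "ATOM  " + truncated_serial(serial, width) + " " + name.ljust(4)


def recover_serials(lines, width: int = 5):
    """Rebuild the true serial numbers from record order alone."""
    atom_lines = (line for line in lines if line.startswith("ATOM  "))
    true_serials = []
    for position, line in enumerate(atom_lines, start=1):
        stored = int(line[6:6 + width])
        # The stored field only has to agree with the position
        # modulo the capacity of the field.
        if stored != position % 10**width:
            raise ValueError(f"record {position} is out of sequence")
        true_serials.append(position)
    return true_serials


# More atoms than a 5-column field can count.
lines = [atom_line(i) for i in range(1, 100_003)]
serials = recover_serials(lines)
```

The point is only that a reader which counts records never needs the full
number in the file; the truncated field serves as a consistency check.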

As for other experiments, like SAXS or EM, I think the PDB format should
continue to be used only for atomic coordinates. Using PDB files as a
complete data reference has never been good.

...
> Currently-maintained programs should move to mmCIF or XML, whichever 
> is convenient.  These formats are intrinsically open-ended, and can 
> handle the problematic structures mentioned above so long as the 
> corresponding mmCIF dictionaries are updated to define the relevant 
> entities.
Being intrinsically open-ended is an advantage for parsing, but it still
takes a lot of work to actually make use of new data. The software still has
to be updated to handle the data. Formats like mmCIF and XML only handle
part of the 'file format' issue. One problem is that mmCIF can be too
open-ended, depending on how the schema is managed.

I would be much more willing to work toward switching to mmCIF if RCSB
showed more interest in collaborating with the user community. If we can't
even get involvement in something as simple as the PDB format, why should we
think working with mmCIF will be any better?

Joe Krahn
