On 05/08/13 09:03, Tim Gruene wrote:

Reading Gerard Kleywegt's latest announcement on the wwPDB Workshop
(1st August) made me wonder whether it is planned to introduce mmCIF as
a working format for users, in addition to using it at e.g. the PDB,
because I think that would make life unnecessarily complicated.

There’s nothing to stop you using your /own/ working format—it’s easy to extract a simpler file from the full archive file—but the archive file obviously has to contain the full set of metadata, and to be useful, that metadata has to be easily parsable.
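
As a rough illustration of how little is needed to pull a simpler table out of an mmCIF file, here is a minimal Python sketch. The `_atom_site` tag names are real mmCIF items, but the function name `atom_site_rows` and the two sample atoms are made up for the example. It also assumes one row per line and no quoted values containing spaces; a real archive file needs a proper CIF parser.

```python
def atom_site_rows(cif_lines):
    """Yield one dict per row of the _atom_site loop (simplified sketch)."""
    tags = []
    seen_tags = False
    for raw in cif_lines:
        line = raw.strip()
        if line.startswith("_atom_site."):
            # Column headers, e.g. "_atom_site.Cartn_x" -> "Cartn_x"
            tags.append(line.split(".", 1)[1])
            seen_tags = True
            continue
        if not seen_tags:
            continue
        # The loop ends at a blank line, a comment, or the next category.
        if not line or line.startswith(("#", "loop_", "_", "data_")):
            break
        values = line.split()  # assumption: no quoted values with spaces
        if len(values) == len(tags):
            yield dict(zip(tags, values))


# Made-up sample data, for illustration only.
sample = """\
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 N ALA 11.104 6.134 -6.504
ATOM 2 CA ALA 11.639 6.071 -5.147
#
""".splitlines()

for row in atom_site_rows(sample):
    print(row["label_atom_id"], row["Cartn_x"], row["Cartn_y"], row["Cartn_z"])
```

Each row comes back keyed by column name, so a working file can keep whichever columns it needs and ignore the rest.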


The example mmCIF file for GroEL is about 7.5 times bigger than its PDB
file.
I know that disk space is 'cheap' nowadays, but that does not make it fast.

And personally I find mmCIF very awkward to work with, since it is not
line-oriented. 'grep', 'awk', 'perl' etc. do not work well on XML-like
files.
Instead of using mmCIF, one could, e.g. introduce a free-format PDB
format, with placeholders for unassigned entities, and maybe a line
continuation character.

Are you sure you’re talking about the CIF‐based mmCIF format here, not the XML‐based PDBx format? mmCIF shouldn’t be much bigger than PDB.

If mmCIF is not going to be the working format for MX (refinement)
programs I would be happy for a reassurance, and otherwise I would
appreciate some comments about the benefits of an XML file format over a
line-oriented free format for the scientists that work with structural data.
In my opinion, using XML (or mmCIF) for structural information is an
attempt by programmers to make themselves more indispensable to
scientists, rather than scientifically needed.

Even when searching the “simple” PDB format, you’re likely to run into problems with line breaks. Suppose you want to find all files containing PEG. Your script must reliably recognise something like:

REMARK 280 CRYSTALLIZATION CONDITIONS: 1.0M LITHIUM SULPHATE, 100MM POLY
REMARK 280   ETHYLENE GLYCOL

—in fact this sort of thing is much /easier/ to do, given the proper tools, in a format like XML.
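
To make the difficulty concrete, here is a minimal Python sketch of what a search has to do to catch the wrapped record above. The helper names `remark_text` and `mentions` are made up for the example, and it assumes the usual fixed-column layout, with the remark number in columns 8 to 10. A line-by-line grep for "POLYETHYLENE GLYCOL" would miss this entry, because the word is split across two records.

```python
import re


def remark_text(pdb_lines, number):
    """Join the text of all REMARK <number> records into one string."""
    parts = []
    for line in pdb_lines:
        # Columns 1-6 hold "REMARK", columns 8-10 hold the remark number.
        if line.startswith("REMARK") and line[6:10].strip() == str(number):
            parts.append(line[10:].strip())
    return " ".join(parts)


def mentions(pdb_lines, phrase, number=280):
    """True if phrase occurs in the remark, ignoring case and all whitespace."""
    def squash(text):
        return re.sub(r"\s+", "", text).upper()
    return squash(phrase) in squash(remark_text(pdb_lines, number))


sample = [
    "REMARK 280 CRYSTALLIZATION CONDITIONS: 1.0M LITHIUM SULPHATE, 100MM POLY",
    "REMARK 280   ETHYLENE GLYCOL",
]

print(mentions(sample, "POLYETHYLENE GLYCOL"))
```

Dropping all whitespace is a blunt choice: it finds words broken across records, but it can also match across genuine word boundaries, so short search terms such as "PEG" need extra care.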

With file formats, the devil is always in the details. If you set out to create a “line‐oriented, free format” PDB replacement, and you carefully ironed out all the potential ambiguities and awkward corner cases, I bet you’d come up with something close to mmCIF.
--
Ian ◎
