I like to think structural biologists are more than just another user
group: they FEED the PDB!
Their needs should first and foremost be taken care of, I would think...
Also, it would indeed be a great loss if legacy programs could no longer
be used.
- Jeroen -
Clemens Vonrhein wrote:
There is a collection of posts (unfortunately with a number of spam
messages) at
http://wwpdb-remediation.rutgers.edu/mail-archive/
with various comments. Although I'm not familiar with the internal
workings of this remediation program, it seems indeed that the PDB
format is now largely auto-generated from the internally used
mmCIF. Unfortunately in my experience (having had a look at a few dozen
random entries of the new PDB files) this means that some of the new
PDB files of old entries will look very different from what you/we
deposited several years ago. The format seems better (internally
consistent) but the content has sometimes suffered.
But I guess there is always room for friction when one side is mainly
interested in data format, storage and databases and the other mainly
interested in the crystallographic content. Finding a good compromise
between those two groups of experts is non-trivial.
At least the new databases will always have a link to the original
version of the PDB file - although it will still mean I can't now
search for an author name MUELLER (German U-umlaut transferred into the
proper ASCII form), since the PDB files now contain MULLER (because
PUBMED isn't able to properly translate non-ASCII names ...). Or an
analysis of programs used for structure solution will show a very
different distribution - since the information has been significantly
changed.
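To make the MUELLER/MULLER point concrete, here is a minimal Python
sketch of the two behaviours. It is only an illustration: it is not what
PUBMED or the remediation software actually runs, and the function names
and the mapping table are hypothetical.

```python
import unicodedata

# Hypothetical mapping for illustration: the conventional German
# transliteration of umlauts into ASCII digraphs.
GERMAN_DIGRAPHS = {
    "Ä": "AE", "Ö": "OE", "Ü": "UE",
    "ä": "ae", "ö": "oe", "ü": "ue",
    "ß": "ss",
}


def transliterate_german(name):
    """Replace each umlaut with its ASCII digraph (U-umlaut -> UE)."""
    return "".join(GERMAN_DIGRAPHS.get(ch, ch) for ch in name)


def strip_diacritics(name):
    """Drop combining marks after Unicode decomposition (U-umlaut -> U)."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    author = "MÜLLER"
    print(transliterate_german(author))  # MUELLER
    print(strip_diacritics(author))      # MULLER
```

A search for the first spelling will not match records stored with the
second one, which is the problem described above.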
Anyway, have a look at your favourite PDB file with the attached
script
./pdb23.sh 1abc
It is quite interesting sometimes. I haven't checked the mmCIF files -
maybe they are much better (as a 'hint' from the database people to
the crystallographers to stop using PDB format and switch to mmCIF,
maybe?).
Cheers
Clemens
On Sat, Jul 21, 2007 at 12:05:35PM -0700, Ethan A Merritt wrote:
On Saturday 21 July 2007 11:12, Joe Krahn wrote:
we all use in our daily research. They don't even want to keep the PDB
format at all. Its primary purpose now is for structural biologists.
That is inevitable. The PDB format is simply not capable of representing
the complexities of current crystallographic models, and will only become
more obsolete as the state of the art progresses. Because it is so
widespread, it will remain a legacy format for import/export into programs
that are not up to the current crystallographic state of the art. Yes,
that means it will largely be used by non-crystallographers to import
and view structures.
Thus I think the writing is on the wall that the PDB format as a primary
working medium in crystallography is on its deathbed. Of course it may
linger there for a long while yet, and may be poked at from time to time
in order to stave off its final expiration.
Having said that, I don't understand the motivation for changing this
legacy format to something that the legacy programs will not recognize.
That indeed seems self-defeating.
Ethan Merritt
The new PDB format (version 3) has a lot of very useful improvements,
and an update is long overdue. However, I am irate that RCSB chose NOT
to use the ACA meeting to discuss the changes. Instead, the format is
being put into production at the same time as the ACA meeting. It is
essentially stating that opinions expressed at the ACA do not count.
There was a lot of conflict at their last attempt at an update. Instead
of working to better involve the structural biologist community, I feel
that they are intentionally discounting our interests because working
with the user community is too much effort.
Unfortunately, structural biologists generally do not want to spend time
arguing about file formats, while computer scientists can carry on for
weeks over minor details. This change is going to affect all of us. If
you have concerns about the new format that have not been addressed, it
is important to take action now. The PDB format is not just their
personal database format (that's what mmCIF is for), but the format that
we all use in our daily research. They don't even want to keep the PDB
format at all. Its primary purpose now is for structural biologists. It
is essential that we be part of the decision making process.
I just sent the following letter to the wwPDB, which is where
comments about the new format are supposed to go. If you will be at the
ACA meeting, I encourage you to complain loudly.
Joe Krahn
-----------------------------------------------------------------------
To: [EMAIL PROTECTED]
Subject: The new PDB format is WRONG.
It seems obvious to me that the RCSB and wwPDB worked on the new format
with database users' needs in mind, but have intentionally ignored the rest
of the user community. RCSB manages mmCIF for database purposes, and has
declared a lack of interest in even keeping the PDB format. Obviously,
the primary purpose of the PDB format is for structural biologists
working with individual structures, and not database users.
Most of the updates are quite positive and beneficial, but I think that
some changes are detrimental. My only serious complaint is that RCSB,
and now wwPDB, seem to be ignoring the interests of much of the
scientific community which they are supposed to be serving. All that I
ask for is appropriate inclusion of all of the user community. This is a
big change that will affect thousands of people. We should ensure that
it is the best possible format update before we all have to expend a
huge effort to deal with it.
I have seen many comments about the format from well-known
crystallographers go ignored. One example is the use of SegID. Most
structural biologists have favored it for years, but RCSB continued to
deny us, on grounds that it is not "well defined". It would be better to
make a better definition, and allow it to be used to group together
non-covalent groups, such as waters with a specific protein molecule.
This is important because the use of ChainID for non-polymers has been
banned, which also goes against the wishes of most users.
The latest atom alignment rule change is also detrimental. RCSB has
totally broken the element alignment rules, on baseless grounds that they
were too hard to follow. The new change convolutes this rule even
further, and essentially follows an earlier attempt at IUPAC hydrogen
names that the community strongly rejected. At this point, the best
solution is probably to make it completely left justified. Again, my
main concern is not to follow my idea, but to ensure that the user
community gets a fair chance to participate in the final decision.
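
For readers less familiar with the alignment rule, here is a rough
Python sketch of the traditional convention as commonly described: the
atom name occupies a 4-character field (columns 13-16), with the element
symbol right-justified in columns 13-14, so a one-letter element leaves
column 13 blank unless the name needs all four characters. The function
names are hypothetical, and this is an illustration under those
assumptions, not the RCSB's code or the version 3 rule.

```python
def atom_name_classic(name, element):
    """Format an atom name for the 4-character field (columns 13-16).

    Assumed traditional rule: the element symbol is right-justified in
    the first two columns, so names for one-letter elements start in the
    second column unless the name is already four characters long.
    """
    if len(name) >= 4:
        return name[:4]
    if len(element) == 1:
        return (" " + name).ljust(4)
    return name.ljust(4)


def atom_name_left(name):
    """Format an atom name completely left-justified in the field."""
    return name[:4].ljust(4)


if __name__ == "__main__":
    # Alpha carbon (element C) versus calcium (element CA).
    print(repr(atom_name_classic("CA", "C")))      # ' CA '
    print(repr(atom_name_classic("CA", "CA")))     # 'CA  '
    print(repr(atom_name_classic("HD11", "H")))    # 'HD11'
    print(repr(atom_name_left("CA")))              # 'CA  '
```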
Another problem is that the original meaning of HET groups continues to
be corrupted. ATOM records are for commonly occurring residues from a
list of standard residues. Water is obviously common, and should not
have been converted to a HET group. HET groups have NO relationship to
polymeric state. With water as a HET group, a proper PDB file for a
modeller with bulk solvent would require CONECT entries for every single
water. It is also important to emphasize that the HETNAM is the actual
unique ID, not the 3-letter code. The current hack is to treat
everything as an ATOM, which has a pre-determined connectivity. This
cannot continue forever, and we are already stuck with meaningless
3-letter codes instead of useful 3-letter abbreviations. The unique
3-letter code should be continued for now, but there should be an
emphasis on beginning to use the full HETNAM so that the inevitable
switch to non-unique 3-letter codes will not have a big impact.
Thank you,
Joe Krahn
--
Ethan A Merritt
Biomolecular Structure Center
University of Washington, Seattle 98195-7742
--
Jeroen Raymundus Mesters, Ph.D.
Institut fuer Biochemie, Universitaet zu Luebeck
Zentrum fuer Medizinische Struktur und Zellbiologie
Ratzeburger Allee 160, D-23538 Luebeck
Tel: +49-451-5004070, Fax: +49-451-5004068
E-mail: [EMAIL PROTECTED]
Http://www.biochem.uni-luebeck.de
Http://www.iobcr.org
Http://www.opticryst.org
--
If you can look into the seeds of time and say
which grain will grow and which will not - speak then to me (Macbeth)
--