The new PDB format (version 3) has a lot of very useful improvements,
and an update is long overdue. However, I am irate that RCSB chose NOT
to use the ACA meeting to discuss the changes. Instead, the format is
being put into production at the same time as the ACA meeting, which
effectively says that opinions expressed at the ACA do not count.
There was a lot of conflict over their last attempt at an update. Instead
of working to better involve the structural biologist community, I feel
that they are intentionally discounting our interests because working
with the user community is too much effort.

Unfortunately, structural biologists generally do not want to spend time
arguing about file formats, while computer scientists can carry on for
weeks over minor details. This change is going to affect all of us. If
you have concerns about the new format that have not been addressed, it
is important to take action now. The PDB format is not just their
personal database format (that's what mmCIF is for), but the format that
we all use in our daily research. They don't even want to keep the PDB
format at all, yet its primary purpose now is to serve structural biologists. It
is essential that we be part of the decision making process.

I just sent the following letter to the wwPDB, which is where
comments about the new format are supposed to go. If you will be at the
ACA meeting, I encourage you to complain loudly.

Joe Krahn

-----------------------------------------------------------------------
To: [EMAIL PROTECTED]
Subject: The new PDB format is WRONG.

It seems obvious to me that the RCSB and wwPDB designed the new format
around the needs of database users, but have intentionally ignored the rest
of the user community. RCSB manages mmCIF for database purposes, and has
declared a lack of interest in even keeping the PDB format. Obviously,
the primary purpose of the PDB format is for structural biologists
working with individual structures, and not database users.

Most of the updates are quite positive and beneficial, but I think that
some changes are detrimental. My only serious complaint is that RCSB,
and now wwPDB, seem to be ignoring the interests of much of the
scientific community that they are supposed to serve. All I ask is that
the whole user community be properly included. This is a
big change that will affect thousands of people. We should ensure that
it is the best possible format update before we all have to expend a
huge effort to deal with it.

I have seen many comments on the format from well-known
crystallographers ignored. One example is the use of SegID. Most
structural biologists have favored it for years, but RCSB has continued to
refuse it on the grounds that it is not "well defined". It would be better
to give it a proper definition, and allow it to be used to group
non-covalent components, such as waters, with a specific protein molecule.
This is important because the use of ChainID for non-polymers has been
banned, which also goes against the wishes of most users.

The latest change to the atom name alignment rules is also detrimental.
RCSB has totally broken the element alignment rules, on the baseless
grounds that they were too hard to follow. The new change complicates
this rule even further, and essentially follows an earlier attempt at
IUPAC hydrogen names that the community strongly rejected. At this point,
the best solution is probably to make atom names completely
left-justified. Again, my
main concern is not to follow my idea, but to ensure that the user
community gets a fair chance to participate in the final decision.
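To make the alignment issue concrete, here is a minimal Python sketch of how an atom name fills the four-character atom name field (columns 13-16) of an ATOM record. It assumes the version 3 convention is: a one-letter element starts in column 14, while two-letter elements and all four-character names start in column 13. That is a simplified summary, not the official specification, and the function names are hypothetical. It compares this against the traditional reading, where the element symbol sits right-justified in columns 13-14, and against the fully left-justified alternative.

```python
# Hypothetical helpers showing how an atom name fills PDB columns 13-16.
# The version 3 rule encoded below is an assumed summary, not the official spec.

def v3_atom_field(name: str, element: str) -> str:
    """Place an atom name in columns 13-16 under the assumed version 3 rule."""
    if not 1 <= len(name) <= 4:
        raise ValueError("atom name must be 1 to 4 characters")
    if len(name) == 4 or len(element) == 2:
        # Four-character names (e.g. HD11) and two-letter elements (e.g. FE)
        # start in column 13.
        return name.ljust(4)
    # Otherwise a one-letter element starts in column 14.
    return " " + name.ljust(3)


def left_justified_atom_field(name: str) -> str:
    """Fully left-justified alternative: every name starts in column 13."""
    if not 1 <= len(name) <= 4:
        raise ValueError("atom name must be 1 to 4 characters")
    return name.ljust(4)


def element_from_columns(field: str) -> str:
    """Traditional reading: element symbol right-justified in columns 13-14."""
    return field[:2].strip()


if __name__ == "__main__":
    # Alpha carbon, calcium ion, iron, and a four-character hydrogen name.
    for name, element in [("CA", "C"), ("CA", "CA"), ("FE", "FE"), ("HD11", "H")]:
        field = v3_atom_field(name, element)
        print(repr(field), element_from_columns(field),
              repr(left_justified_atom_field(name)))
```

For the first three atoms, the traditional column reading recovers the element correctly. For HD11 it yields "HD" rather than "H", which is the breakage of element alignment described above. The left-justified variant is simpler, but it no longer distinguishes an alpha carbon from calcium by position, so a reader would have to rely on the element symbol in columns 77-78.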

Another problem is that the original meaning of HET groups continues to
be corrupted. ATOM records are for commonly occurring residues from a
list of standard residues. Water is obviously common, and should not
have been converted to a HET group. HET groups have NO relationship to
polymeric state. With water as a HET group, a proper PDB file for a
modeller with bulk solvent would require CONECT entries for every single
water. It is also important to emphasize that the HETNAM is the actual
unique ID, not the 3-letter code. The current hack is to treat
everything as an ATOM record, whose connectivity is pre-determined. This
cannot continue forever, and we are already stuck with meaningless
3-letter codes instead of useful 3-letter abbreviations. The unique
3-letter code should be continued for now, but there should be an
emphasis on beginning to use the full HETNAM so that the inevitable
switch to non-unique 3-letter codes will not have a big impact.

Thank you,
Joe Krahn
