On Monday 17 March 2008 16:20, Dale Tronrud wrote:
> Hi again,
>
>    I guess this is only a partial summary, since I still don't understand
> all the issues this question raises.
>
>    Pavel Afonine reported that his extensive tests of the PDB reveal that
> reproducing R values from models with TLS ADP's is a wide-spread and
> serious problem.  The principal problems (IMHO) are
>
>  1) Incorrect or illegal TLS definitions in the REMARK.
Yes. I have noticed the same. It is unfortunate, and not at all clear
at what point the error creeps in.

>  2) Some files list in the ATOM "B" column the residual B after TLS
>     has been accounted for while others list the total B (TLS and
>     residual).  There is no clear indication in the PDB file which
>     interpretation is being used.

That is a fundamental deficiency in the existing PDB standard.
It simply doesn't specify how to present this critical information.
A draft change covering this was circulated at the PDB get-together
at last summer's ACA meeting, and I discussed with Garib and Eleanor
that we should as a community decide how we would like it handled.

The consensus as I understand it is that people would prefer that the
B field of individual ATOM records contain the *net* (i.e. total) B
rather than the residual B, so that old programs will continue to work
as expected.  However, this puts even more importance on the TLS
description in the header being correct, since the information is
otherwise not recoverable.

We were going to circulate a letter, but I plead guilty to letting the
matter slide.

> Tassos, Eleanor, and others recommended taking the TLS definition from
> the PDB header and running zero cycles of unrestrained refinement in
> Refmac to get it to calculate R factors and Maps w/o the need to define
> ideal geometry for co-factors.  I have yet to see this work, however
> (See below)

Well, it has worked reasonably well for me in the past, for some
structures.  But it may well have broken again.

> Ulrich Baumann wrote to tell me of two of his PDB's that he knows will
> give back the reported R values.  They are 2qua and 2qub.
>
> I grabbed 2qua from the RCSB server, extracted the TLS groups with CCP4i,
> and found that the TLS definitions were incorrect.  There is one polypeptide
> in this model and three TLS groups.  The first and third group did not
> have a residue range, while the second group defined a residue range in
> the middle of the peptide.
> I made the assumption that the first and
> third TLS groups were intended to cover the beginning and end of the
> peptide and corrected the .tls file.

That is interesting, because the mmCIF file for that structure contains
the following:

#
_pdbx_refine_tls_group.id                  1
_pdbx_refine_tls_group.refine_tls_id       2
_pdbx_refine_tls_group.beg_auth_asym_id    A
_pdbx_refine_tls_group.beg_auth_seq_id     250
_pdbx_refine_tls_group.beg_label_asym_id   A
_pdbx_refine_tls_group.beg_label_seq_id    252
_pdbx_refine_tls_group.end_auth_asym_id    A
_pdbx_refine_tls_group.end_auth_seq_id     461
_pdbx_refine_tls_group.end_label_asym_id   A
_pdbx_refine_tls_group.end_label_seq_id    463
_pdbx_refine_tls_group.selection           ?
#

This set of records is also a bit mangled, but does seem to contain
additional traces of the correct residue ranges for each group.
I wonder if the internal PDB database is storing incorrectly formatted
XML descriptions of the groups, and then further corrupting the
information when it generates a PDB format file?

> I also tried entry 2qub, but with less luck.

Indeed. That one has no additional information in the mmCIF file either.
So I don't know what's up.

Here's a recent deposition of ours:  3BJE
This one downloads from today's www.pdb.org with full TLS information.
So the process clearly works at least some of the time.

> I have to close with additional problems, I'm afraid.  I can't run
> the required refinement on 1nkz to test TLS/B refinement but
> I have tried it on 3bsd, where I have a good .cif for the Bchl-a
> groups.  When I pull out the TLS definition, and perform 10 cycles
> of TLS and 10 cycles of restrained refinement I get an R value of
> 20.2% while the entry asserts that the correct value is 17.8%.  The
> final TLS parameters look, by eye, pretty similar to the deposited
> ones, so I don't know what is going on here.

The issue of proper TLS description is not the only difficulty in
reproducing R factors from a PDB file.
Another notable omission is the lack of scattering factors.
If you have refined a SAS data set, e.g. a Se-edge dataset of a SeMet
metallo-protein, then the R factors may vary by >1% just because of
incorrectly reproduced f' terms for the Se and metal atoms.

	Ethan Merritt

> I loaded this into Refmac and asked for zero cycles of unrestrained
> refinement and got an R value of 19.4%.  The PDB file says it should
> be 17.3%.  I then asked Refmac to run 10 cycles of TLS and 10 cycles
> of restrained refinement and got an R value of 17.5%.  Good enough.
>
> From this result I infer that Refmac is unable to calculate the original
> ADP's given this PDB file and TLS definition.  It can reconstruct them
> via refinement, basically ignoring the B values of the PDB file.
>
> This particular PDB entry appears to contain in its "B" column the
> residual B's.
>
> I also tried entry 2qub, but with less luck.  This model has seven
> peptides and 30 TLS groups.  The first seven TLS groups defined in
> the header of the PDB cover each of the seven chains, while the other
> 23 groups had no residue range.  I can guess that the intention was
> to have five TLS groups for each of the seven chains, but without
> additional information from Dr. Baumann, I'm unable to even start
> trying to reproduce R values and calculate maps.
>
> So... 1) Pavel is correct, there are many clear errors in the TLS
> REMARKs of PDB entries.  2) It seems necessary to ask Refmac to
> recreate the ADP description for a PDB entry from scratch, assuming
> the TLS group definition can be deduced from the PDB header.  This,
> currently, requires refinement which requires .cif's for the unusual
> groups.
>
> If CCP4I could ask Refmac to perform only TLS/B refinement, holding
> positions fixed, the need for detailed .cifs would be greatly reduced.
> I have no desire to move the atoms anyway.
>
> Better yet, if someone could find out what Refmac is expecting to find
> in its starting PDB (what it wants in the "B" column) one could add
> a tool to CCP4I that could convert a PDB entry to what Refmac wants
> w/o refinement.  Since there appear to be two varieties of entries
> one could try both possibilities and choose the one with the lowest
> R value.
>
> Dale Tronrud
>
>
> Dale Tronrud wrote:
> > Hi,
> >
> >    I am looking over a number of models from the PDB but have been
> > unable to reproduce the R-factors for any model that was refined
> > with Refmac and contains TLS parameters.  I usually can't get within
> > 5% of the reported value.  On the other hand, I usually do pretty
> > well for models w/o TLS.
> >
> >    An example is the model 1nkz.  The PDB header gives an R value
> > of 17% but even when I use tlsanal in CCP4i to generate a PDB with
> > anisotropic B's that mimic the TLS parameters I get an R value of
> > 22.4% using SFCheck.  (I'm not implying that I suspect any problem
> > with 1nkz, in fact I have every reason to believe this is the great
> > model its published stats indicate.)
> >
> >    I've found a CCP4 BB letter that stated that SFCheck does not
> > pay attention to anisotropic B's but that letter was dated 2002.
> > I hope this limitation has been removed, or at least the output
> > would mention this limitation.
> >
> >    Setting up a refinement in Refmac involves a large overhead,
> > since even for zero cycles of refinement the program insists on
> > a complete stereochemical definition for the strange and wondrous
> > groups in this model.  I would just like to verify the R factor
> > and calculate a proper map for inspection in Coot.  Since I have
> > many models I would like to look at, I would like a simple procedure.
> >
> >    I did set up a Refmac run for another model, for which I do
> > have all the .cif's required, but even after refinement I was not
> > close to the reported R.
> >
> >    I see that the models I'm interested in are not present in the
> > Electron Density Server, so I suspect I'm not alone in fighting
> > this battle.
> >
> > Any advice would be appreciated,
> > Dale Tronrud
>

-- 
Ethan A Merritt
Biomolecular Structure Center
University of Washington, Seattle 98195-7742
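
For reference, the "try both possibilities and choose the one with the
lowest R value" idea from the quoted message can be sketched roughly as
below.  This is only an illustration under stated assumptions: it
presumes structure-factor amplitudes have already been calculated by
some other program for each reading of the "B" column, it uses a single
least-squares scale factor (no bulk-solvent or anisotropic scaling, so
it will not match what a refinement program reports), and the function
names and all numbers are hypothetical.

```python
from typing import Dict, Sequence


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """Conventional R = sum| |Fo| - k|Fc| | / sum|Fo|.

    k is a single least-squares scale factor (a simplifying assumption).
    """
    fo = [abs(x) for x in f_obs]
    fc = [abs(x) for x in f_calc]
    if not fo or len(fo) != len(fc):
        raise ValueError("need two non-empty lists of equal length")
    denom = sum(c * c for c in fc)
    k = sum(o * c for o, c in zip(fo, fc)) / denom if denom else 0.0
    return sum(abs(o - k * c) for o, c in zip(fo, fc)) / sum(fo)


def total_b(residual_b: Sequence[float], tls_b: Sequence[float]) -> list:
    """Per-atom total B = residual B + isotropic-equivalent TLS contribution.

    The TLS contribution per atom is taken as an input here; deriving it
    from the T, L and S tensors is deliberately left out of this sketch.
    """
    if len(residual_b) != len(tls_b):
        raise ValueError("need one TLS contribution per atom")
    return [r + t for r, t in zip(residual_b, tls_b)]


def pick_interpretation(r_by_interpretation: Dict[str, float]) -> str:
    """Return the label of the B-column reading with the lowest R."""
    return min(r_by_interpretation, key=r_by_interpretation.get)


if __name__ == "__main__":
    # Hypothetical amplitudes, for illustration only.
    f_obs = [120.0, 85.0, 40.0, 230.0]
    f_calc_if_total = [110.0, 95.0, 30.0, 250.0]      # B column read as total B
    f_calc_if_residual = [118.0, 86.0, 41.0, 228.0]   # B column read as residual B

    results = {
        "total": r_factor(f_obs, f_calc_if_total),
        "residual": r_factor(f_obs, f_calc_if_residual),
    }
    for label, r in results.items():
        print(f"{label:9s} R = {100 * r:.1f}%")
    print("lower R with interpretation:", pick_interpretation(results))
```

A lower R for one reading is only a hint about which convention the
deposited file follows; it says nothing about whether the TLS group
definitions in the header are themselves correct.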