On Dec 13, 2012, at 1:52 AM, James Holton <jmhol...@lbl.gov> wrote: [snip]
> So, what I would advise is to refine your model with data out to the
> resolution limit defined by CC*, but declare the "resolution of the
> structure" to be where the merged I/sigma(I) falls to 2. You might even want
> to calculate your Rmerge, Rcryst, Rfree and all the other R values to this
> resolution as well, since including a lot of zeroes does nothing but
> artificially drive up estimates of relative error.

So James --- it appears that you basically agree with my proposal? I.e.,

(1) include all of the data in refinement (at least up to where CC1/2 or CC* is still "significant"),

(2) keep the definition of resolution to what is more-or-less the de facto standard (res bin where I/sigI=2),

(3) report Table I where everything is calculated up to this resolution (where I/sigI=2), and

(4) maybe include in Supp Mat an additional table that reports statistics for all the data (I'm leaning towards a table with stats for each res bin).

As you argued, and as I argued, this seems to be a good compromise, one that modifies current practice to include weak data, but nevertheless does not change the def of resolution or the Table I stats, so that we can still compare with legacy structures/stats.

> Perhaps we should even take a lesson from our "small molecule" friends and
> start reporting "R1", where the R factor is computed only for hkls where
> I/sigma(I) is above 3?
>
> -James Holton
> MAD Scientist
>
> On 12/8/2012 4:04 AM, Miller, Mitchell D. wrote:
>> I too like the idea of reporting the table 1 stats vs resolution
>> rather than just the overall values and highest resolution shell.
>>
>> I also wanted to point out an earlier thread from April about the
>> limitations of the PDB's defining the resolution as being that of
>> the highest resolution reflection (even if data is incomplete or weak).
>> https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1204&L=ccp4bb&D=0&1=ccp4bb&9=A&I=-3&J=on&d=No+Match%3BMatch%3BMatches&z=4&P=376289
>> https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1204&L=ccp4bb&D=0&1=ccp4bb&9=A&I=-3&J=on&d=No+Match%3BMatch%3BMatches&z=4&P=377673
>>
>> What we have done in the past for cases of low completeness
>> in the outer shell is to define the nominal resolution a la Bart
>> Hazes' method of same number of reflections as a complete data set and
>> use this in the PDB title and describe it in the remark 3 other
>> refinement remarks.
>> There is also the possibility of adding a comment to the PDB
>> remark 2 which we have not used.
>> http://www.wwpdb.org/documentation/format33/remarks1.html#REMARK%202
>> This should help convince reviewers that you are not trying
>> to mis-represent the resolution of the structure.
>>
>>
>> Regards,
>> Mitch
>>
>> -----Original Message-----
>> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Edward A. Berry
>> Sent: Friday, December 07, 2012 8:43 AM
>> To: CCP4BB@JISCMAIL.AC.UK
>> Subject: Re: [ccp4bb] refining against weak data and Table I stats
>>
>> Yes, well, actually I'm only a middle author on that paper for a good
>> reason, but I did encourage Rebecca and Stephan to use all the data.
>> But on a later, much more modest submission, where the outer shell
>> was not only weak but very incomplete (edges of the detector),
>> the reviewers found it difficult to evaluate the quality
>> of the data (we had also excluded a zone with bad ice-ring
>> problems). So we provided a second table, cutting off above
>> the ice ring in the good strong data, which convinced them
>> that at least it is a decent 2A structure. In the PDB it is
>> a 1.6A structure, but there was a lot of good data between
>> the ice ring and 1.6 A.
>>
>> Bart Hazes (I think) suggested a statistic called "effective
>> resolution" which is the resolution to which a complete dataset
>> would have the number of reflections in your dataset, and we
>> reported this, which came out to something like 1.75.
>>
>> I do like the idea of reporting in multiple shells, not just overall
>> and highest shell, and the PDB accommodates this, even has a GUI
>> to enter it in the ADIT 2.0 software. It could also be used to
>> report two different overall ranges, such as completeness, 25 to 1.6 A,
>> which would be shocking in my case, and 25 to 2.0 which would
>> be more reassuring.
>>
>> eab
>>
>> Douglas Theobald wrote:
>>> Hi Ed,
>>>
>>> Thanks for the comments. So what do you recommend? Refine against weak
>>> data, and report all stats in a single Table I?
>>>
>>> Looking at your latest V-ATPase structure paper, it appears you favor
>>> something like that, since you report a high res shell with I/sigI=1.34 and
>>> Rsym=1.65.
>>>
>>>
>>> On Dec 6, 2012, at 7:24 PM, Edward A. Berry <ber...@upstate.edu> wrote:
>>>
>>>> Another consideration here is your PDB deposition. If the reason for using
>>>> weak data is to get a better structure, presumably you are going to deposit
>>>> the structure using all the data. Then the statistics in the PDB file must
>>>> reflect the high resolution refinement.
>>>>
>>>> There are I think three places in the PDB file where the resolution is
>>>> stated, but I believe they are all required to be the same and to be
>>>> equal to the highest resolution data used (even if there were only two
>>>> reflections in that shell).
>>>> Rmerge or Rsymm must be reported, and until recently I think they were not
>>>> allowed to exceed 1.00 (100% error?).
>>>>
>>>> What are your reviewers going to think if the title of your paper is
>>>> "structure of protein A at 2.1 A resolution" but they check the PDB file
>>>> and the resolution was really 1.9 A?
>>>> And Rsymm in the PDB is 0.99 but
>>>> in your table 1* says 1.3?
>>>>
>>>> Douglas Theobald wrote:
>>>>> Hello all,
>>>>>
>>>>> I've followed with interest the discussions here about how we should be
>>>>> refining against weak data, e.g. data with I/sigI << 2 (perhaps using
>>>>> all bins that have a "significant" CC1/2 per Karplus and Diederichs
>>>>> 2012). This all makes statistical sense to me, but now I am wondering
>>>>> how I should report data and model stats in Table I.
>>>>>
>>>>> Here's what I've come up with: report two Table I's. For comparability
>>>>> to legacy structure stats, report a "classic" Table I, where I call the
>>>>> resolution whatever bin has I/sigI=2. Use that as my "high res" bin, with
>>>>> high res bin stats reported in parentheses after global stats. Then
>>>>> have another Table (maybe Table I* in supplementary material?) where I
>>>>> report stats for the whole dataset, including the weak data I used in
>>>>> refinement. In both tables report CC1/2 and Rmeas.
>>>>>
>>>>> This way, I don't redefine the (mostly) conventional usage of
>>>>> "resolution", my Table I can be compared to precedent, I report stats for
>>>>> all the data and for the model against all data, and I take advantage of
>>>>> the information in the weak data during refinement.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> Douglas
>>>>>
>>>>>
>>>>> ^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`
>>>>> Douglas L. Theobald
>>>>> Assistant Professor
>>>>> Department of Biochemistry
>>>>> Brandeis University
>>>>> Waltham, MA 02454-9110
>>>>>
>>>>> dtheob...@brandeis.edu
>>>>> http://theobald.brandeis.edu/
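
For readers who want to try the quantities discussed in this thread, here is a rough Python sketch. It is only an illustration under stated assumptions, not anyone's actual processing script. The function names and any shell values you feed in are hypothetical. CC* is computed from CC1/2 using the relation given by Karplus and Diederichs (2012). The "effective resolution" function assumes the number of unique reflections scales as 1/d^3 and ignores any low-resolution cutoff, which may differ in detail from Bart Hazes' own procedure. The I/sigI function simply walks the shells from low to high resolution and stops at the first shell that drops below the cutoff.

```python
import math


def cc_star(cc_half):
    """Estimate CC* from CC1/2: CC* = sqrt(2*CC1/2 / (1 + CC1/2))."""
    return math.sqrt(2.0 * cc_half / (1.0 + cc_half))


def effective_resolution(d_min, completeness):
    """Resolution at which a fully complete dataset would contain the same
    number of reflections as this one.

    Assumption: the reflection count scales as 1/d^3 and the
    low-resolution cutoff is ignored. `completeness` is the overall
    fraction (0 < completeness <= 1) measured out to `d_min`.
    """
    return d_min * completeness ** (-1.0 / 3.0)


def resolution_at_cutoff(shells, cutoff=2.0):
    """Return the high-resolution edge of the last shell, counting from low
    resolution, whose mean I/sigma(I) is still at or above `cutoff`.

    `shells` is a list of (shell_d_min, mean_i_over_sigma) tuples ordered
    from low to high resolution. Returns None if even the first shell
    falls below the cutoff.
    """
    limit = None
    for shell_d_min, mean_i_over_sigma in shells:
        if mean_i_over_sigma < cutoff:
            break
        limit = shell_d_min
    return limit
```

As a check on the scaling assumption, a hypothetical dataset recorded to 2.0 A that is only 12.5% complete overall would come out with an effective resolution of about 4.0 A, since 0.125^(-1/3) = 2.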