Re: [ccp4bb] refining against weak data and Table I stats

Theresa Hsu Thu, 13 Dec 2012 09:50:29 -0800

As a beginner crystallographer, may I ask a basic question? How often does it 
make a *biological* difference whether a structure is at 1.42 or 1.6 A? I think 
the same question applies to adding water molecules just to make the statistics 
look good.


Thank you.

Theresa


On Thu, 13 Dec 2012 10:07:56 -0500, Douglas Theobald <dtheob...@brandeis.edu> 
wrote:

>On Dec 13, 2012, at 1:52 AM, James Holton <jmhol...@lbl.gov> wrote:
>
>[snip]
>
>> So, what I would advise is to refine your model with data out to the 
>> resolution limit defined by CC*, but declare the "resolution of the 
>> structure" to be where the merged I/sigma(I) falls to 2. You might even want 
>> to calculate your Rmerge, Rcryst, Rfree and all the other R values to this 
>> resolution as well, since including a lot of zeroes does nothing but 
>> artificially drive up estimates of relative error.  
>
>So James --- it appears that you basically agree with my proposal?  I.e., 
>
>(1) include all of the data in refinement (at least up to where CC1/2 or CC* 
>is still "significant")
>
>(2) keep the definition of resolution to what is more-or-less the de facto 
>standard (res bin where I/sigI=2), 
>
>(3) report Table I where everything is calculated up to this resolution (where 
>I/sigI=2), and 
>
>(4) maybe include in Supp Mat an additional table that reports statistics for 
>all the data (I'm leaning towards a table with stats for each res bin)
>
>As you argued, and as I argued, this seems to be a good compromise, one that 
>modifies current practice to include weak data, but nevertheless does not 
>change the def of resolution or the Table I stats, so that we can still 
>compare with legacy structures/stats.
>
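
Points (1) to (3) above can be sketched in a few lines of Python. This is only
an illustration under stated assumptions, not anyone's actual pipeline: the
per-bin numbers are invented, the function names are hypothetical, and the
CC1/2 threshold of 0.1 is just a stand-in for whatever "significant" is taken
to mean. The one borrowed formula is CC* = sqrt(2*CC1/2 / (1 + CC1/2)) from
Karplus and Diederichs 2012.

```python
import math


def cc_star(cc_half):
    """CC* from CC1/2 (Karplus & Diederichs 2012).

    Not defined for negative CC1/2; math.sqrt raises ValueError there.
    """
    return math.sqrt(2.0 * cc_half / (1.0 + cc_half))


def nominal_resolution(bins, cutoff=2.0):
    """d_min of the last bin, walking outward, whose mean I/sigI >= cutoff."""
    d = None
    for b in bins:  # bins are ordered from low to high resolution
        if b["i_over_sigma"] < cutoff:
            break
        d = b["d_min"]
    return d


def refinement_limit(bins, min_cc_half=0.1):
    """d_min of the last bin, walking outward, whose CC1/2 >= min_cc_half.

    min_cc_half is an assumed placeholder for a proper significance test.
    """
    d = None
    for b in bins:
        if b["cc_half"] < min_cc_half:
            break
        d = b["d_min"]
    return d


# Invented per-bin statistics (d_min in Angstrom), for illustration only.
BINS = [
    {"d_min": 2.5, "i_over_sigma": 15.0, "cc_half": 0.99},
    {"d_min": 2.1, "i_over_sigma": 6.0, "cc_half": 0.95},
    {"d_min": 1.9, "i_over_sigma": 2.3, "cc_half": 0.80},
    {"d_min": 1.75, "i_over_sigma": 1.1, "cc_half": 0.45},
    {"d_min": 1.65, "i_over_sigma": 0.5, "cc_half": 0.15},
    {"d_min": 1.6, "i_over_sigma": 0.2, "cc_half": 0.03},
]

if __name__ == "__main__":
    print("quote as resolution:", nominal_resolution(BINS))
    print("refine out to:", refinement_limit(BINS))
    for b in BINS:
        print(b["d_min"], round(cc_star(b["cc_half"]), 3))
```

With these made-up bins the structure would be quoted at 1.9 A (last bin with
I/sigI >= 2) while refinement would use data out to 1.65 A.
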
>
>> Perhaps we should even take a lesson from our "small molecule" friends and 
>> start reporting "R1", where the R factor is computed only for hkls where 
>> I/sigma(I) is above 3?
>> 
>> -James Holton
>> MAD Scientist
>> 
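
The "R1" suggestion above amounts to computing the usual R factor on a
filtered subset of reflections. A minimal sketch, assuming the conventional
definition R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|); the function name and
the reflection values are invented for illustration, and the cutoff of 3 is
simply the number quoted above.

```python
def r_factor(reflections, min_i_over_sigma=None):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|).

    reflections: iterable of (f_obs, f_calc, i_over_sigma) tuples.
    If min_i_over_sigma is given, only reflections with I/sigma(I)
    strictly above it are included.
    """
    num = 0.0
    den = 0.0
    for f_obs, f_calc, i_over_sigma in reflections:
        if min_i_over_sigma is not None and i_over_sigma <= min_i_over_sigma:
            continue  # skip weak reflections
        num += abs(abs(f_obs) - abs(f_calc))
        den += abs(f_obs)
    if den == 0.0:
        raise ValueError("no reflections pass the cutoff")
    return num / den


# (|Fobs|, |Fcalc|, I/sigma(I)): invented values, for illustration only.
REFLECTIONS = [
    (100.0, 95.0, 20.0),
    (50.0, 54.0, 8.0),
    (10.0, 6.0, 1.5),
    (4.0, 9.0, 0.4),
]

if __name__ == "__main__":
    print("R (all data):    ", round(r_factor(REFLECTIONS), 3))
    print("R1 (I/sigI > 3): ", round(r_factor(REFLECTIONS, 3.0), 3))
```

In this toy list the two weak reflections carry half of the total disagreement
but little of the total amplitude, so dropping them lowers R noticeably.
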
>> On 12/8/2012 4:04 AM, Miller, Mitchell D. wrote:
>>> I too like the idea of reporting the table 1 stats vs resolution
>>> rather than just the overall values and highest resolution shell.
>>> 
>>> I also wanted to point out an earlier thread from April about the
>>> limitations of the PDB's defining the resolution as being that of
>>> the highest resolution reflection (even if data is incomplete or weak).
>>> https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1204&L=ccp4bb&D=0&1=ccp4bb&9=A&I=-3&J=on&d=No+Match%3BMatch%3BMatches&z=4&P=376289
>>> https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1204&L=ccp4bb&D=0&1=ccp4bb&9=A&I=-3&J=on&d=No+Match%3BMatch%3BMatches&z=4&P=377673
>>> 
>>> What we have done in the past for cases of low completeness
>>> in the outer shell is to define the nominal resolution a la Bart
>>> Hazes' method (the resolution at which a complete data set would
>>> have the same number of reflections), use this in the PDB title,
>>> and describe it in the REMARK 3 "other refinement remarks".
>>>   There is also the possibility of adding a comment to the PDB
>>> remark 2 which we have not used.
>>> http://www.wwpdb.org/documentation/format33/remarks1.html#REMARK%202
>>> This should help convince reviewers that you are not trying
>>> to mis-represent the resolution of the structure.
>>> 
>>> 
>>> Regards,
>>> Mitch
>>> 
>>> -----Original Message-----
>>> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of 
>>> Edward A. Berry
>>> Sent: Friday, December 07, 2012 8:43 AM
>>> To: CCP4BB@JISCMAIL.AC.UK
>>> Subject: Re: [ccp4bb] refining against weak data and Table I stats
>>> 
>>> Yes, well, actually I'm only a middle author on that paper for a good
>>> reason, but I did encourage Rebecca and Stephan to use all the data.
>>> But on a later, much more modest submission, where the outer shell
>>> was not only weak but very incomplete (edges of the detector),
>>> the reviewers found it difficult to evaluate the quality
>>> of the data (we had also excluded a zone with bad ice-ring
>>> problems). So we provided a second table, cutting off above
>>> the ice ring in the good strong data, which convinced them
>>> that at least it is a decent 2 A structure. In the PDB it is
>>> a 1.6 A structure, but there was a lot of good data between
>>> the ice ring and 1.6 A.
>>> 
>>> Bart Hazes (I think) suggested a statistic called "effective
>>> resolution" which is the resolution to which a complete dataset
>>> would have the number of reflections in your dataset, and we
>>> reported this, which came out to something like 1.75.
>>> 
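
Read literally, the "effective resolution" described above reduces to a
one-line formula if one assumes that the number of unique reflections inside
resolution d grows as 1/d^3 (the volume of a sphere in reciprocal space) and
ignores the low-resolution cutoff. That scaling and the function name are
assumptions of this sketch, not necessarily how Bart Hazes' method is actually
implemented.

```python
def effective_resolution(d_min, completeness):
    """Resolution at which a 100%-complete dataset would contain as many
    reflections as an incomplete dataset that extends to d_min.

    Assumes the reflection count scales as 1/d**3 and ignores the
    low-resolution cutoff. completeness is the overall fraction (0-1].
    """
    if not 0.0 < completeness <= 1.0:
        raise ValueError("completeness must be in (0, 1]")
    return d_min * completeness ** (-1.0 / 3.0)


if __name__ == "__main__":
    # 0.76 is a made-up overall completeness, not a value from the thread.
    print(round(effective_resolution(1.6, 0.76), 2))
```

Under these assumptions an overall completeness of about 76% to 1.6 A gives
roughly 1.75 A; the 76% figure is invented for illustration and is not the
completeness of the dataset discussed above.
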
>>> I do like the idea of reporting in multiple shells, not just overall
>>> and highest shell, and the PDB accommodates this, even has a GUI
>>> to enter it in the ADIT 2.0 software. It could also be used to
>>> report two different overall ranges, such as completeness, 25 to 1.6 A,
>>> which would be shocking in my case, and 25 to 2.0 which would
>>> be more reassuring.
>>> 
>>> eab
>>> 
>>> Douglas Theobald wrote:
>>>> Hi Ed,
>>>> 
>>>> Thanks for the comments.  So what do you recommend?  Refine against weak 
>>>> data, and report all stats in a single Table I?
>>>> 
>>>> Looking at your latest V-ATPase structure paper, it appears you favor 
>>>> something like that, since you report a high res shell with I/sigI=1.34 
>>>> and Rsym=1.65.
>>>> 
>>>> 
>>>> On Dec 6, 2012, at 7:24 PM, Edward A. Berry<ber...@upstate.edu>  wrote:
>>>> 
>>>>> Another consideration here is your PDB deposition. If the reason for using
>>>>> weak data is to get a better structure, presumably you are going to deposit
>>>>> the structure using all the data. Then the statistics in the PDB file must
>>>>> reflect the high resolution refinement.
>>>>> 
>>>>> There are, I think, three places in the PDB file where the resolution is
>>>>> stated, but I believe they are all required to be the same and to be equal
>>>>> to the highest resolution data used (even if there were only two reflections
>>>>> in that shell). Rmerge or Rsymm must be reported, and until recently I think
>>>>> they were not allowed to exceed 1.00 (100% error?).
>>>>> 
>>>>> What are your reviewers going to think if the title of your paper is
>>>>> "structure of protein A at 2.1 A resolution" but they check the PDB file
>>>>> and the resolution was really 1.9 A?  And Rsymm in the PDB is 0.99 but
>>>>> your Table I* says 1.3?
>>>>> 
>>>>> Douglas Theobald wrote:
>>>>>> Hello all,
>>>>>> 
>>>>>> I've followed with interest the discussions here about how we should be 
>>>>>> refining against weak data, e.g. data with I/sigI << 2 (perhaps using 
>>>>>> all bins that have a "significant" CC1/2 per Karplus and Diederichs 
>>>>>> 2012).  This all makes statistical sense to me, but now I am wondering 
>>>>>> how I should report data and model stats in Table I.
>>>>>> 
>>>>>> Here's what I've come up with: report two Table I's.  For comparability 
>>>>>> to legacy structure stats, report a "classic" Table I, where I call the 
>>>>>> resolution whatever bin has I/sigI=2.  Use that as my "high res" bin, with 
>>>>>> high res bin stats reported in parentheses after global stats.   Then 
>>>>>> have another Table (maybe Table I* in supplementary material?) where I 
>>>>>> report stats for the whole dataset, including the weak data I used in 
>>>>>> refinement.  In both tables report CC1/2 and Rmeas.
>>>>>> 
>>>>>> This way, I don't redefine the (mostly) conventional usage of 
>>>>>> "resolution", my Table I can be compared to precedent, I report stats 
>>>>>> for all the data and for the model against all data, and I take 
>>>>>> advantage of the information in the weak data during refinement.
>>>>>> 
>>>>>> Thoughts?
>>>>>> 
>>>>>> Douglas
>>>>>> 
>>>>>> 
>>>>>> ^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`
>>>>>> Douglas L. Theobald
>>>>>> Assistant Professor
>>>>>> Department of Biochemistry
>>>>>> Brandeis University
>>>>>> Waltham, MA  02454-9110
>>>>>> 
>>>>>> dtheob...@brandeis.edu
>>>>>> http://theobald.brandeis.edu/
>>>>>> 
>>>>>>              ^\
>>>>>>    /`  /^.  / /\
>>>>>>   / / /`/  / . /`
>>>>>> / /  '   '
>>>>>> '
>>>>>> 
>>>>>>
