Being a beginner crystallographer, may I ask a basic question? How often does it make a *biological* difference whether a structure is at 1.42 or 1.6 A? I think this question also extends to adding water molecules just to make the statistics look good.
Thank you.
Theresa

On Thu, 13 Dec 2012 10:07:56 -0500, Douglas Theobald <dtheob...@brandeis.edu> wrote:

>On Dec 13, 2012, at 1:52 AM, James Holton <jmhol...@lbl.gov> wrote:
>
>[snip]
>
>> So, what I would advise is to refine your model with data out to the
>> resolution limit defined by CC*, but declare the "resolution of the
>> structure" to be where the merged I/sigma(I) falls to 2. You might even want
>> to calculate your Rmerge, Rcryst, Rfree and all the other R values to this
>> resolution as well, since including a lot of zeroes does nothing but
>> artificially drive up estimates of relative error.
>
>So James --- it appears that you basically agree with my proposal? I.e.,
>
>(1) include all of the data in refinement (at least up to where CC1/2 or CC*
>is still "significant")
>
>(2) keep the definition of resolution to what is more-or-less the de facto
>standard (res bin where I/sigI=2),
>
>(3) report Table I where everything is calculated up to this resolution (where
>I/sigI=2), and
>
>(4) maybe include in Supp Mat an additional table that reports statistics for
>all the data (I'm leaning towards a table with stats for each res bin)
>
>As you argued, and as I argued, this seems to be a good compromise, one that
>modifies current practice to include weak data, but nevertheless does not
>change the def of resolution or the Table I stats, so that we can still
>compare with legacy structures/stats.
>
>
>> Perhaps we should even take a lesson from our "small molecule" friends and
>> start reporting "R1", where the R factor is computed only for hkls where
>> I/sigma(I) is above 3?
>>
>> -James Holton
>> MAD Scientist
>>
>> On 12/8/2012 4:04 AM, Miller, Mitchell D. wrote:
>>> I too like the idea of reporting the table 1 stats vs resolution
>>> rather than just the overall values and highest resolution shell.
>>>
>>> I also wanted to point out an earlier thread from April about the
>>> limitations of the PDB's defining the resolution as being that of
>>> the highest resolution reflection (even if data is incomplete or weak).
>>> https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1204&L=ccp4bb&D=0&1=ccp4bb&9=A&I=-3&J=on&d=No+Match%3BMatch%3BMatches&z=4&P=376289
>>> https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1204&L=ccp4bb&D=0&1=ccp4bb&9=A&I=-3&J=on&d=No+Match%3BMatch%3BMatches&z=4&P=377673
>>>
>>> What we have done in the past for cases of low completeness
>>> in the outer shell is to define the nominal resolution a la Bart
>>> Hazes' method (the resolution at which a complete data set would have
>>> the same number of reflections), use this in the PDB title, and
>>> describe it in the remark 3 "other refinement remarks".
>>> There is also the possibility of adding a comment to the PDB
>>> remark 2, which we have not used.
>>> http://www.wwpdb.org/documentation/format33/remarks1.html#REMARK%202
>>> This should help convince reviewers that you are not trying
>>> to misrepresent the resolution of the structure.
>>>
>>>
>>> Regards,
>>> Mitch
>>>
>>> -----Original Message-----
>>> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of
>>> Edward A. Berry
>>> Sent: Friday, December 07, 2012 8:43 AM
>>> To: CCP4BB@JISCMAIL.AC.UK
>>> Subject: Re: [ccp4bb] refining against weak data and Table I stats
>>>
>>> Yes, well, actually I'm only a middle author on that paper for a good
>>> reason, but I did encourage Rebecca and Stephan to use all the data.
>>> But on a later, much more modest submission, where the outer shell
>>> was not only weak but very incomplete (edges of the detector),
>>> the reviewers found it difficult to evaluate the quality
>>> of the data (we had also excluded a zone with bad ice-ring
>>> problems). So we provided a second table, cutting off above
>>> the ice ring in the good strong data, which convinced them
>>> that at least it is a decent 2A structure.
>>> In the PDB it is a 1.6A structure, but there was a lot of good data
>>> between the ice ring and 1.6 A.
>>>
>>> Bart Hazes (I think) suggested a statistic called "effective
>>> resolution", which is the resolution to which a complete dataset
>>> would have the number of reflections in your dataset, and we
>>> reported this, which came out to something like 1.75.
>>>
>>> I do like the idea of reporting in multiple shells, not just overall
>>> and highest shell, and the PDB accommodates this, even has a GUI
>>> to enter it in the ADIT 2.0 software. It could also be used to
>>> report two different overall ranges, such as completeness, 25 to 1.6 A,
>>> which would be shocking in my case, and 25 to 2.0 which would
>>> be more reassuring.
>>>
>>> eab
>>>
>>> Douglas Theobald wrote:
>>>> Hi Ed,
>>>>
>>>> Thanks for the comments. So what do you recommend? Refine against weak
>>>> data, and report all stats in a single Table I?
>>>>
>>>> Looking at your latest V-ATPase structure paper, it appears you favor
>>>> something like that, since you report a high res shell with I/sigI=1.34
>>>> and Rsym=1.65.
>>>>
>>>>
>>>> On Dec 6, 2012, at 7:24 PM, Edward A. Berry <ber...@upstate.edu> wrote:
>>>>
>>>>> Another consideration here is your PDB deposition. If the reason for using
>>>>> weak data is to get a better structure, presumably you are going to deposit
>>>>> the structure using all the data. Then the statistics in the PDB file must
>>>>> reflect the high resolution refinement.
>>>>>
>>>>> There are I think three places in the PDB file where the resolution is
>>>>> stated, but I believe they are all required to be the same and to be equal
>>>>> to the highest resolution data used (even if there were only two
>>>>> reflections in that shell).
>>>>> Rmerge or Rsymm must be reported, and until recently I think they were
>>>>> not allowed to exceed 1.00 (100% error?).
>>>>>
>>>>> What are your reviewers going to think if the title of your paper is
>>>>> "structure of protein A at 2.1 A resolution" but they check the PDB file
>>>>> and the resolution was really 1.9 A? And Rsymm in the PDB is 0.99 but
>>>>> your table 1* says 1.3?
>>>>>
>>>>> Douglas Theobald wrote:
>>>>>> Hello all,
>>>>>>
>>>>>> I've followed with interest the discussions here about how we should be
>>>>>> refining against weak data, e.g. data with I/sigI << 2 (perhaps using
>>>>>> all bins that have a "significant" CC1/2 per Karplus and Diederichs
>>>>>> 2012). This all makes statistical sense to me, but now I am wondering
>>>>>> how I should report data and model stats in Table I.
>>>>>>
>>>>>> Here's what I've come up with: report two Table I's. For comparability
>>>>>> to legacy structure stats, report a "classic" Table I, where I call the
>>>>>> resolution whatever bin has I/sigI=2. Use that as my "high res" bin, with
>>>>>> high res bin stats reported in parentheses after global stats. Then
>>>>>> have another Table (maybe Table I* in supplementary material?) where I
>>>>>> report stats for the whole dataset, including the weak data I used in
>>>>>> refinement. In both tables report CC1/2 and Rmeas.
>>>>>>
>>>>>> This way, I don't redefine the (mostly) conventional usage of
>>>>>> "resolution", my Table I can be compared to precedent, I report stats
>>>>>> for all the data and for the model against all data, and I take
>>>>>> advantage of the information in the weak data during refinement.
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> Douglas
>>>>>>
>>>>>>
>>>>>> ^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`
>>>>>> Douglas L. Theobald
>>>>>> Assistant Professor
>>>>>> Department of Biochemistry
>>>>>> Brandeis University
>>>>>> Waltham, MA 02454-9110
>>>>>>
>>>>>> dtheob...@brandeis.edu
>>>>>> http://theobald.brandeis.edu/
>>>>>>
>>>>>> ^\
>>>>>> /` /^. / /\
>>>>>> / / /`/ / . /`
>>>>>> / / ' '
>>>>>> '
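
For anyone who wants to try the two quantities discussed in this thread, here is a minimal Python sketch. It is not code from any of the posters. The "effective resolution" function assumes that the number of unique reflections out to resolution d scales as 1/d^3 and that a single overall completeness to d_min is a fair summary of the data (no low-resolution cutoff, no anisotropy); under those assumptions d_eff = d_min * completeness^(-1/3). The CC* function uses the relation CC* = sqrt(2*CC1/2 / (1 + CC1/2)) from Karplus and Diederichs 2012. The function names are made up for illustration.

```python
import math


def effective_resolution(d_min: float, completeness: float) -> float:
    """Resolution (in A) at which a fully complete dataset would contain
    the same number of reflections as this one.

    Assumption: reflection count to resolution d scales as 1/d**3, and
    `completeness` is the overall fraction measured out to d_min.
    """
    if d_min <= 0:
        raise ValueError("d_min must be positive")
    if not 0 < completeness <= 1:
        raise ValueError("completeness must be in (0, 1]")
    return d_min * completeness ** (-1.0 / 3.0)


def cc_star(cc_half: float) -> float:
    """Estimate CC* from CC1/2 (Karplus and Diederichs 2012)."""
    if not 0 <= cc_half <= 1:
        raise ValueError("cc_half must be in [0, 1] for this sketch")
    return math.sqrt(2.0 * cc_half / (1.0 + cc_half))


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the calls.
    print(f"effective resolution: {effective_resolution(1.6, 0.76):.2f} A")
    print(f"CC* for CC1/2 = 0.5:  {cc_star(0.5):.3f}")
```

Under the 1/d^3 assumption, a dataset integrated to 1.6 A would need an overall completeness of roughly three quarters to give an effective resolution near 1.75 A. That completeness figure is an illustration only and was not reported in the thread.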