Tom, the way I have always dealt with this (and the way it is currently handled in STARANISO) is simply to count unmeasured intensities as zero when averaging I/sigma(I). This is equivalent to computing, for each bin, the mean I/sigma(I) over all reflections = bin completeness x mean I/sigma(I) over the measured reflections, so bins with low completeness are more likely to be cut. This has the clear advantage that you don't need separate arbitrary criteria for completeness and for the measured mean I/sigma(I); you just need one (still arbitrary) criterion for the overall mean I/sigma(I). For example, if the bin completeness were only 2%, the mean I/sigma(I) of the measured reflections in that bin would have to exceed 50 times the threshold (e.g. > 50 x 1.5 = 75) for the cut not to be made at that bin, which is extremely unlikely.
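The completeness-weighted criterion described above can be illustrated with a minimal Python sketch of a simple isotropic, shell-by-shell cut-off. It is not STARANISO's actual (anisotropic) procedure. The function names, the bin representation and the threshold of 1.5 are assumptions made for illustration only.

```python
# Hypothetical sketch (not STARANISO's algorithm): isotropic shell-by-shell
# cut-off on the mean I/sigma(I) over ALL possible reflections in each bin,
# with unmeasured reflections counted as I/sigma(I) = 0.


def effective_mean_i_over_sigma(measured, n_possible):
    """Mean I/sigma(I) over all n_possible reflections in a bin.

    Unmeasured reflections contribute zero, so this equals
    completeness * (mean I/sigma(I) of the measured reflections).
    """
    if n_possible <= 0 or len(measured) > n_possible:
        raise ValueError("need n_possible > 0 and no more measured than possible")
    return sum(measured) / n_possible


def resolution_cutoff(bins, threshold=1.5):
    """Return d_min of the last bin (ordered low -> high resolution) before the
    first bin whose effective mean I/sigma(I) falls below threshold.

    bins: list of (d_min, list of measured I/sigma(I) values, n_possible).
    Returns None if even the lowest-resolution bin fails.
    """
    cutoff = None
    for d_min, measured, n_possible in bins:
        if effective_mean_i_over_sigma(measured, n_possible) < threshold:
            break
        cutoff = d_min
    return cutoff


# Made-up bins for illustration only.
bins = [
    (3.0, [20.0] * 95, 100),  # 95% complete, strong data: 19.0
    (2.0, [8.0] * 80, 100),   # 80% complete: 6.4
    (1.8, [76.0], 50),        # 2% complete, one very strong reflection: 1.52
    (1.6, [60.0], 50),        # 2% complete: 60 / 50 = 1.2 < 1.5, cut here
]
print(resolution_cutoff(bins))  # 1.8
```

The 2% complete bin at 1.8 survives only because its single measured reflection has I/sigma(I) = 76, just over 50 x 1.5. Farhan's question later in the thread quotes outer-shell values of I/SigI = 2 at about 22% completeness (1.62 A cut-off) and 6.2 at 82.4% (1.8 A cut-off). The same arithmetic gives roughly 0.22 x 2 = 0.44 and 0.824 x 6.2, or about 5.1, for these.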
Counting unmeasured reflections as zero makes sense statistically. Whatever the true value of I, it is unknown for an unmeasured reflection, so its sigma(I) must be very large. Its I/sigma(I) will therefore be much smaller than those of its measured neighbours and would not contribute a value to the mean that is greatly different from zero anyway. I suppose one could use a more sophisticated treatment in which I/sigma(I) is estimated from the Wilson prior. However, that requires knowing the absolute scale and the anisotropy, and those can only be determined _after_ the cut-off has been applied. One would therefore need a bootstrap process, which would greatly increase the complexity (and the failure modes!) of the algorithm, and it's not clear to me that the difference in the results would make the effort worthwhile.
Unfortunately one can't pull the same trick when using CC_1/2 as the cut-off criterion, because a zero intensity correlates perfectly with another zero intensity! This would cause CC_1/2 to increase at low completeness. Remember that one is correlating deviations from the mean intensity for the bin, so the zeros will have large deviations from the mean and make a big contribution to the CC. This is definitely not what one wants!
There are other reasons too. The standard significance test for the correlation coefficient assumes homoscedastic (i.e. uniform-variance), normally distributed data. Intensity data from area detectors, however, have a Wilson distribution and are always strongly heteroscedastic, unless you somehow contrive to collect the data so that all the sigmas are equal. As I recall, that was possible with the 4-circle diffractometers equipped with a single proportional counter that we had in the 60s-80s. CC_1/2 is also known to be biased by significant anisotropy. So, all in all, I prefer to use the mean I/sigma(I) criterion.
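The effect of zero-filling on CC_1/2 described above can be reproduced with a toy calculation. The sketch uses a plain Pearson correlation on made-up half-dataset intensities for a single bin. It is not how any particular data-reduction program computes CC_1/2. The numbers are hypothetical and were chosen so that the measured pairs are exactly uncorrelated.

```python
import math


def pearson_cc(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    # Correlate deviations from each half's bin mean.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical half-dataset intensities of the measured reflections in one bin,
# constructed so that their deviations from the bin mean are uncorrelated.
half1 = [1.0, 3.0, 1.0, 3.0]
half2 = [1.0, 1.0, 3.0, 3.0]
print(round(pearson_cc(half1, half2), 3))  # 0.0

# Same bin at 50% completeness, unmeasured reflections entered as zero in both halves.
n_unmeasured = 4
filled1 = half1 + [0.0] * n_unmeasured
filled2 = half2 + [0.0] * n_unmeasured
print(round(pearson_cc(filled1, filled2), 3))  # 0.667
```

With only the measured reflections the CC is 0. Entering the unmeasured reflections as zero raises it to 2/3, because the zeros sit below the bin mean in both halves at once and each contributes a positive product to the covariance.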
Cheers
-- Ian
On Sun, 12 Sept 2021 at 04:02, Peat, Tom (Manufacturing, Clayton) <tom.p...@csiro.au> wrote:
> Hello Petr,
>
> I would like to understand more completely your assertion in the last email regarding completeness: "I would not care about low data completeness in case when PAIREF shows improvement of your model."
> In the papers you gave links to, the data completeness was always 90+%, even in the outer shells. In cases where this is not true, I'm not clear why completeness would not be important. Take the ultimate thought experiment, or extreme case, where one has very few reflections at the resolution limit. Just getting a 'better model' doesn't show me that the structure is now at 1.3 A (or whatever limit one wants to set). Models with no data are perfect, in the physical sense of not having clashes, Ramachandran outliers, etc.
> As an example, I am aware of a deposition in the PDB where the outer resolution shell was approximately 2% complete. I don't believe that the structure is really at the stated resolution. The features 'seen' in the electron density don't really measure up to what I would expect, and the density looks a lot more like about 0.5 A lower resolution, where the completeness is a bit better than 50%.
> So my 'bias' is that data completeness is still an important feature that needs to be taken into account when forming the basis of a 'resolution limit', but I'm absolutely willing to be shown that my bias is incorrect.
>
> Best regards, tom
>
> Tom Peat, PhD
> Proteins Group
> Biomedical Program, CSIRO
> 343 Royal Parade
> Parkville, VIC, 3052
> +613 9662 7304
> +614 57 539 419
> tom.p...@csiro.au
>
> ------------------------------
> *From:* CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> on behalf of Petr Kolenko <petr.kole...@fjfi.cvut.cz>
> *Sent:* Sunday, September 12, 2021 5:43 AM
> *To:* CCP4BB@JISCMAIL.AC.UK <CCP4BB@JISCMAIL.AC.UK>
> *Subject:* Re: [ccp4bb] criteria to set resolution limit
>
> Dear Farhan,
> Your dataset does not seem to be that critically anisotropic to me. But of course, try the STARANISO server and make your own decision.
> To me, the dataset seems to have been collected with a suboptimal data-collection strategy. Although I do not know your setup, I would make the crystal-to-detector distance shorter next time. Or maybe rotate the crystal a bit more? I do not know the details.
> And now, to the point of the resolution. The optimal approach is to try paired refinement or, even better, paired refinement with the complete cross-validation protocol. This can be done with the program PAIREF, which is easy to install into your CCP4 installation with the following commands:
>
>     ccp4-python -m ensurepip --user
>     ccp4-python -m pip install pairef --no-deps --upgrade --user
>
> The easiest way to use PAIREF is via the GUI. Use the following command:
>
>     ccp4-python -m pairef --gui
>
> To learn more about the program and the protocol, please read further.
> The original work:
> https://journals.iucr.org/m/issues/2020/04/00/mf5044/index.html
> Upgrade for PHENIX users:
> https://scripts.iucr.org/cgi-bin/paper?S2053230X21006129
>
> We organized a webinar about PAIREF about half a year ago and made a video of it. The video covers a short introduction to paired refinement, installation of PAIREF, and running a test case.
>
> The link for the webinar is here:
> https://pairef.fjfi.cvut.cz/dokuwiki/doku.php?id=webinar_2021-03
> Direct link to the video:
> https://pairef.fjfi.cvut.cz/docs/pairef_poli_webinar/PAIREF_webinar_23Mar2021_.mp4
>
> I would not care about low data completeness in case when PAIREF shows improvement of your model. From my point of view, you have the ideal starting point. Start with a resolution of 1.8 A and verify whether the higher-resolution shells improve your model. I hope you will be able to make the best decision, good luck! ;-) And do not hesitate to ask me for more details about PAIREF.
> Best regards,
> Petr
>
> ________________________________________
> From: CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> on behalf of Tushar R. <rtusha...@gmail.com>
> Sent: Saturday, September 11, 2021 6:46:32 PM
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] criteria to set resolution limit
>
> Along with the paper mentioned by Rajiv, you could also look at this paper, which discusses a major shift in the understanding of data quality from I/sig(I)-based to CC1/2-based indicators.
>
> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4684713/
>
> Hope this helps.
>
> All the best.
>
> Best,
> Tushar.
>
> On Sat, 11 Sep 2021, 09:36 Rajiv gandhi.s, <raji....@gmail.com> wrote:
> Dear Chang,
> One needs to set a resolution cut-off that gives meaningful data without losing high-resolution data, while keeping data integrity. Key quality indicators in the outermost shell, such as I/Sigma I, CC 1/2 and Rpim, need to be considered. What was the CC 1/2 value in the outer shell?
>
> Please refer to the papers below.
> How good are my data and what is the resolution?
> Assessing and maximizing data quality in macromolecular crystallography
>
> On Sat, 11 Sep 2021, 9:52 pm Tao-Hsin Chang, <taohsin.ch...@gmail.com> wrote:
> Hi Farhan,
>
> It looks like your diffraction data have an anisotropy issue, which leads to the problems with resolution limit, intensity and completeness. Check the STARANISO Server (https://staraniso.globalphasing.org/cgi-bin/staraniso.cgi). It may be useful for your case.
>
> Best wishes,
> Tao-Hsin
>
> On Sep 11, 2021, at 11:55 AM, Syed Farhan Ali <alifarhan...@gmail.com> wrote:
>
> Dear All,
>
> I have a query regarding one of my datasets. I am running aimless with the highest resolution set to 1.62 A and getting I/SigI = 2, but data completeness is around 22% in the outermost shell. If I increase the resolution cut-off to 1.8 A, then I/SigI is 6.2 and completeness is 82.4%.
> I have attached a screenshot of the result.
> What should the criteria be for setting the resolution limit? Should I stick to I/SigI, or do I have to consider the completeness of the data as well?
> And if completeness is also a guiding factor, what is the minimum completeness I can accept in the highest-resolution shell?
>
> Regards,
> Farhan
>
> ________________________________
>
> To unsubscribe from the CCP4BB list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
>
> <Screenshot 2021-09-11 at 8.43.25 PM.png><screenshot1.6.tiff>
>
> This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list hosted by www.jiscmail.ac.uk, terms & conditions are available at https://www.jiscmail.ac.uk/policyandsecurity/