Dear Randy,

Yes, this makes sense. Certainly cut-offs are bad; I hope my post wasn't implying that one should cut off the data at some particular resolution shell. Some reflections in a shell will be weak and some stronger, and knowing which are which is, of course, information. I will have a look at the 2019 CCP4 Study Weekend paper.

Regards,
Colin
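Colin's point that knowing which reflections are weak is itself information, and Randy's rough equivalence (in the message quoted below) between a half-bit threshold and an I/SIGI of about one, can be illustrated with a deliberately simple model. This is a sketch under stated assumptions, not the likelihood-based information gain computed in Phaser: it treats each reflection as an additive Gaussian channel, whose Shannon capacity is ½·log2(1 + SNR) bits, and loosely identifies the signal-to-noise ratio with I/σ(I). The function names are hypothetical.

```python
import math


def gaussian_channel_bits(snr: float) -> float:
    """Shannon capacity of an additive Gaussian channel, in bits per sample.

    Toy stand-in for 'information per reflection' (an assumption); it is not
    the likelihood-based information gain used in Phaser.
    """
    if snr < 0:
        raise ValueError("snr must be non-negative")
    return 0.5 * math.log2(1.0 + snr)


def snr_for_bits(bits: float) -> float:
    """Invert the capacity formula: SNR needed to carry `bits` per sample."""
    return 2.0 ** (2.0 * bits) - 1.0


if __name__ == "__main__":
    for snr in (0.014, 0.1, 1.0, 3.0, 10.0):
        print(f"SNR {snr:6.3f} -> {gaussian_channel_bits(snr):.3f} bits per reflection")
    for bits in (0.01, 0.5, 1.0):
        print(f"{bits} bit -> SNR {snr_for_bits(bits):.3f}")
```

In this toy model half a bit corresponds to an SNR of exactly 1, and 0.01 bit to an SNR of only about 0.014, which gives some feel for why very weak reflections can still carry a little signal.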
From: Randy Read <rj...@cam.ac.uk>
Sent: 20 February 2020 11:45
To: Nave, Colin (DLSLtd,RAL,LSCI) <colin.n...@diamond.ac.uk>
Cc: CCP4BB@jiscmail.ac.uk
Subject: Re: [ccp4bb] [3dem] Which resolution?

Dear Colin,

Over the last few years we've been implementing measures of information gain to evaluate X-ray diffraction data in our program Phaser. Some results in a paper that has been accepted for publication in the 2019 CCP4 Study Weekend special issue are relevant to this discussion.

First, looking at data deposited in the PDB, we see that the information gain in the highest-resolution shell is typically about 0.5-1 bit per reflection (though we haven't done a comprehensive analysis yet). A very rough calculation suggests that a half-bit resolution threshold is equivalent to something like an I/SIGI threshold of one. That would fit with the idea that a possible measure of the resolution limit would be the resolution at which the average information per reflection drops to half a bit.

Second, even if the half-bit threshold is where the data start to contribute less to the image and to likelihood targets for tasks like molecular replacement and refinement, weaker data still contribute some useful signal down to limits as low as 0.01 bit per reflection. So any number attached to the nominal resolution of a data set should not necessarily be applied as a resolution cutoff, at least as long as the refinement target (such as our log-likelihood-gain on intensity or LLGI score) accounts properly for large measurement errors.

Best wishes,

Randy

On 20 Feb 2020, at 10:15, Nave, Colin (DLSLtd,RAL,LSCI) <colin.n...@diamond.ac.uk> wrote:

Dear all,

I have received a request to clarify what I mean by threshold in my contribution of 17 Feb below and then to post the clarification on CCP4BB. Being a loyal (but very sporadic) CCP4BBer, I am now doing this. My musings in this thread are as much auto-didactic as didactic.
In other words, I am trying to understand it all myself. Accepting that the FSC is a suitable metric (I believe it is), I think the most useful way of explaining the concept of the threshold is to refer to section 4.2 and Fig. 4 of van Heel and Schatz (2005), Journal of Structural Biology 151, 250-262. Figure 4C shows an FSC together with a half-bit information curve, and Figure 4D shows the FSC with a 3σ curve. The point I was trying to make, in a rather obtuse fashion, is that the choice of threshold will depend on what one is trying to see in the image.

I will try to give an example related to protein structures rather than uranium hydride or axons in the brain. In general, protein structures consist of atoms with similar scattering power (C, N and O, with the hydrogens for the moment invisible) and high occupancy. When we can, for example, distinguish side chains along the backbone, we have a good basis for starting to interpret the map as a particular structure. An FSC with a half-bit threshold at the appropriate resolution appears to be a good guide to whether one can do this. However, if a particular side chain is disordered with two conformations, or a substrate is only 50% occupied, its contribution to the electron density map is reduced and might be difficult to distinguish from the noise. A higher threshold might be necessary to see these atoms, but it would be met at a lower resolution than the one given by the half-bit threshold. One could instead increase the exposure to improve the resolution, but of course radiation damage lurks.

For reporting structures, the obvious thing to do is to show the complete FSC curves together with a few threshold curves (e.g. half bit, one bit, two bits). This would enable people to judge whether the data are likely to meet their requirements. This of course departs significantly from the desire to have one number. A compromise might be to report FSC resolutions at several thresholds.

I understand that fixed-value thresholds (e.g. 0.143) were originally adopted for EM to conform to standards prevalent in crystallography at the time. This would have enabled comparison between the two techniques. For many cases (as stated in van Heel and Schatz) there will be little difference between the resolution given by a half-bit threshold and that given by 0.143. However, if the former is mathematically correct and easy to implement, why not use it for all techniques? The link to Shannon is a personal reason I have for preferring a threshold based on information content. If I had scientific “heroes”, he would be one of them.

I have recently had a paper on X-ray imaging of biological cells accepted for publication. It includes: “In order to compare theory or simulations with experiment, standard methods of reporting results covering parameters such as the feature examined (e.g. which cellular organelle), resolution, contrast, depth of material (for 2D), estimate of noise and dose should be encouraged. Much effort has gone in to doing this for fields such as macromolecular crystallography but it has to be admitted that this is still an ongoing process.” I think recent activity agrees with the last six words!

Don’t read the next bit if you are not interested in the relationship between the Rose criterion and FSC thresholds. The recently submitted paper also includes: “A proper analysis of the relationship between the Rose criterion and FSC thresholds is outside the scope of this paper and would need to take account of factors such as the number of image voxels, whether one is in an atomicity or uniform voxel regime and the contrast of features to be identified in the image.” This can justifiably be interpreted as saying that I did not fully understand the relationship itself, and that was partly why I raised the issue in another message to this thread.

Who cares anyway about the headline resolution?
Well, defining a resolution can be important if one wants to calculate the exposure required to see particular features, and whether they are then degraded by radiation damage. This relates to the issue I raised concerning the Rose criterion.

As an example, one might have a virus particle with an average density of 1.1 embedded in an object (a biological cell) of density 1.0 (I am keeping the numbers simple), giving a contrast C of 0.1. The virus has a diameter of 50 nm. There are 5000 voxels in the image (the number 5000 was used by Rose when analysing television images). This gives 5000 chances of a false alarm, so I want to ensure that the signal-to-noise ratio in the image is sufficiently high. This is why Rose adopted a contrast-to-noise ratio of 5 (Rose criterion K = 5). For each voxel in the image we need a noise level sufficiently low to identify the feature. For a Rose criterion of 5 and a contrast of 0.1, we need an average (?) of 625 photons per Shannon reciprocal voxel (the “speckle” given by the object as a whole) at the required resolution (1/50 nm) in order to achieve this. The expression for the required number of photons is (K/2C)^2, i.e. (5/0.2)^2 = 625.

However, if we have already identified a candidate voxel for the virus (perhaps using fluorescent labelling), we can get away with a Rose criterion of 3 (roughly equivalent, in false-alarm probability, to K = 5 over 5000 voxels), and (3/0.2)^2 = 225 photons will suffice. For this case, a signal-to-noise ratio of 3 corresponds to a 0.0027 probability of the event occurring due to random noise. The information content is therefore -log2(0.0027), which is about 8.5 bits. I therefore have a real-space information content of 8.5 bits and an average of 225 photons at the resolution limit. The question is how to relate these and come up with the appropriate value for the FSC threshold, so that I can judge whether a particle with this low contrast can be identified.

In the above example, the object (biological cell) as a whole has a defined boundary and forms a natural sharp-edged mask.
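The arithmetic in this example can be checked with a short sketch. It takes the expression (K/2C)^2 from the text as given, and it assumes Gaussian noise when turning a K-sigma criterion into a two-tailed false-alarm probability; the function and variable names are illustrative, not from any existing package.

```python
import math


def rose_photons(k: float, contrast: float) -> float:
    """Photons per Shannon reciprocal voxel for Rose criterion k at fractional
    contrast `contrast`, using the expression (K / 2C)**2 given in the text."""
    return (k / (2.0 * contrast)) ** 2


def two_tailed_false_alarm(k: float) -> float:
    """Probability that Gaussian noise alone deviates by more than k standard
    deviations in either direction (Gaussian noise is an assumption here)."""
    return math.erfc(k / math.sqrt(2.0))


def bits_from_probability(p: float) -> float:
    """Information content, in bits, of an event with probability p."""
    return -math.log2(p)


contrast = 0.1   # virus density 1.1 in a cell of density 1.0
n_voxels = 5000  # number of voxels (Rose's television example)

photons_k5 = rose_photons(5, contrast)  # search over the whole image
photons_k3 = rose_photons(3, contrast)  # candidate voxel already identified
p_k3 = two_tailed_false_alarm(3)
bits_k3 = bits_from_probability(p_k3)
# Chance of at least one K = 5 false alarm somewhere among n_voxels voxels
p_k5_any = 1.0 - (1.0 - two_tailed_false_alarm(5)) ** n_voxels

if __name__ == "__main__":
    print(f"K=5: {photons_k5:.0f} photons, K=3: {photons_k3:.0f} photons")
    print(f"P(3 sigma by chance) = {p_k3:.4f} -> {bits_k3:.2f} bits")
    print(f"P(any 5 sigma false alarm over {n_voxels} voxels) = {p_k5_any:.4f}")
```

Under these assumptions the sketch reproduces the 625 and 225 photons, the 0.0027 probability and the roughly 8.5 bits stated above, and it puts the chance of at least one 5σ false alarm across 5000 voxels at about 0.003, which is consistent with treating K = 3 on a single pre-selected voxel as roughly equivalent to K = 5 over the whole image.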
The hard-edge mask case (see van Heel and Schatz, section 4.7) therefore applies.

I am sure Marin (or others) will put me right if there are mistakes in the above.

Finally, for those interested in the relationship between information content and probability, the article by Weaver (one of Shannon’s collaborators) gives a non-mathematical and perhaps philosophical description. It can be found at http://www.mt-archive.info/50/SciAm-1949-Weaver.pdf

Sorry for the long reply, but at least some of it was requested!

Colin

From: CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> On Behalf Of colin.n...@diamond.ac.uk
Sent: 17 February 2020 11:26
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] FW: [ccp4bb] [3dem] Which resolution?

Dear all,

Would it help to separate the issue of the FSC from the value of the threshold? My understanding is that the FSC addresses the spatial frequency at which there is reliable information content in the image. This concept should apply to a wide variety of types of image. The issue is then what value of the threshold to use. For interpretation of protein structures (whether by X-ray methods or electron microscopy), a half-bit threshold appears to be appropriate. However, for imaging the human brain (one of Marin’s examples), a higher threshold might be adopted, as a range of contrasts might be present (axons, for example, have a similar density to their surroundings). For crystallography, if one wants to see lighter atoms (hydrogens in the presence of uranium, or in proteins), a higher threshold might also be appropriate. To be honest, I am not sure about this, as a 2-bit threshold (for example) would be crossed at a lower resolution, implying that there is still information to higher resolution at the half-bit threshold (unless one is at a diffraction- or instrument-limited resolution). Most CCP4BBers will understand that a single number is not good enough.
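Reporting FSC resolutions at several thresholds, as suggested in the 20 February message, is straightforward to script. The sketch below assumes the information-based threshold curves of van Heel and Schatz (2005), written in the general form T_b(n) = (s + (2√s + 1)/√n) / (s + 1 + 2√s/√n) with s = (2^b − 1)/2, where n is the number of voxels in a Fourier shell. With b = 0.5 and b = 1 this reproduces their half-bit coefficients (0.2071, 1.9102, 1.2071, 0.9102) and one-bit coefficients; the 2-bit curve, the helper names and the logistic FSC curve are assumptions for illustration, not data from any real map.

```python
import math
from typing import Optional, Sequence


def bit_threshold(n_voxels: float, bits: float) -> float:
    """Information-based FSC threshold for a Fourier shell of n_voxels voxels.

    s is the per-half-map SNR at which the combined reconstruction carries
    `bits` bits, i.e. log2(1 + 2*s) = bits. bits = 0.5 and bits = 1 give the
    half-bit and one-bit curves of van Heel & Schatz (2005); other values of
    `bits` are an extrapolation of the same form.
    """
    s = (2.0 ** bits - 1.0) / 2.0
    root_n = math.sqrt(n_voxels)
    root_s = math.sqrt(s)
    return (s + (2.0 * root_s + 1.0) / root_n) / (s + 1.0 + 2.0 * root_s / root_n)


def crossing_resolution(freqs: Sequence[float], fsc: Sequence[float],
                        threshold: Sequence[float]) -> Optional[float]:
    """Resolution (1/frequency) where the FSC first falls below the threshold
    curve, interpolating linearly between shells; None if it never does."""
    for i in range(1, len(freqs)):
        above = fsc[i - 1] - threshold[i - 1]
        below = fsc[i] - threshold[i]
        if above >= 0.0 and below < 0.0:
            t = above / (above - below)  # fractional position between shells
            return 1.0 / (freqs[i - 1] + t * (freqs[i] - freqs[i - 1]))
    return None


if __name__ == "__main__":
    # Synthetic example only: 200-voxel box, 1 Å pixels, shells k = 1..100
    # with about 4*pi*k**2 voxels each, and a made-up logistic FSC fall-off.
    box, pixel = 200, 1.0
    shells = range(1, box // 2 + 1)
    freqs = [k / (box * pixel) for k in shells]
    n_vox = [4.0 * math.pi * k * k for k in shells]
    fsc = [1.0 / (1.0 + math.exp((f - 0.25) / 0.03)) for f in freqs]
    for bits in (0.5, 1.0, 2.0):
        curve = [bit_threshold(n, bits) for n in n_vox]
        print(f"{bits}-bit threshold: {crossing_resolution(freqs, fsc, curve):.2f} Å")
    fixed = [0.143] * len(freqs)
    print(f"0.143 threshold: {crossing_resolution(freqs, fsc, fixed):.2f} Å")
```

On a synthetic curve like this one, the 2-bit crossing falls at a noticeably lower resolution than the half-bit crossing: a stricter threshold gives a worse headline number even though there is still information at higher resolution by the half-bit criterion. For large shells the half-bit curve levels off near 0.17, close to the fixed 0.143 value.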
However, many users of the protein structure databases will simply search for the structure with the highest named resolution. It might be difficult to send these users to re-education camps.

Regards,
Colin

From: CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> On Behalf Of Petrus Zwart
Sent: 16 February 2020 21:50
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] [3dem] Which resolution?

Hi All,

How is the 'correct' resolution estimate related to the estimated error on some observed hydrogen-bond length of interest, or to the error on the estimated occupancy of a ligand or conformation, or anything else that has structural significance? In crystallography it isn't really (only in some very approximate fashion), and I doubt that there is anything to that effect in EM.

If you want to use the resolution to get a gut feeling for how your maps look and how your data behave, it doesn't really matter what standard you use, as long as you are consistent in the metric you use. If you want to use this estimate to get at uncertainties of model parameters, you had better try something else.

Regards,
Peter Zwart

On Sun, Feb 16, 2020 at 8:38 AM Marin van Heel <0000057a89ab08a1-dmarc-requ...@jiscmail.ac.uk> wrote:

Dear Pawel and All others ....

This 2010 review is, unfortunately, largely based on the flawed statistics I mentioned before, namely on the a priori assumption that the inner product of a signal vector and a noise vector is ZERO (an orthogonality assumption). We have refuted the Frank & Al-Ali (1975) paper on a number of occasions (for example in 2005, and most recently in our BioRxiv paper), but you still take it as the correct relation between SNR and FRC (and you never cite the criticism...).
Sorry,
Marin

On Thu, Feb 13, 2020 at 10:42 AM Penczek, Pawel A <pawel.a.penc...@uth.tmc.edu> wrote:

Dear Teige,

I am wondering whether you are familiar with "Resolution measures in molecular electron microscopy" (Penczek PA, Methods Enzymol. 2010;482:73-100, doi: 10.1016/S0076-6879(10)82003-8). You will find there answers to all the questions you asked and much more.

Regards,
Pawel Penczek

_______________________________________________
3dem mailing list
3...@ncmir.ucsd.edu
https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem

________________________________
To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB&A=1

--
P.H. Zwart
Staff Scientist
Molecular Biophysics and Integrated Bioimaging & Center for Advanced Mathematics for Energy Research Applications
Lawrence Berkeley National Laboratories
1 Cyclotron Road, Berkeley, CA-94703, USA
Cell: 510 289 9246
PHENIX: http://www.phenix-online.org
CAMERA: http://camera.lbl.gov/

--
This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd.
cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom

------
Randy J. Read
Department of Haematology, University of Cambridge
Cambridge Institute for Medical Research
The Keith Peters Building
Hills Road
Cambridge CB2 0XY, U.K.
Tel: +44 1223 336500
Fax: +44 1223 336827
E-mail: rj...@cam.ac.uk
www-structmed.cimr.cam.ac.uk