Re: [ccp4bb] [3dem] Which resolution?

James Holton Wed, 04 Mar 2020 16:12:25 -0800

The funny thing is, although we generally regard resolution as a primaryindicator of data quality the appearance of a density map at the classic"1-sigma" contour has very little to do with resolution, and everythingto do with the B factor.

Seriously, try it. Take any structure you like, set all the B factors to30 with PDBSET, calculate a map with SFALL or phenix.fmodel and have alook at the density of tyrosine (Tyr) side chains. Even if youcalculate structure factors all the way out to 1.0 A the holes in theTyr rings look exactly the same: just barely starting to form. This isbecause the structure factors from atoms with B=30 are essentially zeroout at 1.0 A, and adding zeroes does not change the map. You can adjustthe contour level, of course, and solvent content will have some effecton where the "1-sigma" contour lies, but generally B=30 is the pointwhere Tyr side chains start to form their holes. Traditionally, this isattributed to 1.8A resolution, but it is really at B=30. The pointwhere waters first start to poke out above the 1-sigma contour is atB=60, despite being generally attributed to d=2.7A.

Now, of course, if you cut off this B=30 data at 3.5A then the Tyr sidechains become blobs, but that is equivalent to collecting data with thedetector way too far away and losing your high-resolution spots off theedges. I have seen a few people do that, but not usually for apublished structure. Most people fight very hard for those faint,barely-existing high-angle spots. But why do we do that if the map isgoing to look the same anyway? The reason is because resolution and Bfactors are linked.

Resolution is about separation vs width, and the width of the densitypeak from any atom is set by its B factor. Yes, atoms have an intrinsicwidth, but it is very quickly washed out by even modest B factors (B >10). This is true for both x-ray and electron form factors. To a verygood approximation, the FWHM of C, N and O atoms is given by:

FWHM= sqrt(B*log(2))/pi+0.15

where "B" is the B factor assigned to the atom and the 0.15 fudge factoraccounts for its intrinsic width when B=0. Now that we know the peakwidth, we can start to ask if two peaks are "resolved".

Start with the classical definition of "resolution" (call it after Airy,Raleigh, Dawes, or whatever famous person you like), but essentially youare asking the question: "how close can two peaks be before they mergeinto one peak?". For Gaussian peaks this is 0.849*FWHM. Simple enough.However, when you look at the density of two atoms this far apart youwill see the peak is highly oblong. Yes, the density has one maximum,but there are clearly two atoms in there. It is also pretty obvious thelong axis of the peak is the line between the two atoms, and if you fittwo round atoms into this peak you recover the distance between themquite accurately. Are they really not "resolved" if it is so clearwhere they are?

In such cases you usually want to sharpen, as that will make the oblongblob turn into two resolved peaks. Sharpening reduces the B factor andtherefore FWHM of every atom, making the "resolution" (0.849*FWHM) ashorter distance. So, we have improved resolution with sharpening! Whydon't we always do this? Well, the reason is because of noise. Sharpening up-weights the noise of high-order Fourier terms andtherefore degrades the overall signal-to-noise (SNR) of the map. Thisis what I believe Colin would call reduced "contrast". Of course, sincewe view maps with a threshold (aka contour) a map with SNR=5 will lookalmost identical to a map with SNR=500. The "noise floor" is generallywell below the 1-sigma threshold, or even the 0-sigma threshold(https://doi.org/10.1073/pnas.1302823110). As you turn up thesharpening you will see blobs split apart and also see new peaks risingabove your map contouring threshold. Are these new peaks real? Or arethey noise? That is the difference between SNR=500 and SNR=5,respectively. The tricky part of sharpening is knowing when you havereached the point where you are introducing more noise than signal. There are some good methods out there, but none of them are perfect.

What about filtering out the noise? An ideal noise suppression filterhas the same shape as the signal (I found that in Numerical Recipes),and the shape of the signal from a macromolecule is a Gaussian inreciprocal space (aka straight line on a Wilson plot). This is true, bythe way, for both a molecule packed into a crystal or free in solution. So, the ideal noise-suppression filter is simply applying a B factor. Only problem is: sharpening is generally done by applying a negative Bfactor, so applying a Gaussian blur is equivalent to just not sharpeningas much. So, we are back to "optimal sharpening" again.

Why not use a filter that is non-Gaussian? We do this all the time! Cutting off the data at a given resolution (d) is equivalent to blurringthe map with this function:


kernel_d(r) = 4/3*pi/d**3*sinc3(2*pi*r/d)
sinc3(x) = (x==0?1:3*(sin(x)/x-cos(x))/(x*x))

where kernel_d(r) is the normalized weight given to a point "r" Angstromaway from the center of each blurring operation, and "sinc3" is theFourier synthesis of a solid sphere. That is, if you make an HKL filewith all F=1 and PHI=0 out to a resolution d, then effectively all hklsbeyond the resolution limit are zero. If you calculate a map with thoseFs, you will find the kernel_d(r) function at the origin. What thatmeans is: by applying a resolution cutoff, you are effectivelymultiplying your data by this sphere of unit Fs, and since amultiplication in reciprocal space is a convolution in real space, theeffect is convoluting (blurring) with kernel_d(x).

For comparison, if you apply a B factor, the real-space blurring kernelis this:

kernel_B(r) = (4*pi/B)**1.5*exp(-4*pi**2/B*r*r)

If you graph these two kernels (format is for gnuplot) you will findthat they have the same FWHM whenever B=80*(d/3)**2. This "rule" is theone I used for my resolution demonstration movie I made back in the late20th century:

https://bl831.als.lbl.gov/~jamesh/movies/index.html#resolution

What I did then was set all atomic B factors to B = 80*(d/3)^2 and thencut the resolution at "d". Seemed sensible at the time. I suppose Icould have used the PDB-wide average atomic B factor reported forstructures with resolution "d", which roughly follows:

B = 4*d**2+12
https://bl831.als.lbl.gov/~jamesh/pickup/reso_vs_avgB.png

The reason I didn't use this formula for the movie is because I didn'tfigure it out until about 10 years later. These two curves cross at1.5A, but diverge significantly at poor resolution. So, which one isright? It depends on how well you can measure really really faintspots, and we've been getting better at that in recent decades.

So, what I'm trying to say here is that just because your data has CC1/2or FSC dropping off to insignificance at 1.8 A doesn't mean you aregoing to see holes in Tyr side chains. However, if you measure yourweak, high-res data really well (high multiplicity), you might be ableto sharpen your way to a much clearer map.


-James Holton
MAD Scientist

On 2/27/2020 11:01 AM, Nave, Colin (DLSLtd,RAL,LSCI) wrote:

James
All you say seems sensible to me but there is the possibility ofconfusion regarding the use of the word threshold. I fully agree thata half bit information threshold is inappropriate if it is taken tomean that the data should be truncated at that resolution. The evermore sophisticated refinement programs are becoming adept at handlingthe noisy data.
The half bit information threshold I was discussing refers to anominal resolution. This is not just for trivial reporting purposes.The half bit threshold is being used to compare imaging methods andperhaps demonstrate that significant information is present with adose below any radiation damage threshold (that word again). Thejustification for doing this appears to come from the fact it has beenadopted for protein structure determination by single particleelectron microscopy. However, low contrast features might not bevisible at this nominal resolution.
The analogy with protein crystallography might be to collect databelow an absorption edge to give a nominal resolution of 2 angstrom.Then do it again well above the absorption edge. The second one givesmuch greater Bijvoet differences despite the fact that the nominalresolution is the same. I doubt whether anyone doing this would bemisled by this as they would examine the statistics for the Bijvoetdifferences instead. However, it does indicate the relationshipbetween contrast and resolution.
The question, if referring to an information threshold for nominalresolution, could be “Is there significant information in the data atthe required contrast and resolution?”. Then “Can one obtain thisinformation at a dose below any radiation damage limit”
Keep posting!

Regards

Colin

*From:*James Holton <jmhol...@lbl.gov>
*Sent:* 27 February 2020 01:14
*To:* CCP4BB@JISCMAIL.AC.UK
*Cc:* Nave, Colin (DLSLtd,RAL,LSCI) <colin.n...@diamond.ac.uk>
*Subject:* Re: [ccp4bb] [3dem] Which resolution?
In my opinion the threshold should be zero bits. Yes, this is whereCC1/2 = 0 (or FSC = 0). If there is correlation then there isinformation, and why throw out information if there is information tobe had? Yes, this information comes with noise attached, but that iswhy we have weights.
It is also important to remember that zero intensity is still usefulinformation. Systematic absences are an excellent example. They haveno intensity at all, but they speak volumes about the structure. In asimilar way, high-angle zero-intensity observations also tell ussomething. Ever tried unrestrained B factor refinement at poorresolution? It is hard to do nowadays because of all the safetycatches in modern software, but you can get great R factors this way. A telltale sign of this kind of "over fitting" is remarkably largeFcalc values beyond the resolution cutoff. These don't contribute tothe R factor, however, because Fobs is missing for these hkls. So,including zero-intensity data suppresses at least some types ofover-fitting.
The thing I like most about the zero-information resolution cutoff isthat it forces us to address the real problem: what do you mean by"resolution" ? Not long ago, claiming your resolution was 3.0 A meantthat after discarding all spots with individual I/sigI < 3 you stillhave 80% completeness in the 3.0 A bin. Now we are saying we have a3.0 A data set when we can prove statistically that a fewnon-background counts fell into the sum of all spot areas at 3.0 A. These are not the same thing.
Don't get me wrong, including the weak high-resolution informationmakes the model better, and indeed I am even advocating including allthe noisy zeroes. However, weak data at 3.0 A is never going to be asgood as having strong data at 3.0 A. So, how do we decide? Ipersonally think that the resolution assigned to the PDB depositionshould remain the classical I/sigI > 3 at 80% rule. This is reallythe only way to have meaningful comparison of resolution between veryold and very new structures. One should, of course, deposit all thedata, but don't claim that cut-off as your "resolution". That is justplain unfair to those who came before.
Oh yeah, and I also have a session on "interpreting low-resolutionmaps" at the GRC this year.https://www.grc.org/diffraction-methods-in-structural-biology-conference/2020/
So, please, let the discussion continue!

-James Holton
MAD Scientist

On 2/22/2020 11:06 AM, Nave, Colin (DLSLtd,RAL,LSCI) wrote:

    Alexis

    This is a very useful summary.

    You say you were not convinced by Marin's derivation in 2005. Are
    you convinced now and, if not, why?

    My interest in this is that the FSC with half bit thresholds have
    the danger of being adopted elsewhere because they are becoming
    standard for protein structure determination (by EM or MX). If it
    is used for these mature techniques it must be right!

    It is the adoption of the ½ bit threshold I worry about. I gave a
    rather weak example for MX which consisted of partial occupancy of
    side chains, substrates etc. For x-ray imaging a wide range of
    contrasts can occur and, if you want to see features with only a
    small contrast above the surroundings then I think the half bit
    threshold would be inappropriate.

    It would be good to see a clear message from the MX and EM
    communities as to why an information content threshold of ½ a bit
    is generally appropriate for these techniques and an
    acknowledgement that this threshold is technique/problem dependent.

    We might then progress from the bronze age to the iron age.

    Regards

    Colin

    *From:*CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK>
    <mailto:CCP4BB@JISCMAIL.AC.UK> *On Behalf Of *Alexis Rohou
    *Sent:* 21 February 2020 16:35
    *To:* CCP4BB@JISCMAIL.AC.UK <mailto:CCP4BB@JISCMAIL.AC.UK>
    *Subject:* Re: [ccp4bb] [3dem] Which resolution?

    Hi all,

    For those bewildered by Marin's insistence that everyone's been
    messing up their stats since the bronze age, I'd like to offer
    what my understanding of the situation. More details in this
    thread from a few years ago on the exact same topic:

    https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003939.html

    https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003944.html

    Notwithstanding notational problems (e.g. strict equations as
    opposed to approximation symbols, or omission of symbols to denote
    estimation), I believe Frank & Al-Ali and "descendent" papers
    (e.g. appendix of Rosenthal & Henderson 2003) are fine. The cross
    terms that Marin is agitated about indeed do in fact have an
    expectation value of 0.0 (in the ensemble; if the experiment were
    performed an infinite number of times with different realizations
    of noise). I don't believe Pawel or Jose Maria or any of the other
    authors really believe that the cross-terms are orthogonal.

    When N (the number of independent Fouier voxels in a shell) is
    large enough, mean(Signal x Noise) ~ 0.0 is only an approximation,
    but a pretty good one, even for a single FSC experiment. This is
    why, in my book, derivations that depend on Frank & Al-Ali are OK,
    under the strict assumption that N is large. Numerically, this
    becomes apparent when Marin's half-bit criterion is plotted -
    asymptotically it has the same behavior as a constant threshold.

    So, is Marin wrong to worry about this? No, I don't think so.
    There are indeed cases where the assumption of large N is broken.
    And under those circumstances, any fixed threshold (0.143, 0.5,
    whatever) is dangerous. This is illustrated in figures of van Heel
    & Schatz (2005). Small boxes, high-symmetry, small objects in
    large boxes, and a number of other conditions can make fixed
    thresholds dangerous.

    It would indeed be better to use a non-fixed threshold. So why am
    I not using the 1/2-bit criterion in my own work? While
    numerically it behaves well at most resolution ranges, I was not
    convinced by Marin's derivation in 2005. Philosophically though, I
    think he's right - we should aim for FSC thresholds that are more
    robust to the kinds of edge cases mentioned above. It would be the
    right thing to do.

    Hope this helps,

    Alexis

    On Sun, Feb 16, 2020 at 9:00 AM Penczek, Pawel A
    <pawel.a.penc...@uth.tmc.edu <mailto:pawel.a.penc...@uth.tmc.edu>>
    wrote:

        Marin,

        The statistics in 2010 review is fine. You may disagree with
        assumptions, but I can assure you the “statistics” (as you
        call it) is fine. Careful reading of the paper would reveal to
        you this much.

        Regards,

        Pawel




            On Feb 16, 2020, at 10:38 AM, Marin van Heel
            <marin.vanh...@googlemail.com
            <mailto:marin.vanh...@googlemail.com>> wrote:

            

            ***** EXTERNAL EMAIL *****

            Dear Pawel and All others ....

            This 2010 review is - unfortunately - largely based on the
            flawed statistics I mentioned before, namely on the a
            priori assumption that the inner product of a signal
            vector and a noise vector are ZERO (an orthogonality
            assumption).  The (Frank & Al-Ali 1975) paper we have
            refuted on a number of occasions (for example in 2005, and
            most recently in our BioRxiv paper) but you still take
            that as the correct relation between SNR and FRC (and you
            never cite the criticism...).

            Sorry

            Marin

            On Thu, Feb 13, 2020 at 10:42 AM Penczek, Pawel A
            <pawel.a.penc...@uth.tmc.edu
            <mailto:pawel.a.penc...@uth.tmc.edu>> wrote:

                Dear Teige,

                I am wondering whether you are familiar with


                    Resolution measures in molecular electron microscopy.

                Penczek PA. Methods Enzymol. 2010.


                      Citation

                Methods Enzymol. 2010;482:73-100. doi:
                10.1016/S0076-6879(10)82003-8.

                You will find there answers to all questions you asked
                and much more.

                Regards,

                Pawel Penczek

                Regards,

                Pawel

                _______________________________________________
                3dem mailing list
                3...@ncmir.ucsd.edu <mailto:3...@ncmir.ucsd.edu>
                https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem
                
<https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.ncmir.ucsd.edu_mailman_listinfo_3dem&d=DwMFaQ&c=bKRySV-ouEg_AT-w2QWsTdd9X__KYh9Eq2fdmQDVZgw&r=yEYHb4SF2vvMq3W-iluu41LlHcFadz4Ekzr3_bT4-qI&m=3-TZcohYbZGHCQ7azF9_fgEJmssbBksaI7ESb0VIk1Y&s=XHMq9Q6Zwa69NL8kzFbmaLmZA9M33U01tBE6iAtQ140&e=>

        _______________________________________________
        3dem mailing list
        3...@ncmir.ucsd.edu <mailto:3...@ncmir.ucsd.edu>
        https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem

    ------------------------------------------------------------------------

    To unsubscribe from the CCP4BB list, click the following link:
    https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB&A=1
--
    This e-mail and any attachments may contain confidential,
    copyright and or privileged material, and are for the use of the
    intended addressee only. If you are not the intended addressee or
    an authorised recipient of the addressee please notify us of
    receipt by returning the e-mail and do not use, copy, retain,
    distribute or disclose the information in or attached to the e-mail.
    Any opinions expressed within this e-mail are those of the
    individual and not necessarily of Diamond Light Source Ltd.
    Diamond Light Source Ltd. cannot guarantee that this e-mail or any
    attachments are free from viruses and we cannot accept liability
    for any damage which you may sustain as a result of software
    viruses which may be transmitted in or with the message.
    Diamond Light Source Limited (company no. 4375679). Registered in
    England and Wales with its registered office at Diamond House,
    Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11
    0DE, United Kingdom

    ------------------------------------------------------------------------

    To unsubscribe from the CCP4BB list, click the following link:
    https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB&A=1

--
This e-mail and any attachments may contain confidential, copyrightand or privileged material, and are for the use of the intendedaddressee only. If you are not the intended addressee or an authorisedrecipient of the addressee please notify us of receipt by returningthe e-mail and do not use, copy, retain, distribute or disclose theinformation in or attached to the e-mail.Any opinions expressed within this e-mail are those of the individualand not necessarily of Diamond Light Source Ltd.Diamond Light Source Ltd. cannot guarantee that this e-mail or anyattachments are free from viruses and we cannot accept liability forany damage which you may sustain as a result of software viruses whichmay be transmitted in or with the message.Diamond Light Source Limited (company no. 4375679). Registered inEngland and Wales with its registered office at Diamond House, HarwellScience and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, UnitedKingdom



########################################################################

To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB&A=1

Re: [ccp4bb] [3dem] Which resolution?

Reply via email to