The funny thing is, although we generally regard resolution as a primary
indicator of data quality the appearance of a density map at the classic
"1-sigma" contour has very little to do with resolution, and everything
to do with the B factor.
Seriously, try it. Take any structure you like, set all the B factors to
30 with PDBSET, calculate a map with SFALL or phenix.fmodel and have a
look at the density of tyrosine (Tyr) side chains. Even if you
calculate structure factors all the way out to 1.0 A the holes in the
Tyr rings look exactly the same: just barely starting to form. This is
because the structure factors from atoms with B=30 are essentially zero
out at 1.0 A, and adding zeroes does not change the map. You can adjust
the contour level, of course, and solvent content will have some effect
on where the "1-sigma" contour lies, but generally B=30 is the point
where Tyr side chains start to form their holes. Traditionally, this is
attributed to 1.8A resolution, but it is really at B=30. The point
where waters first start to poke out above the 1-sigma contour is at
B=60, despite being generally attributed to d=2.7A.
Now, of course, if you cut off this B=30 data at 3.5A then the Tyr side
chains become blobs, but that is equivalent to collecting data with the
detector way too far away and losing your high-resolution spots off the
edges. I have seen a few people do that, but not usually for a
published structure. Most people fight very hard for those faint,
barely-existing high-angle spots. But why do we do that if the map is
going to look the same anyway? The reason is because resolution and B
factors are linked.
Resolution is about separation vs width, and the width of the density
peak from any atom is set by its B factor. Yes, atoms have an intrinsic
width, but it is very quickly washed out by even modest B factors (B >
10). This is true for both x-ray and electron form factors. To a very
good approximation, the FWHM of C, N and O atoms is given by:
FWHM= sqrt(B*log(2))/pi+0.15
where "B" is the B factor assigned to the atom and the 0.15 fudge factor
accounts for its intrinsic width when B=0. Now that we know the peak
width, we can start to ask if two peaks are "resolved".
Start with the classical definition of "resolution" (call it after Airy,
Raleigh, Dawes, or whatever famous person you like), but essentially you
are asking the question: "how close can two peaks be before they merge
into one peak?". For Gaussian peaks this is 0.849*FWHM. Simple enough.
However, when you look at the density of two atoms this far apart you
will see the peak is highly oblong. Yes, the density has one maximum,
but there are clearly two atoms in there. It is also pretty obvious the
long axis of the peak is the line between the two atoms, and if you fit
two round atoms into this peak you recover the distance between them
quite accurately. Are they really not "resolved" if it is so clear
where they are?
In such cases you usually want to sharpen, as that will make the oblong
blob turn into two resolved peaks. Sharpening reduces the B factor and
therefore FWHM of every atom, making the "resolution" (0.849*FWHM) a
shorter distance. So, we have improved resolution with sharpening! Why
don't we always do this? Well, the reason is because of noise.
Sharpening up-weights the noise of high-order Fourier terms and
therefore degrades the overall signal-to-noise (SNR) of the map. This
is what I believe Colin would call reduced "contrast". Of course, since
we view maps with a threshold (aka contour) a map with SNR=5 will look
almost identical to a map with SNR=500. The "noise floor" is generally
well below the 1-sigma threshold, or even the 0-sigma threshold
(https://doi.org/10.1073/pnas.1302823110). As you turn up the
sharpening you will see blobs split apart and also see new peaks rising
above your map contouring threshold. Are these new peaks real? Or are
they noise? That is the difference between SNR=500 and SNR=5,
respectively. The tricky part of sharpening is knowing when you have
reached the point where you are introducing more noise than signal.
There are some good methods out there, but none of them are perfect.
What about filtering out the noise? An ideal noise suppression filter
has the same shape as the signal (I found that in Numerical Recipes),
and the shape of the signal from a macromolecule is a Gaussian in
reciprocal space (aka straight line on a Wilson plot). This is true, by
the way, for both a molecule packed into a crystal or free in solution.
So, the ideal noise-suppression filter is simply applying a B factor.
Only problem is: sharpening is generally done by applying a negative B
factor, so applying a Gaussian blur is equivalent to just not sharpening
as much. So, we are back to "optimal sharpening" again.
Why not use a filter that is non-Gaussian? We do this all the time!
Cutting off the data at a given resolution (d) is equivalent to blurring
the map with this function:
kernel_d(r) = 4/3*pi/d**3*sinc3(2*pi*r/d)
sinc3(x) = (x==0?1:3*(sin(x)/x-cos(x))/(x*x))
where kernel_d(r) is the normalized weight given to a point "r" Angstrom
away from the center of each blurring operation, and "sinc3" is the
Fourier synthesis of a solid sphere. That is, if you make an HKL file
with all F=1 and PHI=0 out to a resolution d, then effectively all hkls
beyond the resolution limit are zero. If you calculate a map with those
Fs, you will find the kernel_d(r) function at the origin. What that
means is: by applying a resolution cutoff, you are effectively
multiplying your data by this sphere of unit Fs, and since a
multiplication in reciprocal space is a convolution in real space, the
effect is convoluting (blurring) with kernel_d(x).
For comparison, if you apply a B factor, the real-space blurring kernel
is this:
kernel_B(r) = (4*pi/B)**1.5*exp(-4*pi**2/B*r*r)
If you graph these two kernels (format is for gnuplot) you will find
that they have the same FWHM whenever B=80*(d/3)**2. This "rule" is the
one I used for my resolution demonstration movie I made back in the late
20th century:
https://bl831.als.lbl.gov/~jamesh/movies/index.html#resolution
What I did then was set all atomic B factors to B = 80*(d/3)^2 and then
cut the resolution at "d". Seemed sensible at the time. I suppose I
could have used the PDB-wide average atomic B factor reported for
structures with resolution "d", which roughly follows:
B = 4*d**2+12
https://bl831.als.lbl.gov/~jamesh/pickup/reso_vs_avgB.png
The reason I didn't use this formula for the movie is because I didn't
figure it out until about 10 years later. These two curves cross at
1.5A, but diverge significantly at poor resolution. So, which one is
right? It depends on how well you can measure really really faint
spots, and we've been getting better at that in recent decades.
So, what I'm trying to say here is that just because your data has CC1/2
or FSC dropping off to insignificance at 1.8 A doesn't mean you are
going to see holes in Tyr side chains. However, if you measure your
weak, high-res data really well (high multiplicity), you might be able
to sharpen your way to a much clearer map.
-James Holton
MAD Scientist
On 2/27/2020 11:01 AM, Nave, Colin (DLSLtd,RAL,LSCI) wrote:
James
All you say seems sensible to me but there is the possibility of
confusion regarding the use of the word threshold. I fully agree that
a half bit information threshold is inappropriate if it is taken to
mean that the data should be truncated at that resolution. The ever
more sophisticated refinement programs are becoming adept at handling
the noisy data.
The half bit information threshold I was discussing refers to a
nominal resolution. This is not just for trivial reporting purposes.
The half bit threshold is being used to compare imaging methods and
perhaps demonstrate that significant information is present with a
dose below any radiation damage threshold (that word again). The
justification for doing this appears to come from the fact it has been
adopted for protein structure determination by single particle
electron microscopy. However, low contrast features might not be
visible at this nominal resolution.
The analogy with protein crystallography might be to collect data
below an absorption edge to give a nominal resolution of 2 angstrom.
Then do it again well above the absorption edge. The second one gives
much greater Bijvoet differences despite the fact that the nominal
resolution is the same. I doubt whether anyone doing this would be
misled by this as they would examine the statistics for the Bijvoet
differences instead. However, it does indicate the relationship
between contrast and resolution.
The question, if referring to an information threshold for nominal
resolution, could be “Is there significant information in the data at
the required contrast and resolution?”. Then “Can one obtain this
information at a dose below any radiation damage limit”
Keep posting!
Regards
Colin
*From:*James Holton <jmhol...@lbl.gov>
*Sent:* 27 February 2020 01:14
*To:* CCP4BB@JISCMAIL.AC.UK
*Cc:* Nave, Colin (DLSLtd,RAL,LSCI) <colin.n...@diamond.ac.uk>
*Subject:* Re: [ccp4bb] [3dem] Which resolution?
In my opinion the threshold should be zero bits. Yes, this is where
CC1/2 = 0 (or FSC = 0). If there is correlation then there is
information, and why throw out information if there is information to
be had? Yes, this information comes with noise attached, but that is
why we have weights.
It is also important to remember that zero intensity is still useful
information. Systematic absences are an excellent example. They have
no intensity at all, but they speak volumes about the structure. In a
similar way, high-angle zero-intensity observations also tell us
something. Ever tried unrestrained B factor refinement at poor
resolution? It is hard to do nowadays because of all the safety
catches in modern software, but you can get great R factors this way.
A telltale sign of this kind of "over fitting" is remarkably large
Fcalc values beyond the resolution cutoff. These don't contribute to
the R factor, however, because Fobs is missing for these hkls. So,
including zero-intensity data suppresses at least some types of
over-fitting.
The thing I like most about the zero-information resolution cutoff is
that it forces us to address the real problem: what do you mean by
"resolution" ? Not long ago, claiming your resolution was 3.0 A meant
that after discarding all spots with individual I/sigI < 3 you still
have 80% completeness in the 3.0 A bin. Now we are saying we have a
3.0 A data set when we can prove statistically that a few
non-background counts fell into the sum of all spot areas at 3.0 A.
These are not the same thing.
Don't get me wrong, including the weak high-resolution information
makes the model better, and indeed I am even advocating including all
the noisy zeroes. However, weak data at 3.0 A is never going to be as
good as having strong data at 3.0 A. So, how do we decide? I
personally think that the resolution assigned to the PDB deposition
should remain the classical I/sigI > 3 at 80% rule. This is really
the only way to have meaningful comparison of resolution between very
old and very new structures. One should, of course, deposit all the
data, but don't claim that cut-off as your "resolution". That is just
plain unfair to those who came before.
Oh yeah, and I also have a session on "interpreting low-resolution
maps" at the GRC this year.
https://www.grc.org/diffraction-methods-in-structural-biology-conference/2020/
So, please, let the discussion continue!
-James Holton
MAD Scientist
On 2/22/2020 11:06 AM, Nave, Colin (DLSLtd,RAL,LSCI) wrote:
Alexis
This is a very useful summary.
You say you were not convinced by Marin's derivation in 2005. Are
you convinced now and, if not, why?
My interest in this is that the FSC with half bit thresholds have
the danger of being adopted elsewhere because they are becoming
standard for protein structure determination (by EM or MX). If it
is used for these mature techniques it must be right!
It is the adoption of the ½ bit threshold I worry about. I gave a
rather weak example for MX which consisted of partial occupancy of
side chains, substrates etc. For x-ray imaging a wide range of
contrasts can occur and, if you want to see features with only a
small contrast above the surroundings then I think the half bit
threshold would be inappropriate.
It would be good to see a clear message from the MX and EM
communities as to why an information content threshold of ½ a bit
is generally appropriate for these techniques and an
acknowledgement that this threshold is technique/problem dependent.
We might then progress from the bronze age to the iron age.
Regards
Colin
*From:*CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK>
<mailto:CCP4BB@JISCMAIL.AC.UK> *On Behalf Of *Alexis Rohou
*Sent:* 21 February 2020 16:35
*To:* CCP4BB@JISCMAIL.AC.UK <mailto:CCP4BB@JISCMAIL.AC.UK>
*Subject:* Re: [ccp4bb] [3dem] Which resolution?
Hi all,
For those bewildered by Marin's insistence that everyone's been
messing up their stats since the bronze age, I'd like to offer
what my understanding of the situation. More details in this
thread from a few years ago on the exact same topic:
https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003939.html
https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003944.html
Notwithstanding notational problems (e.g. strict equations as
opposed to approximation symbols, or omission of symbols to denote
estimation), I believe Frank & Al-Ali and "descendent" papers
(e.g. appendix of Rosenthal & Henderson 2003) are fine. The cross
terms that Marin is agitated about indeed do in fact have an
expectation value of 0.0 (in the ensemble; if the experiment were
performed an infinite number of times with different realizations
of noise). I don't believe Pawel or Jose Maria or any of the other
authors really believe that the cross-terms are orthogonal.
When N (the number of independent Fouier voxels in a shell) is
large enough, mean(Signal x Noise) ~ 0.0 is only an approximation,
but a pretty good one, even for a single FSC experiment. This is
why, in my book, derivations that depend on Frank & Al-Ali are OK,
under the strict assumption that N is large. Numerically, this
becomes apparent when Marin's half-bit criterion is plotted -
asymptotically it has the same behavior as a constant threshold.
So, is Marin wrong to worry about this? No, I don't think so.
There are indeed cases where the assumption of large N is broken.
And under those circumstances, any fixed threshold (0.143, 0.5,
whatever) is dangerous. This is illustrated in figures of van Heel
& Schatz (2005). Small boxes, high-symmetry, small objects in
large boxes, and a number of other conditions can make fixed
thresholds dangerous.
It would indeed be better to use a non-fixed threshold. So why am
I not using the 1/2-bit criterion in my own work? While
numerically it behaves well at most resolution ranges, I was not
convinced by Marin's derivation in 2005. Philosophically though, I
think he's right - we should aim for FSC thresholds that are more
robust to the kinds of edge cases mentioned above. It would be the
right thing to do.
Hope this helps,
Alexis
On Sun, Feb 16, 2020 at 9:00 AM Penczek, Pawel A
<pawel.a.penc...@uth.tmc.edu <mailto:pawel.a.penc...@uth.tmc.edu>>
wrote:
Marin,
The statistics in 2010 review is fine. You may disagree with
assumptions, but I can assure you the “statistics” (as you
call it) is fine. Careful reading of the paper would reveal to
you this much.
Regards,
Pawel
On Feb 16, 2020, at 10:38 AM, Marin van Heel
<marin.vanh...@googlemail.com
<mailto:marin.vanh...@googlemail.com>> wrote:
***** EXTERNAL EMAIL *****
Dear Pawel and All others ....
This 2010 review is - unfortunately - largely based on the
flawed statistics I mentioned before, namely on the a
priori assumption that the inner product of a signal
vector and a noise vector are ZERO (an orthogonality
assumption). The (Frank & Al-Ali 1975) paper we have
refuted on a number of occasions (for example in 2005, and
most recently in our BioRxiv paper) but you still take
that as the correct relation between SNR and FRC (and you
never cite the criticism...).
Sorry
Marin
On Thu, Feb 13, 2020 at 10:42 AM Penczek, Pawel A
<pawel.a.penc...@uth.tmc.edu
<mailto:pawel.a.penc...@uth.tmc.edu>> wrote:
Dear Teige,
I am wondering whether you are familiar with
Resolution measures in molecular electron microscopy.
Penczek PA. Methods Enzymol. 2010.
Citation
Methods Enzymol. 2010;482:73-100. doi:
10.1016/S0076-6879(10)82003-8.
You will find there answers to all questions you asked
and much more.
Regards,
Pawel Penczek
Regards,
Pawel
_______________________________________________
3dem mailing list
3...@ncmir.ucsd.edu <mailto:3...@ncmir.ucsd.edu>
https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem
<https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.ncmir.ucsd.edu_mailman_listinfo_3dem&d=DwMFaQ&c=bKRySV-ouEg_AT-w2QWsTdd9X__KYh9Eq2fdmQDVZgw&r=yEYHb4SF2vvMq3W-iluu41LlHcFadz4Ekzr3_bT4-qI&m=3-TZcohYbZGHCQ7azF9_fgEJmssbBksaI7ESb0VIk1Y&s=XHMq9Q6Zwa69NL8kzFbmaLmZA9M33U01tBE6iAtQ140&e=>
_______________________________________________
3dem mailing list
3...@ncmir.ucsd.edu <mailto:3...@ncmir.ucsd.edu>
https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem
------------------------------------------------------------------------
To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB&A=1
--
This e-mail and any attachments may contain confidential,
copyright and or privileged material, and are for the use of the
intended addressee only. If you are not the intended addressee or
an authorised recipient of the addressee please notify us of
receipt by returning the e-mail and do not use, copy, retain,
distribute or disclose the information in or attached to the e-mail.
Any opinions expressed within this e-mail are those of the
individual and not necessarily of Diamond Light Source Ltd.
Diamond Light Source Ltd. cannot guarantee that this e-mail or any
attachments are free from viruses and we cannot accept liability
for any damage which you may sustain as a result of software
viruses which may be transmitted in or with the message.
Diamond Light Source Limited (company no. 4375679). Registered in
England and Wales with its registered office at Diamond House,
Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11
0DE, United Kingdom
------------------------------------------------------------------------
To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB&A=1
--
This e-mail and any attachments may contain confidential, copyright
and or privileged material, and are for the use of the intended
addressee only. If you are not the intended addressee or an authorised
recipient of the addressee please notify us of receipt by returning
the e-mail and do not use, copy, retain, distribute or disclose the
information in or attached to the e-mail.
Any opinions expressed within this e-mail are those of the individual
and not necessarily of Diamond Light Source Ltd.
Diamond Light Source Ltd. cannot guarantee that this e-mail or any
attachments are free from viruses and we cannot accept liability for
any damage which you may sustain as a result of software viruses which
may be transmitted in or with the message.
Diamond Light Source Limited (company no. 4375679). Registered in
England and Wales with its registered office at Diamond House, Harwell
Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United
Kingdom
########################################################################
To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB&A=1