Re: [ccp4bb] sigma cutoff for fitting waters in model

Dale Tronrud Thu, 22 Apr 2010 09:46:53 -0700

   Yes, there has been a conflation of the standard deviation and the
r.m.s. of the distribution when it comes to "sigmas".  The mathematical
formulas look similar (for a Normal distribution) so some people have
sloppily transferred the meanings of the mathematical symbols from one
concept to the other.

   There is another matter in this topic that has also bothered me.
People talk about how many sigmas high a peak is (meaning the number
of r.m.s.'s) when making some argument about the probability of a
peak being that high.  The problem is that the r.m.s. is calculated
from individual samples of the map at grid points but the conclusion
is related to "peaks".  If you want to comment on the probability of
a peak having a height of so much or larger, you have to work with
the distribution of peak heights (the values on the shoulders of
the peaks being irrelevant to the topic).  You need to identify
all the peaks in the map and work from the distribution of their
heights.
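
   As a rough illustration of the difference, here is a minimal Python
sketch.  Everything in it is an assumption made for the example: the toy
map, the periodic nested-list grid (at least 3 points along each axis),
the 26-neighbour definition of a "peak", and the function names.  It is
not how any particular program picks peaks.

```python
import itertools
import math


def rms_deviation(values):
    """Root-mean-square deviation from the mean (the usual map "sigma")."""
    values = list(values)
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def peak_heights(grid):
    """Heights of strict local maxima of a periodic 3D grid (nested lists).

    A grid point counts as a peak only if it is higher than all 26 of
    its neighbours, so points on the shoulder of a peak are excluded.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    offsets = [d for d in itertools.product((-1, 0, 1), repeat=3)
               if d != (0, 0, 0)]
    peaks = []
    for i, j, k in itertools.product(range(nx), range(ny), range(nz)):
        centre = grid[i][j][k]
        if all(centre > grid[(i + di) % nx][(j + dj) % ny][(k + dk) % nz]
               for di, dj, dk in offsets):
            peaks.append(centre)
    return peaks


def fraction_at_or_above(values, cutoff):
    """Fraction of the given values that are at or above the cutoff."""
    values = list(values)
    return sum(v >= cutoff for v in values) / len(values)


# Toy 4x4x4 map: flat background, two peaks, and one shoulder point.
grid = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
grid[1][1][1] = 3.0   # peak
grid[1][1][2] = 1.0   # shoulder of the peak above, not a peak itself
grid[3][3][3] = 2.0   # second, separate peak

grid_values = [v for plane in grid for row in plane for v in row]
peaks = peak_heights(grid)

# The same cutoff gives very different fractions for the two populations.
grid_fraction = fraction_at_or_above(grid_values, 1.0)
peak_fraction = fraction_at_or_above(peaks, 1.0)
```

   The r.m.s. comes from grid_values, but a statement about "a peak this
high or higher" has to be read off the list returned by peak_heights().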

   In a 2Fo-Fc style map there is a bimodal distribution with a large
number of small peaks in the bulk solvent and a bunch of strong peaks
in the region of the ordered molecules.  While the r.m.s. of the bulk
solvent region might give a reasonable estimate of the sigma (as in
the uncertainty in peak heights) of this map, the one r.m.s. cutoff
for interpreting the map is simply a tool to try to find the line
separating the two distributions so that the big peaks will be inside
the contours and the small peaks will be outside.

   As was stated previously in this thread, when there is a greater
proportion of bulk solvent in the crystal the small peaks contribute
more to the r.m.s. calculation and the big peaks, of the same
significance, will appear to ride higher above the one "sigma" contour.
A 1.2 r.m.s. peak in a map with 80% solvent should be considered less
likely to be an atom than a 1.2 r.m.s. peak in a map with 40% solvent.
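
   A toy calculation makes the sliding scale visible.  This is only a
sketch under loud assumptions: the map is treated as a two-component
mixture of "solvent" and "protein" grid points that share the same mean,
and the two r.m.s. values (0.1 and 0.5, in arbitrary absolute units) are
made-up numbers, not measurements from any real map.

```python
import math


def map_rms(solvent_fraction, rms_solvent, rms_protein):
    """Overall r.m.s. of a two-region map.

    Assumes both regions have the same mean, so the variances simply
    add in proportion to the fraction of grid points in each region.
    """
    variance = (solvent_fraction * rms_solvent ** 2
                + (1.0 - solvent_fraction) * rms_protein ** 2)
    return math.sqrt(variance)


def absolute_height(n_rms, solvent_fraction, rms_solvent, rms_protein):
    """Absolute height of a peak quoted as n_rms times the map r.m.s."""
    return n_rms * map_rms(solvent_fraction, rms_solvent, rms_protein)


# Hypothetical per-region r.m.s. values, arbitrary absolute units.
RMS_SOLVENT = 0.1
RMS_PROTEIN = 0.5

# The "same" 1.2 r.m.s. peak in a 40% solvent and an 80% solvent crystal.
height_40 = absolute_height(1.2, 0.40, RMS_SOLVENT, RMS_PROTEIN)
height_80 = absolute_height(1.2, 0.80, RMS_SOLVENT, RMS_PROTEIN)

print(f"1.2 r.m.s. at 40% solvent: {height_40:.3f} (absolute units)")
print(f"1.2 r.m.s. at 80% solvent: {height_80:.3f} (absolute units)")
```

   In this model the 1.2 r.m.s. peak in the 80% solvent map is weaker in
absolute terms than the 1.2 r.m.s. peak in the 40% solvent map, because
the r.m.s. ruler itself has shrunk.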

   When searching for water you have a nearly complete atomic model
and can use that to put your map on an absolute scale of "electron
scattering equivalents"/A^3 (OMG we're not back to that again ;-) )
and avoid this whole sliding scale problem.

Dale Tronrud

On 04/21/10 17:21, James Holton wrote:
> Like so many rules of thumb, the 3-sigma fofc and 1-sigma 2fofc is a
> reasonable guideline that works very well in most cases despite being
> based on a flawed assumption.  The "0.3% chance" of a peak being above 3
> "sigmas" assumes that the histogram of electron density values is
> Gaussian.  It is not!  In fact, it is a funny-looking bimodal
> distribution (the peaks are protein and solvent regions).  Programs like
> SOLVE use this fact to identify the correct heavy-atom constellation
> among all the wrong ones (which tend to produce maps with more
> Gaussian-looking histograms).
> 
> It also seems to be a very common misconception that "1 sigma" is the
> "noise level" in an electron density map.  Not sure where that one got
> started or how.  No doubt due to the unfortunate use of the greek letter
> "sigma" to denote a standard deviation in statistics.  Indeed, the
> "sigma" scaling of an electron density map is calculated the same way as
> a standard deviation, but one need only calculate a "noise free" map
> from a PDB file to notice that the "sigma" of such maps is not zero.
> 
> Does anyone know original references for sigma cutoff rules like this?
> 
> -James Holton
> MAD Scientist
> 
> Ed Pozharski wrote:
>> I second Tim's opinion.  In the days of CNS/O, there was a popular rule
>> to place waters in 3 sigma peaks that make chemical sense, then
>> re-refine and keep those waters that produce more than 1 sigma in 2fo-fc
>> map.  (With Coot the default cutoff is 5).
>>
>> There could be a bizarre probabilistic argument for a particular choice
>> of sigma cutoff - with 3 sigmas you have ~0.3% chance of a particular
>> peak to be simply a random spike.  Which means that if the map is on,
>> say, 0.5A grid, there is a decent chance to have one such peak per
>> 3.5x3.5x3.5A volume.  With 5 sigmas the size of the cube goes up to
>> ~60x60x60A, so 5 sigma peaks are almost guaranteed not to be flukes.
>>
>> On Sat, 2010-04-17 at 22:46 +0200, Tim Gruene wrote:
>>  
>>> Hello Sudhir Kumar,
>>>
>>> most of all the waters in your structure should make chemical sense.
>>> When the density around the water is weak it may just mean that the
>>> water is not fully occupied.
>>>
>>> Tim
>>>
>>> On Sat, Apr 17, 2010 at 09:47:35PM +0900, Sudhir Kumar wrote:
>>>    
>>>> hi all
>>>> sorry for such a basic query, i'ld like to know what is the
>>>> acceptable sigma cut off for waters to be kept in a model if data
>>>> is of about 1.6 A.
>>>> thanks in advance
>>>> Sudhir Kumar
>>>> Research Scholar
>>>> Structural Biology Laboratory
>>>> SLS, JNU,
>>>> New Delhi-110067