Re: [ccp4bb] question about SIGF

James Holton Sat, 20 Aug 2011 02:17:43 -0700

There is a formula for sigma(F) (aka "SIGF"), but it is actually a common misconception that it is simply related to F. You need to know a few other things about the experiment that was done to collect the data. The misconception seems to arise because the first thing textbooks tell you is that F = sqrt(I), where "I" is the intensity of the spot. Then, later on, they tell you that sigma(I) = sqrt(I) because of "counting statistics". Now, if you look up a table of error-propagation formulas, you will find that if I=F^2, then sigma(I)/I = 2*sigma(F)/F, and by substituting these equations together you readily obtain:


sigma(F) = F/2*sigma(I)/I
sigma(F) = F/2*sigma(I)/F^2
sigma(F) = sigma(I)/(2*F)
sigma(F) = sigma(I)/(2*sqrt(I))
sigma(F) = sqrt(I)/(2*sqrt(I))
sigma(F) = 0.5

Which says that the error in F is always the same, no matter what your exposure time? Hmm.
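To see the paradox numerically, here is a minimal Python sketch of the naive propagation above. The function name naive_sigma_f is only an illustration, not part of any program; it assumes I = F^2 with no scale factor and sigma(I) = sqrt(I).

```python
import math

def naive_sigma_f(intensity):
    """Propagate sigma(I) = sqrt(I) through F = sqrt(I), with no scale factor."""
    f = math.sqrt(intensity)
    sigma_i = math.sqrt(intensity)   # "counting statistics"
    return sigma_i / (2.0 * f)       # sigma(F) = sigma(I) / (2*F)

# Weak, medium and very bright spots all give the same answer.
for intensity in (1.0, 100.0, 1.0e6):
    print(intensity, naive_sigma_f(intensity))
```

Every intensity gives sigma(F) = 0.5, which cannot be right for a real experiment.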

The critical thing missing from the equations above is something we crystallographers call a "scale factor". We love scale factors because they let us get away with not knowing a great many things, like the volume of the crystal, the absolute intensity of the x-ray beam, and the exact "gain" of the detector. It's not that we can't measure or look up these things, but few of us have the time. And, by and large, as long as you are aware that there is always an unknown "scale factor", it doesn't really get in your way. So, the real equation is:


I_in_photons = scale*F^2

where scale = Ibeam*re^2*Vxtal*lambda^3*Lorentz_factor*Polar_factor*Attenuation_factor*exposure_time/deltaphi/Vcell^2


This "scale factor" comes from Equation 1 in the following paper:
http://dx.doi.org/10.1107/S0907444910007262

where we took pains to describe the exact meaning of each of these variables (and their units!) in great detail. It is open access, so I won't go through them here. I will, however, add that for spots on the detector there are a few other factors still missing, like the detector gain, obliquity, parallax, and spot partiality, but these are all "taken care of" by the data processing program. The main thing is to figure out the number of photons that were accumulated for a given h,k,l index, and then take the square root of that to get the "counting error". Oh, and you also need to know the number of "background" photons that fell into the pixels used to add up photons for the h,k,l of interest. The square root of this count must be combined with the "counting error" of the spot photons, along with a few other sources of error. This is what we discuss around Equation (18) in the linked-to paper above.
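As a rough illustration of what the scale factor does to the naive derivation, here is a Python sketch that propagates counting noise through I_in_photons = scale*F^2. It ignores background and every other noise source, and the scale values are made up for the example, not taken from a real experiment.

```python
import math

def sigma_f_counting(f_abs, scale):
    """Counting-noise-only sigma(F) when I_in_photons = scale * F^2."""
    photons = scale * f_abs ** 2
    sigma_photons = math.sqrt(photons)            # counting error of the spot
    # dI/dF = 2*scale*F, so sigma(F) = sigma(I) / (2*scale*F)
    return sigma_photons / (2.0 * scale * f_abs)

# Hypothetical scale factors: the second is 4x the first (e.g. 4x the exposure time).
print(sigma_f_counting(100.0, 0.01))
print(sigma_f_counting(100.0, 0.04))
```

Under these assumptions sigma(F) works out to 1/(2*sqrt(scale)): it is still the same for every F, but it is no longer a universal 0.5. Since scale is proportional to exposure_time, quadrupling the exposure halves sigma(F).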

The short answer, however, is that sqrt(I_in_photons) is only one component of sigma(I). The other factors fall into three main categories: readout noise, counting noise and what I call "fractional noise". Now, if you have a number of different sources of noise, you get the total noise by adding up the squares of all the components, and then taking the square root:

sigma(I_in_photons) = sqrt( I_in_photons + background_photons + sigma_readout^2 + frac_error*I_in_photons^2 )

For those of you who use SCALA and think the sqrt( sigI^2 + B*I + sdadd*I^2 ) form of this equation looks a lot like the SDCORRection line, good job! That is a very perceptive observation.
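The noise equation above can be sketched in a few lines of Python. One assumption to flag loudly: here frac_error is taken to be the fractional error itself (e.g. 0.03 for 3%), so its contribution to the variance is (frac_error*I_in_photons)^2. The equation as written in the text is recovered if frac_error instead stands for the square of that fraction. The numbers in the example are invented.

```python
import math

def sigma_i_photons(spot_photons, background_photons, sigma_readout, frac_error):
    """Total sigma(I) in photons from independent noise sources added in quadrature."""
    variance = (
        spot_photons                          # counting noise of the spot
        + background_photons                  # counting noise of the background
        + sigma_readout ** 2                  # readout noise
        + (frac_error * spot_photons) ** 2    # fractional noise (assumed form, see above)
    )
    return math.sqrt(variance)

# Hypothetical bright and weak spots on the same background.
print(sigma_i_photons(10000.0, 1000.0, 3.0, 0.03))
print(sigma_i_photons(100.0, 1000.0, 3.0, 0.03))
```

With these made-up numbers the bright spot is dominated by the fractional term and the weak spot by the background.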

What separates the three kinds of noise is how they relate to the exposure time. For example, readout noise is always the same, no matter what the exposure time is, but as you increase the exposure time, the number of photons in the spots and the background go up proportionally. This means that the contribution of "counting noise" to sigma(I) increases as the square root of the exposure time. On modern detectors, the read-out noise is equivalent to the "counting noise" of a few (or even zero) photons/pixel, and so as soon as you have more than about 10 photons/pixel of background, the readout noise is no longer significant.
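A quick way to see why readout noise stops mattering is to compare the per-pixel noise with and without it. In this sketch the readout noise is assumed to be equivalent to the counting noise of 2 photons/pixel (so sigma_readout = sqrt(2)); that value is a placeholder, not a measurement of any particular detector.

```python
import math

def readout_inflation(background_per_pixel, sigma_readout):
    """Ratio of per-pixel noise including readout noise to counting noise alone."""
    counting = math.sqrt(background_per_pixel)
    total = math.sqrt(background_per_pixel + sigma_readout ** 2)
    return total / counting

# Assumed readout noise: equivalent to the counting noise of 2 photons/pixel.
sigma_readout = math.sqrt(2.0)
for background in (1.0, 10.0, 100.0):
    print(background, readout_inflation(background, sigma_readout))
```

At 10 photons/pixel of background the readout noise inflates the total by under 10% in this example, and at 100 photons/pixel by about 1%.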

So, in general, noise increases with the square root of exposure time, but the signal (I_in_photons) increases in direct proportion to exposure time, so the signal-to-noise ratio (from counting noise alone) goes up with the square root of exposure time. That is, until you hit the third type of noise: fractional noise. There are many sources of fractional noise: shutter timing error, crystal vibration, flicker in the incident beam intensity, inaccurate scaling factors (including the absorption correction), and variations in detector sensitivity across its face. Essentially, these amount to the error bars on all the terms in the "scale" formula above. On a good diffractometer, all these errors are small, usually less than a few percent, but one thing they all have in common is their contribution to sigma(I) is proportional to the signal (I_in_photons), not the square root of it! This is the reason why the Rmerge of bright, low-angle spots never gets down to the 0.1% you would expect from counting a million photons, even if you did count that many. It is also the reason why "SDadd" in SCALA (the "estimated error" in scalepack) tends to be around 3-5%. It is also the reason why measuring anomalous differences smaller than ~3% is so difficult.
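The exposure-time behaviour described above can be sketched as follows. The photon rates, readout noise and 3% fractional error are invented for illustration, and the fractional term is again assumed to enter the variance as (frac_error*I_in_photons)^2.

```python
import math

def signal_to_noise(exposure_time, spot_rate, background_rate, sigma_readout, frac_error):
    """I/sigma(I) for a spot, given photon rates per unit exposure time."""
    spot = spot_rate * exposure_time
    background = background_rate * exposure_time
    variance = (
        spot + background                 # counting noise, grows with exposure time
        + sigma_readout ** 2              # readout noise, independent of exposure time
        + (frac_error * spot) ** 2        # fractional noise, grows with exposure time squared
    )
    return spot / math.sqrt(variance)

# Hypothetical spot: 100 photons/s on 50 photons/s of background, 3% fractional error.
for t in (0.1, 1.0, 10.0, 100.0, 1000.0):
    print(t, signal_to_noise(t, 100.0, 50.0, 3.0, 0.03))

# A million-photon spot: relative error from counting alone vs. with 3% fractional noise.
print(1.0 / signal_to_noise(1.0, 1.0e6, 0.0, 0.0, 0.0))
print(1.0 / signal_to_noise(1.0, 1.0e6, 0.0, 0.0, 0.03))
```

In this sketch the signal-to-noise ratio climbs with exposure time but levels off just below 1/frac_error (about 33 for a 3% fractional error). The million-photon spot goes from a 0.1% relative error with counting noise alone to about 3% once fractional noise is included.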

So, if you know the magnitude of all these sources of error, then it should be possible to derive a formula for SIGF, but I don't think anyone has quite written down the whole thing yet. Possibly because the difference between Fobs and Fcalc is about 4-5x bigger than SIGF anyway.


-James Holton
MAD Scientist

On 8/18/2011 8:19 AM, G Y wrote:

Dear all,
I am a student in crystallography, so I am not quite familiar with some even basic concepts.
In the shelx .hkl file or ccp4 .mtz file there is a column SIGF, which is related to the standard deviation of the structure factor. I have read through many crystallography textbooks, and there are many formulas about this topic. Sometimes it is a square of sigma, sometimes it is not.
My question is:
1. What is the exact mathematical formula for SIGF or SIGFP in a ccp4 or shelx format file?
2. If it can be calculated from F, why is it necessary to include it in a ccp4 or shelx reflection file (they have F already)?
3. Is this value really important in structure determination? Why and how? As I understand it, during data collection each reflection is measured several times, so there is a deviation from the average F. That is the meaning of SIGF. But how is this value used in structure determination? Is there some kind of correction or refinement on F according to SIGF?
And also, when multiplicity during data collection is low, the SIGF would not be so interesting. So does that mean the SIGF would not be so important in some measurements?
Any kind reply from you guys would be greatly appreciated. Many thanks!

Best regards,
G
