Francis E Reyes wrote:
Sorry, a latecomer to this thread, but the OP mentioned "tweaking the error model" in HKL2000. I have heard this before. What's the validity of this? Does it actually help, or does it only improve the integration statistics while you pay for it during refinement?
FR

There is no validity to "tweaking the error model"! It is a HORRIBLE idea! Unless, of course, you are trying to figure out where you made a "mistake".

It is one thing to TRY increasing the error bars to see how big your "unknown systematic error" is, but you should never just sally forth after doing that. This is equivalent to taking the data points:
110 +/- 5
100 +/- 5
90 +/- 5
50 +/- 5
45 +/- 5
and using an "error scale factor" of 5.3 to change all the error bars to 26.5 and make your "merged" result 79 +/- 13. The "Chi^2" here is 1.0 because the new "sigma" of each point is equal to the rms scatter in the observations. However, the way you arrived at this Chi^2=1 does not make any sense! Why would the individual error bars be so ridiculously underestimated? Looking at the points, this is obviously a bimodal distribution (i.e. over-merging).
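The arithmetic above is easy to check with a few lines of plain Python (numbers taken straight from the example above):

```python
import math

# The five observations from the example above, each reported as +/- 5
obs = [110, 100, 90, 50, 45]
sigma = 5.0
n = len(obs)

mean = sum(obs) / n                                        # merged value: 79.0
rms = math.sqrt(sum((x - mean) ** 2 for x in obs) / n)     # rms scatter: ~26.5

# The "error scale factor" needed to make each sigma equal the rms scatter
scale = rms / sigma                                        # ~5.3

# Chi^2 per observation with the inflated sigmas is 1.0 by construction
chi2 = sum((x - mean) ** 2 for x in obs) / (n * rms ** 2)

# Error on the merged mean (sample std / sqrt(n)): ~13
merged_sigma = math.sqrt(sum((x - mean) ** 2 for x in obs) / (n - 1) / n)
```

The point of the sketch is that the scale factor is derived entirely from the scatter of the data about their own mean, so Chi^2 = 1 comes out by construction, regardless of whether the scatter is noise or (as here) a bimodal distribution.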

I once saw one poor sod turn a 7-fold NCS molecule into a 2-fold crystallographic operator by simply rejecting "outlier" spots. Fortunately, they caught it eventually. But I fear there are more than a few wrong structures in the PDB because of these ill-advised practices: rejecting tons of perfectly good data, or using a completely non-physical "error model". Remember, by inflating your error bars you can make any number of wrong conclusions agree with your data to "within the error bars".

In tzhou's case, I think the "wrong conclusion" was assuming no radiation damage. Effectively, the crystal at the end of the data set was different from the crystal at the beginning. Sounds like even the unit cell dimensions changed significantly! I hope it is not surprising to anyone that refining a single-conformer atomic model against a moving average of the damaged and undamaged structures will get your R-factors "stuck".


I suppose it is appropriate here to mention the meaning of the parameters in the "error model". In denzo/HKL2K, these are the "error scale factor" value and the "estimated error" table. In SCALA, the "error model" is given on the "SDCORRection" line, which has 3 numbers: sdfac (equivalent to the "error scale factor"), sdB (no equivalent in denzo), and sdadd (equivalent to the "estimated error"). In XSCALE, the "WEIGHT" keyword defines the equivalent of sdfac/"error scale factor". How these numbers are applied to the data is described fairly well in the various programs' manuals, but the noise sources they account for do not seem to be widely known:

sdadd > 0 reflects the sum of all sources of fractional noise, or "% error". If the x-ray beam is flickering, the shutter timing is not perfect, the crystal is vibrating in the cryo stream, or the detector calibration (a scale factor on each pixel) is not perfect, then you get a source of error that is proportional to intensity. Essentially, sdadd represents the combined error of all the "scale factors" in the experiment. In typical experiments, this all sums to about 3%/spot (sdadd = 0.03), which is why I/sd tends to top out at ~30 for datasets with modest multiplicity (see Diederichs, Acta D, 2010).
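As a sketch of how these three parameters enter the corrected sigma (this is the SCALA-style form as I recall it; check the program manual for the exact expression), and why sdadd puts a ceiling on I/sd:

```python
import math

def corrected_sigma(I, sigma, sdfac=1.0, sdB=0.0, sdadd=0.03):
    """SCALA-style corrected sigma (form assumed, not quoted from the manual):
    sd'(I) = sdfac * sqrt(sigma^2 + sdB*I + (sdadd*I)^2)"""
    return sdfac * math.sqrt(sigma ** 2 + sdB * I + (sdadd * I) ** 2)

# For a strong spot the (sdadd*I)^2 term dominates, so I/sd' -> 1/sdadd,
# i.e. ~33 for sdadd = 0.03, no matter how many photons you count.
I = 1.0e6
sigma_poisson = math.sqrt(I)   # counting noise alone
print(I / corrected_sigma(I, sigma_poisson))   # ~33
```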

Now would be a good time to ask yourself why a particular resolution bin would have an "estimated error" different from the rest?

sdfac != 1.0 generally arises from using an incorrect detector gain (a nearly universal practice). The fact that almost every detector has a non-zero point-spread function (PSF) tends to make the rms variation in a field of identically illuminated pixels (a flat field) less than what one would expect from "photon-counting noise". This is because the "true noise" is being "blurred" by the PSF. The reduced noise can fool one into thinking that the detector experienced more photons than it actually did (making the signal-to-noise ratio higher), leading to the conclusion that the detector gain (ADU/photon) is lower than it actually is. This is a fine assumption for getting the signal-to-noise of the background, but unfortunately this "noise suppression" does not apply to spots. There is some debate about which "gain" is "right" to use in data processing, but in my experience correcting things after-the-fact in scaling using sdfac is not harmful. Unless, of course, you inflate sdfac beyond what is required to correct the gain!
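The gain-underestimation effect is easy to demonstrate with a toy simulation (made-up but typical numbers: 100 photons/pixel flat field, a 3-pixel box "PSF"; any real PSF behaves similarly):

```python
import numpy as np

rng = np.random.default_rng(0)
gain = 1.83                                # ADU/photon (illustrative value)
photons = rng.poisson(100, 200_000)        # flat-field: 100 photons/pixel
pixels = gain * photons                    # recorded signal in ADU

# A common gain estimate for Poisson data is variance/mean (in ADU)
print(pixels.var() / pixels.mean())        # ~1.83, the true gain

# Now "blur" with a 3-pixel box PSF: each output pixel averages 3 inputs
blurred = np.convolve(pixels, np.ones(3) / 3, mode="valid")

# The mean is unchanged but the variance drops ~3-fold, so the
# variance/mean estimate of the gain comes out ~3x too low
print(blurred.var() / blurred.mean())      # ~0.61
```

The blurred flat field "looks" quieter than Poisson statistics allow for the true gain, which is exactly the trap described above: the fitted gain comes out low, spot sigmas come out underestimated, and sdfac > 1 is needed to repair them.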

sdB != 0 introduces a factor proportional to the square root of intensity (proportional to photon counting noise), which can also arise from an incorrect detector gain. In practice, this term tends to "soak up" problems in the scaling of the other two. However, in my "optimization" of these error-model parameters I have found that sdB "refines" to 0 if I have used the correct detector gain.

It is worth noting here that no "error model" implementations seem to have a factor for handling noise sources that are independent of intensity, such as the detector read-out noise. However, one can "fudge" this by lowering the "zero" level in data processing (ADCOFFSET in MOSFLM). Essentially, you are fooling MOSFLM into thinking that there is a constant, flat "background" of "extra photons" in every pixel and equating the read-out noise to the noise you would get from this "extra" background. For example, an ADSC Q315r in hardware bin mode has a "true" GAIN of 1.83 ADU/(1 A photon) and a "true" ADCOFFSET of 40 ADU. However, the read-out noise tends to give rms 3 ADU on a blank image, which is equivalent to the noise deposited by ~3 photons/pixel. Therefore, lowering the ADCOFFSET to 37 will "simulate" the read-out noise. However, for other readout modes and other detector types this "fudge" will be different. Interestingly, the current default ADCOFFSET in MOSFLM for ADSC detectors is 8, which would be appropriate for a Quantum 4.
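The photon-equivalence arithmetic above can be checked in a couple of lines (numbers from the Q315r example; the only assumption is that the read-out rms is matched to an equivalent Poisson background):

```python
gain = 1.83          # ADU per 1 A photon (hardware-binned Q315r, per the numbers above)
readout_rms = 3.0    # ADU rms measured on a blank image

# A Poisson background of B photons/pixel contributes gain*sqrt(B) ADU rms,
# so the background level that mimics the read-out noise is:
B = (readout_rms / gain) ** 2
print(B)             # ~2.7, i.e. roughly 3 photons/pixel
```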

For all practical purposes, however, the read-out noise from a modern detector is almost always lost in the background, which dominates the per-pixel error in all but the most exotic cases (weak spots and near-zero background).

Anyway, my summary recommendation is to "get to know" your detector and what kind of "error model" you usually get from it. If you find that you have to "inflate your sigmas", then something is wrong.

-James Holton
MAD Scientist


On Jun 23, 2010, at 8:25 AM, "Zhou, Tongqing (NIH/VRC) [E]" <tz...@mail.nih.gov> wrote:

Hi All,

The problem has also been solved with a new 2.0A dataset collected over the last weekend. Same space group and dimensions, much less radiation damage. This time I used APS SER-CAT's weaker BM beamline.

Thanks,


Tongqing

Tongqing Zhou, Ph.D.
Staff Scientist
Structural Biology Section
Vaccine Research Center, NIAID/NIH
Building 40, Room 4607B
40 Convent Drive, MSC3027
Bethesda, MD 20892
(301) 594-8710 (Tel)
(301) 793-0794 (Cell)
(301) 480-2658 (Fax)
******************************************************************
The information in this e-mail and any of its attachments is confidential and may contain sensitive information. It should not be used by anyone who is not the original intended recipient. If you have received this e-mail in error please inform the sender and delete it from your mailbox or any other storage devices. National Institute of Allergy and Infectious Diseases shall not accept liability for any statements made that are sender's own and not expressly made on behalf of the NIAID by one of its representatives.
******************************************************************


-----Original Message-----
From: Zhou, Tongqing (NIH/VRC) [E]
Sent: Tuesday, June 15, 2010 10:45 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] Stuck refinement

Hi, Everyone,

Thank you all very much for the nice suggestions. I am trying to reply within this email.

I agree that the problem may be rooted in the crystal itself. We noticed during data collection that a wedge of the rotation was very mosaic; HKL2000 was able to pick up the right spots, but scaling gave high chi^2, and when I used the rejection files HKL2000 complained "more than 50000 rejections". Colleagues suggested tweaking the error model; the "more than 50000 rejections" complaint went away and the rejections dropped to below 300 spots. The new error model reduced the chi^2 as well as the I/sigI in the low resolution shells.

I ran the P222 data set through Xtriage; the report says no twinning, but that the symmetry may be too low. However, HKL2000 won't even pick up higher symmetry groups during indexing. I also rescaled the data omitting the bad wedge, and Xtriage gives a "normal" report.


Refinement was done with a combination of simulated annealing, TLS, ADP refinement, and individual sites in Phenix. Molecular replacement was done with CDR-loop-trimmed antibody Fab and antigen structures. The map quality was good, and I was able to rebuild the new loops without any problem.

I will have beam time later this week, so I think it will be better to put on a better crystal.

Best regards,


Tongqing



-----Original Message-----
From: Eleanor Dodson [mailto:c...@ysbl.york.ac.uk]
Sent: Tuesday, June 15, 2010 4:46 AM
To: Zhou, Tongqing (NIH/VRC) [E]
Cc: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] Stuck refinement

When this happens, I firstly suspect that the space group may be wrong. We had a case where the symmetry was pseudo-I4122 but was really I222 (or was it really I212121?). Anyway, most of the structure obeyed the I4122 symmetry, but there was a tail which did not.

Feed the unmerged reflections into pointless and see what it suggests

Eleanor

Zhou, Tongqing (NIH/VRC) [E] wrote:
Hi Everyone,

I have a problem refining a structure. The data go to 2.4A (with some 30% completeness at 2.15A); the structure was solved by MR with Phaser and refined with Phenix, but R and R-free are now stuck at 26% and 32%, even with all plausible waters and missing fragments added. Data were collected at APS under cryo conditions. One thing I noticed during HKL2000 data processing was that the chi^2 values were way too high in the lower resolution shells; I had to adjust the default error model in HKL2000 to get the chi^2 to around 1, but this adjustment reduced the overall I/sigI a lot (from around 20 to 5).

The quality of electron density maps looks fine to me for a 2.4 A data set and I was able to build all the missing CDR loops for the antibody in the complex. I am lost now, should I just re-collect a new data set?

Thanks,


Tongqing


