Actually, Jeff, the problem goes even deeper than that. Have a look at these Wilson plots: http://bl831.als.lbl.gov/~jamesh/wilson/wilsons.png

For these plots I took Fs from a unit cell full of a random collection of atoms, squared them, added Gaussian noise with RMS = 1, and then ran them back through various programs. The "plateau" at F ~ 1, which overestimates some "true intensities" by almost a factor of a million, arises because French & Wilson did not think it "right" to use the slope of the Wilson plot as a source of prior information. A bit naive, I suppose, because we can actually be REALLY sure that 1.0 A intensities are "zero" if the data drop into the noise at 3 A. Nevertheless, no one has ever augmented the F&W procedure to take this prior knowledge into account. A shame! Because if they did, there would be no need for a resolution cut-off at all.
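A minimal numerical sketch of that plateau (not the actual script behind the plots above, and with a flat prior truncated at J >= 0 standing in for the full French & Wilson prior; integration settings are just illustrative): for a measurement Iobs +/- sigma, compute the posterior mean amplitude <F> = <sqrt(J)> by brute-force integration. With sigma = 1, the result sits around 0.5-0.8 even for Iobs as low as -3 sigma, i.e. the output F is pinned near the noise level no matter how small the true intensity really is.

// Sketch only: flat prior on the true intensity J, truncated at J >= 0,
// instead of the full French & Wilson (Wilson-distribution) prior.
#include <cstdio>
#include <cmath>

int main() {
    const double sigma = 1.0;     // measurement RMS error, as in the plots above
    const double Jmax  = 25.0;    // integration limit, many sigma above zero
    const int    nstep = 250000;
    const double dJ    = Jmax / nstep;

    printf("# Iobs (in sigma)   posterior <F>\n");
    for (int k = -3; k <= 3; ++k) {
        const double Iobs = k * sigma;
        double num = 0.0, den = 0.0;
        for (int i = 0; i < nstep; ++i) {
            const double J = (i + 0.5) * dJ;   // midpoint rule over J >= 0
            const double w = std::exp(-0.5 * (J - Iobs) * (J - Iobs) / (sigma * sigma));
            num += std::sqrt(J) * w;           // numerator of E[sqrt(J)]
            den += w;                          // normalisation
        }
        printf("%8d            %8.3f\n", k, num / den);
    }
    return 0;
}

If the prior instead followed the Wilson-plot slope down to, say, a true <I> of ~1e-6 in the outer shells, the posterior <F> would come out around 1e-3 rather than sticking near 1 - which is the point about being REALLY sure those intensities are essentially zero.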
Sent from a tiny virtual keyboard on a plane about to take off

On Jun 19, 2013, at 1:08 AM, Jeff Headd <jjhe...@lbl.gov> wrote:

> Hi Ed,
>
> Thanks for including the code block.
>
> I've looked back over the F&W paper, and the reason for the h < -4.0 cutoff is that the entire premise assumes that the true intensities are normally distributed, and the formulation breaks down for that extreme an "outlier". For most datasets I haven't seen this assumption to be a huge problem, but in some cases the assumption of a normal distribution is not reasonable, and you'll end up with a higher percentage of rejected weak intensities.
>
> Kay, does the new XDSCONV method treat the negative intensities in some way to make them positive, or does this just work with very weak positive intensities?
>
> Jeff
>
>
> On Tue, Jun 18, 2013 at 12:15 AM, Ed Pozharski <epozh...@umaryland.edu> wrote:
> Jeff,
>
> thanks - I can see the same equation and cutoff applied in the ctruncate source. Here is the relevant part of the code:
>
>> // Bayesian statistics tells us to modify I/sigma by subtracting off sigma/S,
>> // where S is the mean intensity in the resolution shell
>> h = I/sigma - sigma/S;
>> // reject as unphysical reflections for which I < -3.7 sigma, or h < -4.0
>> if (I/sigma < -3.7 || h < -4.0) {
>>     nrej++;
>>     if (debug) printf("unphys: %f %f %f %f\n", I, sigma, S, h);
>>     return(0);
>> }
>
> These seem to be arbitrary cutoff choices, given that they are hard-coded. At the very least, the cutoffs should depend on the total number of reflections, to account for the familywise error rate.
>
> It is, however, the h-based rejection that seems most problematic to me. In the dataset in question, up to 20% of reflections are rejected in the highest resolution shell (granted, I/sigI there is 0.33). I would expect reflections to be rejected when they are deemed to be outliers for reasons other than statistical error (e.g. streaks, secondary lattice spots in the background, etc.). I must say that this was done with extremely good quality data, so I doubt that 1 out of 5 reflections returns some physically impossible measurement.
>
> What is happening is that <sigma> = 3S in the highest resolution shell, and for many reflections h < -4.0. This does not mean that the reflections are "unphysical", though - just that the shell as a whole has mostly weak data (in this case 89% with I/sigI < 2 and 73% with I/sigI < 1).
>
> What is counterintuitive is why I have to discard reflections that are just plain weak, and not really outliers.
>
> Cheers,
>
> Ed.
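To put numbers on the effect Ed describes, here is a small simulation (made-up shell parameters, not the real dataset) that applies the same rejection test as the ctruncate snippet above to a weak shell in which the true intensities follow a Wilson (exponential) distribution with mean S and every measurement carries sigma = 3S:

#include <cstdio>
#include <random>

int main() {
    std::mt19937 rng(7);
    const double S     = 1.0;              // shell mean intensity (arbitrary units)
    const double sigma = 3.0 * S;          // Ed's case: <sigma> ~ 3S in the outer shell
    std::exponential_distribution<double> wilson(1.0 / S);   // acentric Wilson prior, mean S
    std::normal_distribution<double>      noise(0.0, sigma); // measurement error

    const int n = 100000;
    int nrej = 0;
    for (int i = 0; i < n; ++i) {
        const double I = wilson(rng) + noise(rng);   // observed = true + error
        const double h = I / sigma - sigma / S;      // same statistic as ctruncate
        if (I / sigma < -3.7 || h < -4.0) ++nrej;    // same rejection test
    }
    printf("rejected %d of %d (%.1f%%)\n", nrej, n, 100.0 * nrej / n);
    return 0;
}

With sigma/S = 3 the h < -4.0 branch reduces to I/sigma < -1, so any reflection measured more than one sigma below zero is discarded - roughly 10% of these simulated measurements - even though every underlying true intensity is positive and none of them is physically impossible.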
>
> On 06/17/2013 10:29 PM, Jeff Headd wrote:
>> Hi Ed,
>>
>> I'm not directly familiar with the ctruncate implementation of French and Wilson, but from the implementation that I put into Phenix (based on the original F&W paper) I can tell you that any reflection where (I/sigI) - (sigI/mean_intensity) is less than a defined cutoff (in our case -4.0) is rejected. Depending on sigI and the mean intensity for a given shell, this can result in positive intensities also being rejected. Typically this will affect very small positive intensities, as you've observed.
>>
>> I don't recall the mathematical justification for this and don't have a copy of F&W here at home, but I can have a look in the morning when I get into the lab and let you know.
>>
>> Jeff
>>
>>
>> On Mon, Jun 17, 2013 at 5:04 PM, Ed Pozharski <epozh...@umaryland.edu> wrote:
>> I noticed something strange when processing a dataset with imosflm. The final output, ctruncate_etc.mtz, contains IMEAN and F columns, where F should be the conversion of IMEAN according to French & Wilson. The problem is that IMEAN has no missing values (100% complete) while F has about 1500 missing (~97% complete)!
>>
>> About half of the reflections that go missing are negative, but half are positive. About 5x more negative intensities are successfully converted. Most impacted are the high resolution shells with weak signal, so I am sure the impact on "normal" refinement would be minimal.
>>
>> However, I am just puzzled why ctruncate would reject positive intensities (or negative ones, for that matter - I don't see any cutoff described in the manual, and the lowest I/sigI for a successfully converted reflection is -18).
>>
>> Is this a bug or a feature?
>>
>> Cheers,
>>
>> Ed.
>>
>> --
>> I don't know why the sacrifice thing didn't work.
>> Science behind it seemed so solid.
>> Julian, King of Lemurs
>
>
> --
> Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
> Julian, King of Lemurs
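To make Jeff's point above concrete - that even positive intensities can fail the test when sigI is large compared with the shell mean - here is a toy check with made-up numbers:

#include <cstdio>

int main() {
    // Hypothetical reflection from a weak shell: the intensity is positive,
    // but sigma is five times the shell mean S, so the -sigma/S term drags
    // h below the hard-coded -4.0 cutoff and the reflection is rejected.
    const double I = 0.5, sigma = 2.0, S = 0.4;
    const double h = I / sigma - sigma / S;          // 0.25 - 5.0 = -4.75
    const bool reject = (I / sigma < -3.7 || h < -4.0);
    printf("h = %.2f -> %s\n", h, reject ? "rejected" : "kept");
    return 0;
}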