Re: [ccp4bb] Problems with phasing a protein (1300aa)

Phil Evans Mon, 23 Mar 2009 09:09:50 -0700

I'm happy to change the column titles if it makes things clearer. Actually, the "I/sigma" column in the Scala output is not very useful: it is <I> / RMS scatter, i.e. the mean intensity / mean error for individual observations, not taking into account multiple measurements. Because it is a ratio of means (rather than a mean of ratios), it can behave oddly depending on the distribution of intensities, for instance giving an overall value which is outside the range of values in the resolution bins. It is the ratio of the previous two columns (Av_I and SIGMA).
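To make the ratio-of-means versus mean-of-ratios distinction concrete, here is a minimal Python sketch on made-up numbers. It assumes the denominator of the "I/sigma" column behaves like an RMS of the individual sigmas (one reading of "RMS scatter" above); the function names and data are hypothetical and this is not Scala's code.

```python
import math

def ratio_of_means(intensities, sigmas):
    """<I> / RMS(sigma): a ratio of averages (assumed form of the "I/sigma" column)."""
    mean_i = sum(intensities) / len(intensities)
    rms_sigma = math.sqrt(sum(s * s for s in sigmas) / len(sigmas))
    return mean_i / rms_sigma

def mean_of_ratios(intensities, sigmas):
    """<I/sigma>: average of the per-observation ratios."""
    return sum(i / s for i, s in zip(intensities, sigmas)) / len(intensities)

# Two hypothetical resolution bins: one strong but noisy, one weak but precise.
bin_a = ([10.0], [10.0])   # I/sigma = 1
bin_b = ([1.0], [0.1])     # I/sigma = 10

per_bin = [ratio_of_means(*b) for b in (bin_a, bin_b)]
all_i = bin_a[0] + bin_b[0]
all_sig = bin_a[1] + bin_b[1]
overall = ratio_of_means(all_i, all_sig)

print([round(r, 3) for r in per_bin])         # [1.0, 10.0]
print(round(overall, 2))                      # 0.78, below every per-bin value
print(round(mean_of_ratios(all_i, all_sig), 2))  # 5.5, inside the per-bin range
```

The mean of ratios cannot escape the per-bin range, because the overall value is a count-weighted average of the per-bin means.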

On the other hand, the column labelled "Mn(I)/sd" is the mean of ratios for each reflection, i.e. < <I>/σ(<I>) >, and does take into account the multiplicity of measurements, so it is a much more relevant indicator of data quality.
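A minimal sketch of a mean-of-ratios statistic computed after merging. It assumes the common inverse-variance weighting for the merged intensity; Scala's actual weighting and error model may differ, and the function names and data are hypothetical.

```python
import math

def merge(observations):
    """Inverse-variance weighted mean <I> and its sigma for one unique reflection.

    observations: list of (I, sigma) pairs for symmetry-equivalent measurements.
    """
    weights = [1.0 / (s * s) for _, s in observations]
    mean_i = sum(w * i for w, (i, _) in zip(weights, observations)) / sum(weights)
    sigma_mean = 1.0 / math.sqrt(sum(weights))
    return mean_i, sigma_mean

def mn_i_over_sd(reflections):
    """Average over unique reflections of <I>/sigma(<I>)."""
    ratios = []
    for obs in reflections:
        mean_i, sigma_mean = merge(obs)
        ratios.append(mean_i / sigma_mean)
    return sum(ratios) / len(ratios)

# Hypothetical data: the same reflection measured once versus four times.
single = [[(100.0, 10.0)]]
fourfold = [[(100.0, 10.0)] * 4]

print(round(mn_i_over_sd(single), 6))    # 10.0
print(round(mn_i_over_sd(fourfold), 6))  # 20.0: multiplicity 4 halves sigma(<I>)
```

With equal sigmas, merging n measurements divides σ(<I>) by √n, which is why this statistic reflects multiplicity while the per-observation "I/sigma" column does not.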


see
http://www.ccp4wiki.org/~ccp4wiki/wiki/index.php?title=Scaling_experimental_intensities_with_Scala

Scala also outputs a convenient "Table 1" summary.

On 23 Mar 2009, at 15:50, James Holton wrote:

I guess when I talk about signal-to-noise I assume the one that is most relevant to the task at hand. So, to me, I/sigma(I) at the phasing step would be the average intensity (I) divided by the sigma (standard deviation) assigned to it AFTER scaling/merging. I admit that the "I/sigma" column from SCALA is potentially confusing, but if you are dealing with spot intensities, this is the first I/sigma you think about, so I guess this is what Phil was thinking.
Personally, I think the descriptions of the columns in this table are clear if you read the caption before the table in the SCALA output, but Tassos is right that an alarming number of people have never done this. RTFM?
When in doubt, use mtzdmp to see what the average values of the data columns really are.
-James Holton
MAD Scientist

Anastassis Perrakis wrote:
I like to think of things in terms of signal-to-noise, and one can use a rearrangement of the Crick-Magdoff equation to tell you what the I/sigma of your data set needs to be for delta-F to be greater than sigma(delta-F):
I/sigma(I) > 1.3*sqrt(Daltons/sites)/f"

where:
I/sigma(I)  is the signal-to-noise ratio of the data set required to solve it by MAD/SAD
Daltons     is the molecular weight of the protein in amu
sites       is the number of Se sites
f"          is the f" of those sites (in "electrons")
Let me see... we recently solved a 200-residue protein, 4 mol/AU, with 2 Se per mol, 8 Se in total. Since 160 residues were ordered, I will give you a discount: 18,000 Da/monomer, 70,000 Da in the AU.
I truncated the data to 4.2 Å for the Se search.

1.3*sqrt(70000/8)/6.5 ≈ 19
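The rule of thumb and the arithmetic above can be checked with a small helper. The function name is hypothetical; it simply evaluates I/sigma(I) > 1.3*sqrt(Daltons/sites)/f" as written in the thread.

```python
import math

def required_i_over_sigma(daltons, sites, f_double_prime):
    """I/sigma(I) needed for delta-F > sigma(delta-F), per the rule of thumb
    I/sigma(I) > 1.3 * sqrt(Daltons / sites) / f" quoted in this thread."""
    return 1.3 * math.sqrt(daltons / sites) / f_double_prime

# Tassos's numbers: ~70,000 Da in the AU, 8 Se sites, f" = 6.5 electrons.
needed = required_i_over_sigma(70000, 8, 6.5)
print(round(needed, 1))  # 18.7, i.e. about 19
```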

Statistics from Scala:
 N   1/d^2  Dmin(A)   Rmrg  Rfull   Rcum  Ranom  Nanom  Av_I  SIGMA  I/sigma   sd  Mn(I/sd)
 1  0.0098    10.11  0.048  0.049  0.048  0.036    349  4967    419     11.9  342      25.8
 2  0.0196     7.15  0.050  0.044  0.049  0.031    707  5360    462     11.6  372      28.8
 3  0.0293     5.84  0.089  0.062  0.057  0.047    975  1634    224      7.3  177      19.4
 4  0.0391     5.06  0.065  0.048  0.059  0.039   1140  2107    207     10.2  218      21.0
 5  0.0489     4.52  0.061  0.043  0.060  0.034   1315  2523    227     11.1  253      21.9
 6  0.0587     4.13  0.072  0.051  0.062  0.035   1470  2142    223      9.6  242      20.2
 7  0.0685     3.82  0.091  0.061  0.066  0.042   1605  1566    203      7.7  219      16.1
 8  0.0782     3.57  0.128  0.086  0.071  0.052   1737  1034    186      5.6  199      12.3
 9  0.0880     3.37  0.189  0.137  0.077  0.074   1859   667    181      3.7  187       8.9
10  0.0978     3.20  0.314  0.224  0.085  0.129   1940   374    170      2.2  178       5.5
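A quick check of which column clears the ~19 threshold, re-reading the posted Scala numbers for the bins whose Dmin is 4.2 Å or larger (bins 1 to 5). Treating exactly those bins as the data used for the Se search is an assumption of this sketch.

```python
import math

# (Dmin in A, "I/sigma", "Mn(I/sd)") copied from the Scala table above.
bins = [
    (10.11, 11.9, 25.8),
    (7.15, 11.6, 28.8),
    (5.84, 7.3, 19.4),
    (5.06, 10.2, 21.0),
    (4.52, 11.1, 21.9),
    (4.13, 9.6, 20.2),
    (3.82, 7.7, 16.1),
    (3.57, 5.6, 12.3),
    (3.37, 3.7, 8.9),
    (3.20, 2.2, 5.5),
]

cutoff = 4.2                               # resolution limit for the Se search
needed = 1.3 * math.sqrt(70000 / 8) / 6.5  # ~18.7, the Crick-Magdoff estimate

used = [b for b in bins if b[0] >= cutoff]
print(all(i_sig < needed for _, i_sig, _ in used))  # True: "I/sigma" never reaches it
print(all(mn > needed for _, _, mn in used))        # True: "Mn(I/sd)" always exceeds it
```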
So, that would support your argument.

HOWEVER, that would mean looking at the Mn(I/sd) column in the table!!!
"Mn(I/sd)" is not the same as "I/sigma" in Scala notation!!!! Most people take I/sigma(I) in your notation to be the I/sigma in the Scala output, or the I divided by sd in the Denzo output. These are (very) different. I am not sure which you meant, since I/sigma(I) is not the full notation (place the <> in your favorite place first ...), but it seems correct if you meant Mn(I/sd), which most people do not quote or use much ;-)
Greetings,
Tassos


A.
