Kumar wrote:
Hello CCP4bb members,
I have been trying to obtain phases for a protein which contains
~1300 aa. We have obtained native data to a resolution of 3.3 A (space
group I222 or I212121), but we are having a tough time phasing it.
'Se'-labeled crystals diffract maximally to 3.5 to 4 A and die
very quickly on most of the beamlines.
Apart from Se, do you have any atoms heavier than sulphur in your
crystal/solvent? This could make your crystals decay faster than
normal. Other than that, I refer you to my table of how long you can
expect a typical protein crystal to last at most of the world's beamlines:
http://bl831.als.lbl.gov/damage_rates.pdf
I invite corrections from any beamline scientist who thinks this table
is in error!
We have scanned at the Se wavelength and it gives a very strong signal, as
the protein contains ~45 Se in the AU (1300 aa). It is difficult to collect
a complete dataset (maximally we get 50-60% completeness with Rmerge ~15%)
out of one crystal on a regular beamline. At a microfocus beamline (APS), we
were able to collect data in 3-4 batches and merge them to get a complete
dataset (Rmerge ~18-20%) out of one crystal. We used the data collected on
the microfocus beamline (at the peak wavelength) for locating heavy atom
positions using SHELXD, Solve and Phenix.hyss. Solve and Phenix.hyss
find very few heavy atom sites (1-5), whereas SHELX-CDE lists many but
shows no difference between the original and inverted hands (contrast and
connectivity). Our phasing attempts with datasets obtained by
merging two incomplete datasets from two different crystals have also
been disappointing.
It is unwise to use burnt-out data for anomalous difference phasing.
Back off on the total exposure to less than 2-5 MGy in total and average
data from more crystals.
My other worry is the absolute value of the average intensity, which seems
to be quite low in most of the datasets. Below I have pasted the last
table of scale.log (HKL2000).
 Shell Lower Upper  Average     Average      Norm.  Linear  Square
 limit    Angstrom        I   error   stat. Chi**2   R-fac   R-fac
      50.00   7.53     45.4     1.6     1.3  1.295   0.055   0.047
       7.53   5.98     11.4     1.3     1.3  0.672   0.135   0.114
       5.98   5.23     11.2     1.6     1.6  0.643   0.171   0.152
       5.23   4.75     16.8     2.0     1.9  0.736   0.148   0.118
       4.75   4.41     18.8     2.2     2.2  0.739   0.143   0.132
       4.41   4.15     14.6     2.4     2.4  0.653   0.190   0.175
       4.15   3.94     11.3     2.5     2.5  0.582   0.247   0.226
       3.94   3.77     10.1     2.8     2.8  0.511   0.280   0.191
       3.77   3.63      8.0     3.1     3.1  0.450   0.315   0.285
       3.63   3.50      7.6     3.3     3.2  0.483   0.311   0.270
   All reflections     15.5     2.3     2.2  0.694   0.153   0.106
The absolute value of intensity is not important unless you are
comparing it to a control experiment done in exactly the same way.
Intensity relative to the error in the intensity, however, is very
important.
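To make that concrete, here is a small Python sketch that divides the "Average I" column by the "Average error" column for each shell of the table above. Note the assumption: the ratio of the two averages is only a rough stand-in for the true mean I/sigma(I) of the individual reflections, and the function name is just illustrative.

```python
# Resolution limits (Angstrom), average I and average error,
# copied from the scale.log table quoted above.
shells = [
    (50.00, 7.53, 45.4, 1.6),
    (7.53, 5.98, 11.4, 1.3),
    (5.98, 5.23, 11.2, 1.6),
    (5.23, 4.75, 16.8, 2.0),
    (4.75, 4.41, 18.8, 2.2),
    (4.41, 4.15, 14.6, 2.4),
    (4.15, 3.94, 11.3, 2.5),
    (3.94, 3.77, 10.1, 2.8),
    (3.77, 3.63, 8.0, 3.1),
    (3.63, 3.50, 7.6, 3.3),
]
overall = (15.5, 2.3)  # "All reflections" row


def signal_to_noise(avg_i, avg_error):
    """Approximate I/sigma(I) as <I> / <error> (an assumption, see text)."""
    return avg_i / avg_error


for lower, upper, avg_i, avg_error in shells:
    ratio = signal_to_noise(avg_i, avg_error)
    print(f"{lower:6.2f} - {upper:5.2f} A   I/sigma ~ {ratio:5.1f}")

print(f"all reflections      I/sigma ~ {signal_to_noise(*overall):5.1f}")
```

Only the 50 - 7.53 A shell comes out well above 20; the overall value is a little under 7.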
Now, I want you to help me by answering some of my queries:
1. Is it possible to get MAD/SAD phasing done from a dataset having
more than 15% Rmerge and resolution in the range of 4 - 4.5 Ang?
Yes, but only if your anomalous signal is greater than the noise. This
does not appear to be so in your case. In fact, you seem to have a very
good example of a marginal case that is below the "threshold of
solvability".
I like to think of things in terms of signal-to-noise, and one can use a
rearrangement of the Crick-Magdoff equation to tell you what the I/sigma
of your data set needs to be for delta-F to be greater than sigma(delta-F):
I/sigma(I) > 1.3*sqrt(Daltons/sites)/f"
where:
I/sigma(I) is the signal-to-noise ratio of the data set required to
solve it by MAD/SAD
Daltons is the molecular weight of the protein in amu
sites is the number of Se sites
f" is the f" of those sites (in "electrons")
In your case: I/sigma(I) > 1.3*sqrt(1300*120/45)/4 = 19 is required.
You have this in your lowest-angle bin, but nowhere else. It might be
possible to find some sites, but you are not going to get phases beyond
7A, and phase extension from this low a level is hard to do, even when
the I/sigma is high. Remember, "thresholds" like this are not sharp but
represent a level of data quality where the best crystallographers in
the world working very carefully and with a significant amount of luck
have managed to solve a structure. If you are not one of these highly
skilled and experienced people, you will probably need better data. A
lot better data. And having better data is not a bad thing.
For the case Tommi Kajander put forward, I/sigma(I) >
1.3*sqrt(365*8*120/80)/4 = 22 was required. Tommi? Was your I/sig
better or worse than this?
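For anyone who wants to plug in their own numbers, the rule of thumb above can be written as a few lines of Python. This is only a sketch of the formula as given here: the function name is made up, and the 120 Da per residue and f" = 4 electrons are the same round numbers used in the two estimates above.

```python
import math


def required_i_over_sigma(daltons, sites, f_double_prime):
    """Rule-of-thumb I/sigma(I) needed for delta-F to exceed sigma(delta-F).

    daltons        -- molecular weight of the protein in amu
    sites          -- number of anomalous scatterer sites
    f_double_prime -- f" of those sites, in electrons
    """
    return 1.3 * math.sqrt(daltons / sites) / f_double_prime


DA_PER_RESIDUE = 120  # assumed average residue mass, as in the text

# Kumar's case: ~1300 residues, 45 Se sites, f" ~ 4 e
kumar = required_i_over_sigma(1300 * DA_PER_RESIDUE, 45, 4.0)

# Tommi Kajander's case: 8 copies of 365 residues, 80 sites, f" ~ 4 e
tommi = required_i_over_sigma(365 * 8 * DA_PER_RESIDUE, 80, 4.0)

print(f"Kumar: I/sigma(I) > {kumar:.1f}")
print(f"Tommi: I/sigma(I) > {tommi:.1f}")
```

As stressed above, the result is a soft threshold, not a pass/fail test.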
2. Will a complete data set obtained by merging various
batches (30-40 frames each) from one or more crystals have
proper anomalous signal for phasing? I am worried that the weak anomalous
signal may get lost while merging.
If you average enough data, you can theoretically get any I/sigma you
want. In your case you will probably have to average data from at least
8 crystals to bring your I/sigma up to about 19 (6.7 * sqrt(8) = 19).
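The estimate above assumes that errors are independent between crystals, so that averaging n equivalent data sets improves I/sigma by sqrt(n). Radiation damage and non-isomorphism will make the real gain smaller. Under that assumption, a short Python sketch (function names are illustrative only) gives the number of crystals needed for a given target:

```python
import math


def i_over_sigma_after_averaging(single_crystal, n_crystals):
    """I/sigma expected from averaging n equivalent data sets,
    assuming independent errors (sqrt(n) improvement)."""
    return single_crystal * math.sqrt(n_crystals)


def crystals_needed(single_crystal, target):
    """Smallest whole number of crystals that reaches the target I/sigma."""
    return math.ceil((target / single_crystal) ** 2)


current = 6.7  # overall I/sigma of the data set in the table (15.5 / 2.3)

print(f"8 crystals: I/sigma ~ {i_over_sigma_after_averaging(current, 8):.1f}")
print(f"crystals needed for I/sigma of 20: {crystals_needed(current, 20)}")
```

With these numbers, 8 crystals land just under 19, right at the threshold, and it takes 9 to clear 20.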
3. Will such a low value of average intensity (as shown above from
the HKL scale log file) be good enough for MAD/SAD phasing
Probably not.
or I really need to improve crystal quality for stronger diffraction.
It is always better to have better crystals.
4. For MAD/SAD phasing, to what resolution do we need to have anomalous
signal? Many of my datasets show anomalous signal only up to
6-8 A (calculated using Phenix.xtriage).
You need to have anomalous signal out to the resolution to which you
want to have phases. In general, if you have 2.5 A phases, you can
extend them easily to 2.0 A with solvent flattening, but extending 6 A
phases to 4 A is more problematic. I think this could be because things
like histogram matching don't work as well with 6 A protein maps.
5. Since I have low-resolution (3.5 to 4 A) data, relatively high
Rmerge (14-15%), low average intensity, and anomalous signal up
to 6 A or so, which programs will be most useful for heavy atom
location and for preventing false positives from being selected?
There are no programs that will make bad data good. It may be possible
to find these 45 sites with the programs you are already using, but it
will be a challenge to get phases even if you do.
Good luck,
-James Holton
MAD Scientist