Kumar wrote:
Hello CCP4bb members,

I have been trying to obtain phases for a protein which contains ~1300 aa. We have obtained native data to a resolution of 3.3 A (space group I222 or I212121), but we are having a tough time phasing it.

'Se'-labeled crystals diffract to 3.5-4 A at best and die very quickly on most of the beamlines.
Apart from Se, do you have any atoms heavier than sulphur in your crystal/solvent? This could make your crystals decay faster than normal. Other than that, I refer you to my table of how long you can expect a typical protein crystal to last at most of the world's beamlines:
http://bl831.als.lbl.gov/damage_rates.pdf

I invite corrections from any beamline scientist who thinks this table is in error!

We have scanned at the Se wavelength and it gives a very strong signal, as the AU contains ~45 Se (1300 aa). It is difficult to collect a complete dataset out of one crystal on a regular beamline (at most we get 50-60% completeness with Rmerge ~15%). At a microfocus beamline (APS), we were able to collect data in 3-4 batches and merge them to get a complete dataset (Rmerge ~18-20%) out of one crystal. We used the data collected on the microfocus beamline (at the peak wavelength) to locate heavy-atom positions using SHELXD, Solve and Phenix.hyss. Solve and Phenix.hyss find very few heavy-atom sites (1-5), whereas SHELXC/D/E lists many but shows no difference between the original and inverted hands (contrast and connectivity). Our phasing attempts with datasets obtained by merging two incomplete datasets from two different crystals have also been disappointing.
It is unwise to use burnt-out data for anomalous difference phasing. Back off on the total exposure to less than 2-5 MGy in total and average data from more crystals.

My other worry is the absolute value of the average intensity, which seems to be quite low in most of the datasets. Below I have pasted the last table of scale.log (HKL2000).
Shell Lower Upper Average      Average     Norm. Linear Square
limit    Angstrom       I   error   stat. Chi**2  R-fac  R-fac
     50.00   7.53    45.4     1.6     1.3  1.295  0.055  0.047
      7.53   5.98    11.4     1.3     1.3  0.672  0.135  0.114
      5.98   5.23    11.2     1.6     1.6  0.643  0.171  0.152
      5.23   4.75    16.8     2.0     1.9  0.736  0.148  0.118
      4.75   4.41    18.8     2.2     2.2  0.739  0.143  0.132
      4.41   4.15    14.6     2.4     2.4  0.653  0.190  0.175
      4.15   3.94    11.3     2.5     2.5  0.582  0.247  0.226
      3.94   3.77    10.1     2.8     2.8  0.511  0.280  0.191
      3.77   3.63     8.0     3.1     3.1  0.450  0.315  0.285
      3.63   3.50     7.6     3.3     3.2  0.483  0.311  0.270
 All reflections     15.5     2.3     2.2  0.694  0.153  0.106
The absolute value of intensity is not important unless you are comparing it to a control experiment done in exactly the same way. Intensity relative to the error in the intensity, however, is very important.
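To make the "intensity relative to its error" point concrete, here is a minimal sketch that divides the average I by the average error for each shell of the table above. The shell values are copied from the scale.log excerpt; the function name is just illustrative, and the ratio of two averages is only a rough stand-in for a true mean I/sigma(I).

```python
# Resolution shells from the scale.log table:
# (lower limit in A, upper limit in A, average I, average error)
shells = [
    (50.00, 7.53, 45.4, 1.6),
    (7.53, 5.98, 11.4, 1.3),
    (5.98, 5.23, 11.2, 1.6),
    (5.23, 4.75, 16.8, 2.0),
    (4.75, 4.41, 18.8, 2.2),
    (4.41, 4.15, 14.6, 2.4),
    (4.15, 3.94, 11.3, 2.5),
    (3.94, 3.77, 10.1, 2.8),
    (3.77, 3.63, 8.0, 3.1),
    (3.63, 3.50, 7.6, 3.3),
]


def i_over_sigma(avg_i, avg_err):
    """Approximate signal-to-noise as <I> / <error> (an approximation)."""
    return avg_i / avg_err


per_shell = [i_over_sigma(avg_i, avg_err) for _, _, avg_i, avg_err in shells]

for (lower, upper, _, _), ratio in zip(shells, per_shell):
    print(f"{lower:6.2f} - {upper:5.2f} A   I/sigma ~ {ratio:5.1f}")

# "All reflections" row of the same table
overall = i_over_sigma(15.5, 2.3)
print(f"overall I/sigma ~ {overall:.1f}")
```

Only the lowest-resolution shell comes out well above 10; everything else sits in single digits, which is the number that matters for anomalous phasing.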
Now, I want you to help me by answering some of my queries:

1. Is it possible to get MAD/SAD phasing done from a dataset having more than 15% Rmerge and resolution in the range of 4 - 4.5 Ang?
Yes, but only if your anomalous signal is greater than the noise. This does not appear to be so in your case. In fact, you seem to have a very good example of a marginal case that is below the "threshold of solvability". I like to think of things in terms of signal-to-noise, and one can use a rearrangement of the Crick-Magdoff equation to tell you what the I/sigma of your data set needs to be for delta-F to be greater than sigma(delta-F):

I/sigma(I) > 1.3*sqrt(Daltons/sites)/f"

where:
I/sigma(I) is the signal-to-noise ratio of the data set required to solve it by MAD/SAD
Daltons   is the molecular weight of the protein in amu
sites         is the number of Se sites
f"            is the f" of those sites (in "electrons")

In your case: I/sigma(I) > 1.3*sqrt(1300*120/45)/4 = 19 is required. You have this in your lowest-angle bin, but nowhere else. It might be possible to find some sites, but you are not going to get phases beyond 7A, and phase extension from this low a level is hard to do, even when the I/sigma is high. Remember, "thresholds" like this are not sharp but represent a level of data quality where the best crystallographers in the world working very carefully and with a significant amount of luck have managed to solve a structure. If you are not one of these highly skilled and experienced people, you will probably need better data. A lot better data. And having better data is not a bad thing.

For the case Tommi Kajander put forward, I/sigma(I) > 1.3*sqrt(365*8*120/80)/4 = 22 was required. Tommi? Was your I/sig better or worse than this?
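The two estimates above can be reproduced with a few lines of Python. This is only a sketch of the rearranged rule of thumb quoted in this post; the function name is made up for illustration, and the 120 Da per residue and f" = 4 electrons are the same rough values used in the arithmetic above.

```python
import math


def required_i_over_sigma(daltons, sites, f_double_prime):
    """Rule-of-thumb I/sigma(I) needed for delta-F to exceed sigma(delta-F).

    daltons        -- molecular weight of the protein (amu)
    sites          -- number of anomalous scatterer sites
    f_double_prime -- f" of those sites (electrons)
    """
    return 1.3 * math.sqrt(daltons / sites) / f_double_prime


DALTONS_PER_RESIDUE = 120  # rough average residue mass, as used in the post

# 1300 residues, 45 Se sites, f" ~ 4 electrons
case_1300aa = required_i_over_sigma(1300 * DALTONS_PER_RESIDUE, 45, 4.0)

# 8 copies of 365 residues, 80 Se sites, f" ~ 4 electrons
case_365x8 = required_i_over_sigma(365 * 8 * DALTONS_PER_RESIDUE, 80, 4.0)

print(f"1300 aa, 45 sites:  I/sigma > {case_1300aa:.1f}")
print(f"365x8 aa, 80 sites: I/sigma > {case_365x8:.1f}")
```

Because the requirement scales with the square root of Daltons per site and inversely with f", halving the number of sites raises the bar by about 40%, while doubling f" halves it.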

2. Will a complete data set obtained by merging various batches (30-40 frames each) from one or more crystals have proper anomalous signal for phasing? I am worried that the weak anomalous signal may get lost while merging.
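If you average enough data, you can theoretically get any I/sigma you want, because averaging N equivalent data sets improves I/sigma by about sqrt(N). In your case the overall I/sigma is 15.5/2.3 = 6.7 (the "All reflections" line of your table), so you will probably have to average data from not less than 8 crystals to bring it up to the ~19 you need (6.7 * sqrt(8) = 19).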
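The averaging argument is easy to turn into a small calculator. The sketch below assumes every crystal contributes data of equal quality with independent errors, so I/sigma grows as the square root of the number of crystals; real data sets (non-isomorphism, radiation damage, systematic errors) will do worse. The function names are illustrative only.

```python
import math


def averaged_i_over_sigma(single_i_over_sigma, n_crystals):
    """I/sigma after averaging n equivalent, independent data sets."""
    return single_i_over_sigma * math.sqrt(n_crystals)


def crystals_needed(single_i_over_sigma, target_i_over_sigma):
    """Smallest number of equivalent crystals that reaches the target."""
    return math.ceil((target_i_over_sigma / single_i_over_sigma) ** 2)


single = 6.7  # overall I/sigma of one data set, from the scale.log table

for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} crystals -> I/sigma ~ {averaged_i_over_sigma(single, n):.1f}")

print("crystals needed for I/sigma of 20:", crystals_needed(single, 20.0))
```

The square-root scaling is why the cost climbs quickly: each doubling of the target I/sigma needs four times as many crystals.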

3. Will such a low value of average intensities (as shown above from the HKL scale log file) be good enough for MAD/SAD phasing
Probably not.
or do I really need to improve crystal quality for stronger diffraction?

It is always better to have better crystals.

4. For MAD/SAD phasing, to what resolution do we need to have anomalous signal? Many of my datasets show anomalous signal to 6-8 A at most (calculated using Phenix.xtriage).
You need to have anomalous signal out to the resolution to which you want to have phases. In general, if you have 2.5 A phases, you can extend them easily to 2.0 A with solvent flattening, but extending 6 A phases to 4 A is more problematic. I think this could be because things like histogram matching don't work as well with 6 A protein maps.

5. Since I have low-resolution (3.5 to 4 A) data, relatively high Rmerge (14-15%), a low average intensity, and anomalous signal only to 6 A or so, which programs will be more useful for heavy atom location and for preventing false positives from being selected?
There are no programs that will make bad data good. It may be possible to find these 45 sites with the programs you are already using, but it will be a challenge to get phases even if you do.
Good luck,

-James Holton
MAD Scientist
