Re: [ccp4bb] Problems with phasing a protein (1300aa)

James Holton Sat, 21 Mar 2009 14:08:13 -0700

Kumar wrote:

Hello CCP4bb members,
I have been trying to obtain phases for a protein which contains ~1300 aa. We have obtained native data to a resolution of 3.3 A (space group I222 or I212121), but we are having a tough time phasing it.
'Se' labeled crystals diffract maximally to 3.5 to 4 A and die very quickly on most of the beamlines.

Apart from Se, do you have any atoms heavier than sulphur in your crystal/solvent? This could make your crystals decay faster than normal. Other than that, I refer you to my table of how long you can expect a typical protein crystal to last at most of the world's beamlines:

http://bl831.als.lbl.gov/damage_rates.pdf

I invite corrections from any beamline scientist who thinks this table is in error!

We have scanned at the Se wavelength and it gives a very strong signal, as the protein contains ~45 Se in the AU (1300 aa). It is difficult to collect a complete dataset (maximally we get 50-60 % completeness with Rmerge ~15) out of one crystal on a regular beamline. At a microfocus beamline (APS), we were able to collect data in 3-4 batches and merge them to get a complete dataset (Rmerge ~18-20) out of one crystal. We used the data collected on the microfocus beamline (at the peak wavelength) for locating heavy atom positions using SHELXD, Solve and Phenix.hyss. Solve and Phenix.hyss find very few heavy atom sites (1-5), whereas SHELX-CDE lists many but shows no difference between original and inverted (contrast and connectivity). Our phasing attempts with datasets obtained after merging two incomplete datasets from two different crystals have also been disappointing.

It is unwise to use burnt-out data for anomalous difference phasing. Back off on the total exposure to less than 2-5 MGy in total and average data from more crystals.

My other worry is the absolute value of the average intensity, which seems to be quite low in most of the datasets. Below I have pasted the last table of scale.log (HKL2000).

Shell Lower Upper Average      Average     Norm. Linear Square
limit    Angstrom       I   error   stat. Chi**2  R-fac  R-fac
     50.00   7.53    45.4     1.6     1.3  1.295  0.055  0.047
      7.53   5.98    11.4     1.3     1.3  0.672  0.135  0.114
      5.98   5.23    11.2     1.6     1.6  0.643  0.171  0.152
      5.23   4.75    16.8     2.0     1.9  0.736  0.148  0.118
      4.75   4.41    18.8     2.2     2.2  0.739  0.143  0.132
      4.41   4.15    14.6     2.4     2.4  0.653  0.190  0.175
      4.15   3.94    11.3     2.5     2.5  0.582  0.247  0.226
      3.94   3.77    10.1     2.8     2.8  0.511  0.280  0.191
      3.77   3.63     8.0     3.1     3.1  0.450  0.315  0.285
      3.63   3.50     7.6     3.3     3.2  0.483  0.311  0.270
 All reflections     15.5     2.3     2.2  0.694  0.153  0.106

The absolute value of intensity is not important unless you are comparing it to a control experiment done in exactly the same way. Intensity relative to the error in the intensity, however, is very important.
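As a rough illustration of "intensity relative to the error", the sketch below divides the average I by the average error for each shell of the table quoted above. The ratio of averages is only an approximation to the mean I/sigma(I) of the individual reflections, and the names `shells` and `ratios` are made up for this example.

```python
# Resolution shells from the quoted scale.log table:
# (lower limit, upper limit, average I, average error), limits in Angstrom.
shells = [
    (50.00, 7.53, 45.4, 1.6),
    (7.53, 5.98, 11.4, 1.3),
    (5.98, 5.23, 11.2, 1.6),
    (5.23, 4.75, 16.8, 2.0),
    (4.75, 4.41, 18.8, 2.2),
    (4.41, 4.15, 14.6, 2.4),
    (4.15, 3.94, 11.3, 2.5),
    (3.94, 3.77, 10.1, 2.8),
    (3.77, 3.63, 8.0, 3.1),
    (3.63, 3.50, 7.6, 3.3),
]

# Approximate I/sigma per shell as <I> / <error> (an assumption, see above).
ratios = [avg_i / avg_err for _, _, avg_i, avg_err in shells]

for (lo, hi, _, _), ratio in zip(shells, ratios):
    print(f"{lo:6.2f} - {hi:5.2f} A   I/sigma ~ {ratio:5.1f}")
```

With these numbers the lowest-angle shell comes out near 28, and every other shell falls below 9.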

Now, I want you to help me by answering some of my queries:
1. Is it possible to get MAD/SAD phasing done from a dataset having more than 15% Rmerge and resolution in the range of 4 - 4.5 Ang?

Yes, but only if your anomalous signal is greater than the noise. This does not appear to be so in your case. In fact, you seem to have a very good example of a marginal case that is below the "threshold of solvability".

I like to think of things in terms of signal-to-noise, and one can use a rearrangement of the Crick-Magdoff equation to tell you what the I/sigma of your data set needs to be for delta-F to be greater than sigma(delta-F):


I/sigma(I) > 1.3*sqrt(Daltons/sites)/f"

where:

I/sigma(I) is the signal-to-noise ratio of the data set required to solve it by MAD/SAD

Daltons   is the molecular weight of the protein in amu
sites         is the number of Se sites
f"            is the f" of those sites (in "electrons")

In your case (taking about 120 Da per residue): I/sigma(I) > 1.3*sqrt(1300*120/45)/4 = 19 is required. You have this in your lowest-angle bin, but nowhere else. It might be possible to find some sites, but you are not going to get phases beyond 7 A, and phase extension from this low a level is hard to do, even when the I/sigma is high. Remember, "thresholds" like this are not sharp but represent a level of data quality where the best crystallographers in the world, working very carefully and with a significant amount of luck, have managed to solve a structure. If you are not one of these highly skilled and experienced people, you will probably need better data. A lot better data. And having better data is not a bad thing.

For the case Tommi Kajander put forward, I/sigma(I) > 1.3*sqrt(365*8*120/80)/4 = 22 was required. Tommi? Was your I/sig better or worse than this?
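The two estimates above can be reproduced with a few lines of Python. This is only a sketch of the rule of thumb as written in this message; the function name `required_i_over_sigma` is made up for the example, and the inputs (120 Da per residue, f" = 4 electrons) are the same rough values used in the text.

```python
import math

def required_i_over_sigma(daltons, sites, f_double_prime):
    """Rule-of-thumb I/sigma(I) needed for delta-F > sigma(delta-F).

    daltons        -- molecular weight of the protein in amu
    sites          -- number of anomalous scatterer (Se) sites
    f_double_prime -- f" of those sites, in electrons
    """
    return 1.3 * math.sqrt(daltons / sites) / f_double_prime

# 1300 residues at ~120 Da each, 45 Se sites, f" ~ 4 electrons.
kumar = required_i_over_sigma(1300 * 120, 45, 4.0)

# 8 copies of 365 residues at ~120 Da each, 80 Se sites, f" ~ 4 electrons.
kajander = required_i_over_sigma(365 * 8 * 120, 80, 4.0)

print(f"Kumar:    I/sigma(I) > {kumar:.1f}")
print(f"Kajander: I/sigma(I) > {kajander:.1f}")
```

These evaluate to about 19.1 and 21.5, which round to the 19 and 22 quoted above.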

2. Will a complete data set obtained by merging various batches (30-40 frames each) from one or more crystals have a proper anomalous signal for phasing? I am worried that the weak anomalous signal may get lost while merging.

If you average enough data, you can theoretically get any I/sigma you want. In your case you will probably have to average data from not less than 8 crystals to bring your overall I/sigma (15.5/2.3 = 6.7 in your table) up to the required level of about 19 (6.7 * sqrt(8) = 19).
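The square-root scaling can be sketched as follows. It assumes every crystal contributes an equally good, independent data set with purely random errors; radiation damage and non-isomorphism between crystals will make the real gain smaller. The function names are made up for this example.

```python
import math

def merged_i_over_sigma(single_i_over_sigma, n_crystals):
    """I/sigma after averaging n equivalent, independent data sets."""
    return single_i_over_sigma * math.sqrt(n_crystals)

def crystals_needed(single_i_over_sigma, target_i_over_sigma):
    """Smallest number of equivalent data sets that reaches the target."""
    return math.ceil((target_i_over_sigma / single_i_over_sigma) ** 2)

# Overall I/sigma of one data set, from the "All reflections" row
# of the quoted table (15.5 / 2.3, rounded to 6.7).
single = 6.7

print(f"8 crystals give I/sigma ~ {merged_i_over_sigma(single, 8):.1f}")
print(f"crystals needed for I/sigma = 20: {crystals_needed(single, 20)}")
```

Eight crystals give about 19, and a target of exactly 20 takes nine, so "not less than 8" is best read as a lower bound.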

3. Will such low average intensities (as shown above from the HKL scale log file) be good enough for MAD/SAD phasing

Probably not.

or do I really need to improve crystal quality for stronger diffraction?


It is always better to have better crystals.

4. For MAD/SAD phasing, to what resolution do we need to have anomalous signal? Many of my datasets show anomalous signal maximally up to 6-8 A (calculated using Phenix.xtriage).

You need to have anomalous signal out to the resolution to which you want to have phases. In general, if you have 2.5 A phases, you can extend them easily to 2.0 A with solvent flattening, but extending 6 A phases to 4 A is more problematic. I think this could be because things like histogram matching don't work as well with 6 A protein maps.

5. Since I have low resolution (3.5 to 4 A) data, relatively high Rmerge (14-15%), lower value of average intensity, anomalous signal up to 6 A or so..... which programs will be more useful for heavy atom location and to prevent false positives from being selected?

There are no programs that will make bad data good. It may be possible to find these 45 sites with the programs you are already using, but it will be a challenge to get phases even if you do.

Good luck,

-James Holton
MAD Scientist
