Re: [ccp4bb] To Trim or Not to To Trim

James Holton Tue, 28 Mar 2023 14:54:06 -0700

Actually, they are not quite the same.

Members of an ensemble spread out over a supercell are equivalent to a conventional multi-conformer model, but only when it comes to the density derived from the coordinate atoms alone. They are not the same when it comes to bulk solvent and also not the same when it comes to clashes at crystal packing interfaces.

It is actually a tricky question: if you have a 2-conformer salt bridge between symmetry mates, which one should be "A" or "B"? In phenix, "A" will clash with "A" in the symmetry mate. Refmac, as I understand it, checks if the occupancies of any two atoms of any conformer letter sum to >= 1.0. If so, they can clash. Helen tells me Vagabond's treatment of such contacts is under development. A consensus has not been reached.
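A minimal sketch of the two rules as described above may help make the difference concrete. It is an illustration only, not code from phenix or Refmac: the function names and the small tolerance are assumptions, and the real programs apply more detailed logic than this.

```python
def phenix_like_can_clash(altloc_a: str, altloc_b: str) -> bool:
    """Rule as described above: the same conformer letter in two symmetry
    mates means the atoms see each other. (Hypothetical helper, simplified.)"""
    return altloc_a == altloc_b


def refmac_like_can_clash(occ_a: float, occ_b: float, tol: float = 1e-6) -> bool:
    """Rule as described above: two atoms may clash when their occupancies
    sum to 1.0 or more. The tolerance is an assumption that guards against
    floating-point round-off (0.7 + 0.3 is slightly below 1.0 in binary)."""
    return occ_a + occ_b >= 1.0 - tol


# A 2-conformer salt bridge across a symmetry interface, each conformer at 0.5.
for a, b in [("A", "A"), ("A", "B")]:
    print(a, b, phenix_like_can_clash(a, b), refmac_like_can_clash(0.5, 0.5))
```

With half-occupied conformers the letter-based rule distinguishes A/A from A/B contacts, while the occupancy-sum rule treats both pairs as able to clash. That is the ambiguity the question above is about.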

However, in a supercell refinement you don't have to ask this question. Every atom has an occupancy of 1.00 and there are no conformer letters (other than the default " "). Molecules then interact with their neighbors by the same rules as anywhere else in space. And yes, each one would get its own, personal, solvent envelope. Neat, huh?


If you want to try supercell refinement, do this:

1) expand your mtz data to P1. Use the CCP4 "cad" program with the keyword "outlim space 1". Do not change the space group.

2) in a second run of "cad", change the space group to P1 and multiply all your Fs by the number of full unit cells you want. Let's say that is 8, i.e.:

 "scale file 1 8.0 0"

3) re-index this P1 mtz file with the "reindex" program, using keyword "reindex 2h,2k,2l". You will get a new mtz that has 2x the cell length along each edge, but is only 12.5% complete. Don't worry, that's going to be OK.

4) Now go to the *.pdb file: edit the "CRYST1" line to have the same cell as the new mtz, and change the space group to P1.

5) make copies of your coordinates and shift them to fill this new P1 supercell. Note that this is not only populating all symmetry mates in one unit cell (total of 4 in P212121), but also making translated copies in the seven other unit cells. This would be 4*8 or 32 copies in all. How you name them depends on your personal preferences, but a simple approach is to give each symmetry mate its own chain ID (upper and lower case are allowed). The "symgen P212121" keyword is convenient, but does not make any effort to pack things into a box. You need to do that yourself by applying various "shift 1 0 -1" like operations in pdbset. This might be more intuitive to do in coot. Also, while you are doing this you need to decide which part of your multi-unit-cell space is going to correspond to your "conformer A", and which other part is going to be "B", and then "C", etc. You won't be doing occupancy refinement here. Rather, by adjusting how many "A"s and "B"s you use to make up the ensemble you effectively now have 32 levels of occupancy (instead of the usual 100).

Important point: If you find this whole process daunting and you keep running out of letters in the alphabet, now you understand why the authors of refinement programs have not made this automatic. There are a VERY large number of ways to pack these slightly-different copies of your ASU into a supercell, and only one is "best". Yes, as Vaheh said, all the combinations will be equivalent when it comes to the density derived from coordinates, but the solvent density will be allowed to conform to each local environment, and your clash scores will vary. Which combination is "correct"? I think that would be the one with the fewest clashes. It is like a big combination lock. But, for a first try, you might want to just use the "symgen" and "shift 1 0 0" commands, re-label all the chain IDs, then put the same conformer everywhere and let non-bond repulsions do their job in the subsequent refinement.
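To put rough numbers on the "32 levels of occupancy" and the "VERY large number of ways", here is a small arithmetic sketch. It assumes just two conformers, "A" and "B", distributed over the 32 copies of the P212121 example, and it counts every assignment separately, so arrangements that are equivalent by symmetry are counted more than once. The function names are made up for illustration.

```python
from math import comb

N_COPIES = 32  # 4 symmetry mates x 8 unit cells, as in the P212121 example


def effective_occupancy(n_with_conformer: int, n_copies: int = N_COPIES) -> float:
    """Occupancy implied by how many copies carry a given conformer."""
    return n_with_conformer / n_copies


def n_arrangements(n_a: int, n_copies: int = N_COPIES) -> int:
    """Ways to choose which copies carry conformer A (the rest carry B)."""
    return comb(n_copies, n_a)


print(effective_occupancy(8))  # 8 of 32 copies -> 0.25
print(n_arrangements(8))       # placements of those 8 "A" copies
print(2 ** N_COPIES)           # all A/B assignments, any occupancy
```

Even at a fixed effective occupancy of 0.25 there are about ten million placements to choose from, which is why starting with the same conformer everywhere is a sensible first try.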

6) give your new re-indexed mtz file and expanded-to-P1-supercell pdb file to your favorite refinement program.

Now for the amazing part: this actually works. At least refmac5, phenix.refine and shelxl (in my hands) have absolutely no problem refining this highly "redundant" model against data that are only 12.5% complete. They don't complain at all! Your R factors and geometry scores will be stable, but hopefully at least a little better than if you used a conventional, single-ASU ensemble. You might think it would be a good idea to fill all the "missing" HKLs with zeroes. Do not do this. Phenix.refine hates it and refmac5 only barely tolerates it. You might also think that with so many "free parameters" that your Rwork would drop to zero right away (leaving Rfree far behind) but in practice that does not happen.
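The factor of 8 in step 2 and the 12.5% completeness both follow from a basic Fourier property, which the one-dimensional toy below illustrates. It is a sketch under stated assumptions: the "density" values are made up, the plain-Python `dft` helper stands in for a real structure-factor calculation, and only one axis is doubled (the 3-D numbers follow by cubing).

```python
import cmath


def dft(x):
    """Plain discrete Fourier transform (slow, but dependency-free)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
            for k in range(n)]


unit_cell = [0.0, 3.0, 1.0, 0.5, 2.0, 0.0]  # made-up 1-D "density"
supercell = unit_cell * 2                    # two identical cells along one axis

F_unit = dft(unit_cell)
F_super = dft(supercell)

for h in range(len(unit_cell)):
    # Index h of the small cell becomes 2h in the supercell, at twice the amplitude.
    assert abs(F_super[2 * h] - 2 * F_unit[h]) < 1e-9

# Odd supercell indices carry no signal as long as all copies are identical.
assert all(abs(F_super[k]) < 1e-9 for k in range(1, len(supercell), 2))

copies_3d = 2 ** 3  # doubling along a, b and c
print("amplitude scale:", copies_3d)
print("fraction of supercell indices with data:", 1 / copies_3d)
```

One reading of the advice about zeroes, offered as an inference rather than anything from the programs' documentation: an observed amplitude of zero at the odd indices would assert that every copy is identical, whereas leaving those reflections out simply leaves the differences between copies unconstrained by the data.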

Very important: DO NOT LOOK AT THE MAPS that come out of this kind of refinement. Not directly. They will look all weird and distorted. You need to average over all the ASUs to recover interpretable 2Fo-Fc and Fo-Fc maps. The strictly correct way to do this is to cut out each ASU, re-orient and align these maps, and then average them all together. Fortunately, there is another way to do this that is much easier and faster:

7) Re-index the refinement output mtz file with "reindex h/2,k/2,l/2", and in the same run of "reindex" change your space group back to what it was in your normal refinement. Do NOT run this mtz file through cad. Cad will throw out all but one ASU. In fact, don't open this re-indexed mtz file with any program other than coot or "fft". The reason is this mtz file still has P1 data. It is "overcomplete". Just like when cad performed the "outlim space 1" there is more than one ASU in the mtz. Perhaps it is a quirk in the fft algorithm, but this all-ASU averaging happens automatically if the input reflection data is "overcomplete".
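The translational half of this averaging can be checked with a one-dimensional toy: keeping only the even-index coefficients of a two-cell supercell and relabeling 2h as h gives back the average of the two cells. The sketch below assumes made-up density values and plain-Python `dft`/`idft` helpers, and the division by 2 is a normalization chosen here. It does not reproduce the averaging over symmetry mates that fft appears to do with "overcomplete" data.

```python
import cmath


def dft(x):
    """Plain discrete Fourier transform."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
            for k in range(n)]


def idft(coeffs):
    """Inverse transform, returning the real part (the input here is real data)."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * j / n)
                for k in range(n)).real / n
            for j in range(n)]


cell_1 = [0.0, 3.0, 1.0, 0.5]  # made-up density in the first cell
cell_2 = [0.2, 2.0, 1.5, 0.5]  # a slightly different copy in the second cell
supercell = cell_1 + cell_2

F_super = dft(supercell)

# Analogue of "reindex h/2": keep even indices only and relabel 2h -> h.
# Dividing by the number of cells turns the sum into an average.
F_small = [F_super[2 * h] / 2 for h in range(len(cell_1))]

averaged = idft(F_small)
expected = [(a + b) / 2 for a, b in zip(cell_1, cell_2)]
assert all(abs(a - e) < 1e-9 for a, e in zip(averaged, expected))
print(averaged)
```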

8) finally, you might want to write a script for reversing the procedure for populating your supercell so that you can align all the members into a single ASU and look at them.
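As a starting point for such a script, here is a minimal sketch of the bookkeeping for steps 5 and 8 in fractional coordinates. It assumes the P212121 example with a 2x2x2 supercell, uses the standard P212121 general positions, and assigns chain IDs in a simple order. The helper names are hypothetical, no effort is made to pack copies into the box, and conversion to and from orthogonal coordinates in a PDB file is left out.

```python
from itertools import product
import string

# P212121 general positions as (sign per axis, translation per axis).
P212121_OPS = [
    ((1, 1, 1), (0.0, 0.0, 0.0)),    # x, y, z
    ((-1, -1, 1), (0.5, 0.0, 0.5)),  # -x+1/2, -y, z+1/2
    ((-1, 1, -1), (0.0, 0.5, 0.5)),  # -x, y+1/2, -z+1/2
    ((1, -1, -1), (0.5, 0.5, 0.0)),  # x+1/2, -y+1/2, -z
]
SHIFTS = list(product((0, 1), repeat=3))  # the 8 unit cells of a 2x2x2 supercell


def to_supercell(xyz, op, shift):
    """Original-cell fractional coords -> supercell fractional coords."""
    signs, trans = op
    return tuple((s * x + t + d) / 2 for x, s, t, d in zip(xyz, signs, trans, shift))


def to_asu(xyz_super, op, shift):
    """Reverse of to_supercell; each sign is +1 or -1, so it is its own inverse."""
    signs, trans = op
    return tuple(s * (2 * x - t - d) for x, s, t, d in zip(xyz_super, signs, trans, shift))


# One chain ID per copy: upper case first, then lower case (52 available, 32 needed).
chain_ids = string.ascii_uppercase + string.ascii_lowercase
copies = dict(zip(chain_ids, product(P212121_OPS, SHIFTS)))

atom = (0.12, 0.34, 0.56)  # made-up fractional position in the original cell
for chain, (op, shift) in copies.items():
    back = to_asu(to_supercell(atom, op, shift), op, shift)
    assert all(abs(a - b) < 1e-9 for a, b in zip(atom, back))

print(len(copies), "copies, chains", "".join(copies))
```

Keeping the (operator, shift) pair for each chain ID is the design choice that makes step 8 easy: after refinement each chain has moved a little, and applying `to_asu` with its own pair overlays all 32 members on the original ASU.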

But wait!? Isn't this "over-fitting"? No, it is not. Over-fitting in general is when you drive your residual (aka Rwork) essentially to zero. That doesn't happen here because the geometry term holds you back. You might think that with enough copies in the ensemble it should be possible to fit any density with geometrically reasonable molecules, but that is not what happens in practice. This is actually quite remarkable! So many "free parameters" and yet the tug-o-war between R factors and geometry remains. Now, of course, the real molecules in the real crystal are simultaneously obeying the laws of chemistry and generating the diffraction patterns that we see, but I have never found a way to make a macromolecular model that does both. Neither has anybody else, of course, but wouldn't it be cool if someone figured out how?


-James Holton
MAD Scientist

On 3/18/2023 6:35 PM, Oganesyan, Vaheh wrote:

Hi Ben,
All copies created by multiplying cell dimensions will act exactly the same as the original one, mathematically exactly. Nick’s approach is better. Something similar to what Nick said was published around 2002-2003. I was reviewing it. I did not understand then what the author was trying to achieve and kept thinking about it for a few months. The author split the model into 20 copies, each with 5% occupancy. After refinement he got an ensemble that looked like NMR structures. I’m not sure, however, that adding that uncertainty will help answer any question.
Vaheh
*From:* CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> *On Behalf Of* benjamin bax
*Sent:* Saturday, March 18, 2023 5:07 PM
*To:* CCP4BB@JISCMAIL.AC.UK
*Subject:* Re: [ccp4bb] To Trim or Not to To Trim

Hi,
Probably a stupid question.
Could you multiply a, b and c cell dimensions by 2 or 3 (to give 8 or 27 structures) and restrain well-defined parts of the structure to be ‘identical’? To give you a more NMR-like, chemically sensible ensemble of structures?
Ben


> On 18 Mar 2023, at 12:04, Helen Ginn <ccp...@hginn.co.uk> wrote:
>
> Models for crystallography have two purposes: refinement and interpretation. Here these two purposes are in conflict. Neither case is handled well by either trim or not trim scenario, but trimming results in a deficit for refinement and not-trimming results in a deficit for interpretation.
>
> Our computational tools are not “fixed” in the same way that the standard amino acids are “fixed” or your government’s bureaucracy pathways are “fixed”. They are open for debate and for adjustments. This is a fine example where it may be more productive to discuss the options for making changes to the model itself or its representation, to better account for awkward situations such as these. Otherwise we are left figuring out the best imperfect way to use an imperfect tool (as all tools are, to varying degrees!), which isn’t satisfying for enough people, enough of the time.
>
> I now appreciate the hypocrisy in the argument “do not trim, but also don’t model disordered regions”, even though I’d be keen to avoid trimming. This discussion has therefore softened my own viewpoint.
>
> My refinement models (as implemented in Vagabond) do away with the concept of B factors precisely for the anguish it causes here, and refine a distribution of protein conformations which is sampled to generate an ensemble. By describing the conformations through the torsion angles that comprise the protein, modelling flexibility of a disordered lysine is comparatively trivial, and indeed modelling all possible conformations of a disordered loop becomes feasible. Lysines end up looking like a frayed end of a rope. Each conformation can produce its own solvent mask, which can be summed together to produce a blurring of density that matches what you would expect to see in the crystal.
>
> In my experience this doesn’t drop the R factors as much as you’d assume, because blurred out protein density does look very much like solvent, but it vastly improves the interpretability of the model. This also better models the boundary between the atoms you would trim and those you’d leave untrimmed, by avoiding such a binary distinction. No fear of trimming and pushing those errors unseen into the rest of the structure. No fear of leaving atoms in with an inadequate B factor model that cannot capture the nature of the disorder.
>
> Vagabond is undergoing a heavy rewrite though, and is not yet ready for human consumption. Its first iteration worked on single-dataset-single-model refinement, which handled disordered side chains well enough, with no need to decide to exclude atoms. The heart of the issue lies in main chain flexibility, and this must be handled correctly, for reasons of interpretability and elucidating the biological impact. This model isn’t perfect either, and necessitates its own compromises - but will provide another tool in the structural biology arsenal.
>
> —-
>
> Dr Helen Ginn
> Group leader, DESY
> Hamburg Advanced Research Centre for Bioorganic Chemistry (HARBOR)
> Luruper Chaussee 149
> 22607 Hamburg
>
> ########################################################################
>
> To unsubscribe from the CCP4BB list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
>
> This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list hosted by www.jiscmail.ac.uk, terms & conditions are available at https://www.jiscmail.ac.uk/policyandsecurity/
