Actually, they are not quite the same.
Members of an ensemble spread out over a supercell are equivalent to a
conventional multi-conformer model, but only when it comes to the
density derived from the coordinate atoms alone. They are not the same
when it comes to bulk solvent and also not the same when it comes to
clashes at crystal packing interfaces.
It is actually a tricky question: if you have a 2-conformer salt bridge
between symmetry mates, which one should be "A" or "B"? In phenix, "A"
will clash with "A" in the symmetry mate. Refmac, as I understand it,
checks if the occupancies of any two atoms of any conformer letter sum
to >= 1.0. If so, they can clash. Helen tells me Vagabond's treatment
of such contacts is under development. A consensus has not been reached.
However, in a supercell refinement you don't have to ask this question.
Every atom has an occupancy of 1.00 and there are no conformer letters
(other than the default " "). Molecules then interact with their
neighbors by the same rules as anywhere else in space. And yes, each
one would get its own, personal, solvent envelope. Neat, huh?
If you want to try supercell refinement, do this:
1) expand your mtz data to P1. Use the CCP4 "cad" program with the
keyword "outlim space 1". Do not change the space group.
2) in a second run of "cad", change the space group to P1 and multiply
all your Fs by the number of full unit cells you want. Lets say that is
8. I.E.:
"scale file 1 8.0 0"
3) re-index this P1 mtz file with the "reindex" program, using keyword
"reindex 2h,2k,2l". You will get a new mtz that has 2x the cell length
along each edge, but is only 12.5% complete. Don't worry, that's going
to be OK.
4) Now go to the *.pdb file: edit the "CRYST1" line to have the same
cell as the new mtz, and change the space group to P1.
5) make copies of your coordinates and shift them to fill this new P1
supercell. Note that this is not only populating all symmetry mates in
one unit cell (total of 4 in P212121), but also making translated copies
in the seven other unit cells. This would be 4*8 or 32 copies in all.
How you name them depends on your personal preferences, but a simple
approach is to give each symmetry mate its own chain ID (upper and lower
case are allowed). The "symgen P212121" keyword is convenient, but does
not make any effort to pack things into a box. You need to do that
yourself by applying various "shift 1 0 -1" like operations in pdbset.
This might be more intuitive to do in coot. Also, while you are doing
this you need to decide which part of your multi-unit-cell space is
going to correspond to your "conformer A", and which other part is going
to be "B", and then "C", etc. You won't be doing occupancy refinement
here. Rather, by adjusting how many "A"s and "B"s you use to make up the
ensemble you effectively now have 32 levels of occupancy (instead of the
usual 100).
Important point: If you find this whole process daunting and you keep
running out of letters in the alphabet, now you understand why the
authors of refinement programs have not made this automatic. There are a
VERY large number of ways to pack these slightly-different copies of
your ASU into a supercell, and only one is "best". Yes, as Vaheh said,
all the combinations will be equivalent when it comes to the density
derived from coordinates, but the solvent density will be allowed to
conform to each local environment, and your clash scores will vary.
Which combination is "correct"? I think that would be the one with the
least number of clashes. It is like a big combination lock. But, for a
first try, you might want to just use the "symgen" and "shift 1 0 0"
commands, re-label all the chain IDs, then put the same confomer
everywhere and let non-bond repulsions do their job in the subsequent
refinement.
6) give your new re-indexed mtz file and expanded-to-P1-supercell pdb
file to your favorite refinement program.
Now for the amazing part: this actually works. At least refmac5,
phenix.refine and shelxl (in my hands) have absolutely no problem
refining this highly "redundant" model against data that are only 12.5%
complete. They don't complain at all! Your R factors and geometry
scores will be stable, but hopefully at least a little better than if
you used a conventional, single-ASU ensemble. You might think it would
be a good idea to fill all the "missing" HKLs with zeroes. Do not do
this. Phenix.refine hates it and refmac5 only barely tolerates it. You
might also think that with so many "free parameters" that your Rwork
would drop to zero right away (leaving Rfree far behind) but in practice
that does not happen.
Very important: DO NOT LOOK AT THE MAPS that come out of this kind of
refinement. Not directly. They will look all weird and distorted. You
need to average over all the ASUs to recover interpretable 2Fo-Fc and
Fo-Fc maps. The strictly correct way to do this is to cut out each ASU,
re-orient and align these maps, and then average them all together.
Fortunately, there is another way to do this that is much easier and faster:
7) Re-index the refinement output mtz file with "reindex h/2,k/2,l/2",
and in the same run of "reindex" change your space group back to what it
was in your normal refinement. Do NOT run this mtz file through cad.
Cad will throw out all but one ASU. In fact, don't open this re-indexed
mtz file with any program other than coot or "fft". The reason is this
mtz file still has P1 data. It is "overcomplete". Just like when cad
performed the "outlim space 1" there is more than one ASU in the mtz.
Perhaps it is a quirk in the fft algorithm, but this all-ASU averaging
happens automatically if the input reflection data is "overcomplete".
8) finally, you might want to write a script for reversing the procedure
for populating your supercell so that you can align all the members into
a single ASU and look at them.
But wait!? Isn't this "over-fitting"? No, it is not. Over-fitting in
general is when you drive your residual (aka Rwork) essentially to
zero. That doesn't happen here because the geometry term holds you
back. You might think that with enough copies in the ensemble it should
be possible to fit any density with geometrically reasonable molecules,
but that is not what happens in practice. This is actually quite
remarkable! So many "free parameters" and yet the tug-o-war between R
factors and geometry remains. Now, of course, the real molecules in the
real crystal are simultaneously obeying the laws of chemistry and
generating the diffraction patterns that we see, but I have never found
a way to make a macromolecular model that does both. Neither has
anybody else, of course, but wouldn't it be cool if someone figured out how?
-James Holton
MAD Scientist
On 3/18/2023 6:35 PM, Oganesyan, Vaheh wrote:
Hi Ben,
All copies created by multiplying cell dimensions will act exactly
same as the original one, mathematically exactly. Nick’s approach is
better. Something similar to what Nick said was published around
2002-2003. I was reviewing it. I did not understand then what the
author was trying to achieve and kept thinking about it for few
months. The author split model into 20 each with 5% occupancy. After
refinement he got an ensemble that looked like NMR structures. I’m not
sure, however, that adding that uncertainty will help answering any
question.
Vaheh
*From:* CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> *On Behalf Of
*benjamin bax
*Sent:* Saturday, March 18, 2023 5:07 PM
*To:* CCP4BB@JISCMAIL.AC.UK
*Subject:* Re: [ccp4bb] To Trim or Not to To Trim
Hi,
Probably a stupid question.
Could you multiply a, b and c cell dimensions by 2 or 3 (to give 8 or
27 structures) and restrain well defined parts of structure to be
‘identical’ ? To give you a more NMR like chemically sensible ensemble
of structures?
Ben
> On 18 Mar 2023, at 12:04, Helen Ginn <ccp...@hginn.co.uk> wrote:
>
> Models for crystallography have two purposes: refinement and
interpretation. Here these two purposes are in conflict. Neither case
is handled well by either trim or not trim scenario, but trimming
results in a deficit for refinement and not-trimming results in a
deficit for interpretation.
>
> Our computational tools are not “fixed” in the same way that the
standard amino acids are “fixed” or your government’s bureaucracy
pathways are “fixed”. They are open for debate and for adjustments.
This is a fine example where it may be more productive to discuss the
options for making changes to the model itself or its representation,
to better account for awkward situations such as these. Otherwise we
are left figuring out the best imperfect way to use an imperfect tool
(as all tools are, to varying degrees!), which isn’t satisfying for
enough people, enough of the time.
>
> I now appreciate the hypocrisy in the argument “do not trim, but
also don’t model disordered regions”, even though I’d be keen to avoid
trimming. This discussion has therefore softened my own viewpoint.
>
> My refinement models (as implemented in Vagabond) do away with the
concept of B factors precisely for the anguish it causes here, and
refines a distribution of protein conformations which is sampled to
generate an ensemble. By describing the conformations through the
torsion angles that comprise the protein, modelling flexibility of a
disordered lysine is comparatively trivial, and indeed modelling all
possible conformations of a disordered loop becomes feasible. Lysines
end up looking like a frayed end of a rope. Each conformation can
produce its own solvent mask, which can be summed together to produce
a blurring of density that matches what you would expect to see in the
crystal.
>
> In my experience this doesn’t drop the R factors as much as you’d
assume, because blurred out protein density does look very much like
solvent, but it vastly improves the interpretability of the model.
This also better models the boundary between the atoms you would trim
and those you’d leave untrimmed, by avoiding such a binary
distinction. No fear of trimming and pushing those errors unseen into
the rest of the structure. No fear of leaving atoms in with an
inadequate B factor model that cannot capture the nature of the disorder.
>
> Vagabond is undergoing a heavy rewrite though, and is not yet ready
for human consumption. Its first iteration worked on
single-dataset-single-model refinement, which handled disordered side
chains well enough, with no need to decide to exclude atoms. The heart
of the issue lies in main chain flexibility, and this must be handled
correctly, for reasons of interpretability and elucidating the
biological impact. This model isn’t perfect either, and necessitates
its own compromises - but will provide another tool in the structural
biology arsenal.
>
> —-
>
> Dr Helen Ginn
> Group leader, DESY
> Hamburg Advanced Research Centre for Bioorganic Chemistry (HARBOR)
> Luruper Chaussee 149
> 22607 Hamburg
>
> ########################################################################
>
> To unsubscribe from the CCP4BB list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
<https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1>
>
> This message was issued to members of www.jiscmail.ac.uk/CCP4BB
<http://www.jiscmail.ac.uk/CCP4BB>, a mailing list hosted by
www.jiscmail.ac.uk <http://www.jiscmail.ac.uk>, terms & conditions are
available at https://www.jiscmail.ac.uk/policyandsecurity/
########################################################################
To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
<https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1>
This message was issued to members of www.jiscmail.ac.uk/CCP4BB
<http://www.jiscmail.ac.uk/CCP4BB>, a mailing list hosted by
www.jiscmail.ac.uk <http://www.jiscmail.ac.uk>, terms & conditions are
available at https://www.jiscmail.ac.uk/policyandsecurity/
------------------------------------------------------------------------
To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
<https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1>
########################################################################
To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list
hosted by www.jiscmail.ac.uk, terms & conditions are available at
https://www.jiscmail.ac.uk/policyandsecurity/