On Mar 19, 2023, at 2:37 PM, James Holton <jmhol...@lbl.gov> wrote:
They say one test is worth a thousand expert opinions, so I tried
my hand at the former.
The question is: what is the right way to treat disordered side chains?
a) omit atoms you cannot see
b) build them, and set occupancy to zero
c) build them, and "let the B factors take care of it"
d) none of the above
The answer, of course, is d).
Oh, c'mon. Yes, I know one of a, b, or c is what you've been doing
your whole life. I do it too. But, let's face it: none of these
solutions is perfect. So, the real question is not which one is
"right", but which is the least wrong?
We all know what is really going on: the side chain is flapping
around. No doubt it spends most of its time in energetically
reasonable but nevertheless numerous conformations. There are 41
"Favorable" rotamers for Lys alone, and it doesn't take that many to
spread the density thin enough to fall below the classical 1-sigma
contour level. The atoms are still there, they are still
contributing to the data, and they haven't gone far. So why don't we
"just" model that? Already, I can hear the cries of "over-fitting!"
and "observations/parameters!", "model bias!", and "think of the
children!" Believe it or not, none of these are the major issue
here. Allow me to demonstrate:
Consider a simple case where we have a Lys side chain in ten
conformers. I chose from popular rotamers, but evenly spread. That
is, all 10 conformers have an occupancy of 0.10, and there is a
3-3-4 split of chi1 values between minus, plus and trans. This will
give the maximum contrast of density between CB and CG. Let us
further require that there is no strain in this ground-truth. No
stretched bonds, no tortured angles, no clashes, etc. Real
molecules don't occupy such high-energy states unless they
absolutely have to. Let us further assume that the bulk solvent
works the way phenix models it, with a probe radius of 1.1 A for
both ions and aliphatics and a shrink radius of 0.9 A. But, instead of
running one phenix.fmodel job, I ran ten: one for each conformer (A
thru J). To add some excitement, I moved the main chain ~0.2 A in a
random direction for each conformer. I then took these ten
calculated electron density maps (bulk solvent and all) and added
them together to form the ground truth for the following trials.
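As an illustration of the map-summing step just described, here is a minimal sketch in Python with NumPy. It is not the script used for this test. It assumes each per-conformer map was calculated at full occupancy on a common grid, so that each map gets a weight equal to its occupancy (0.10 here). The function name and the random toy maps are hypothetical stand-ins for real phenix.fmodel output.

```python
import numpy as np

def combine_conformer_maps(maps, occupancies):
    """Occupancy-weighted sum of per-conformer density maps.

    maps        : array-like, shape (n_conformers, nx, ny, nz), common grid
    occupancies : one weight per conformer, expected to sum to 1
    """
    maps = np.asarray(maps, dtype=float)
    occ = np.asarray(occupancies, dtype=float)
    if maps.shape[0] != occ.shape[0]:
        raise ValueError("need exactly one occupancy per map")
    if not np.isclose(occ.sum(), 1.0):
        raise ValueError("occupancies should sum to 1")
    # Contract the conformer axis: result has the shape of a single map.
    return np.tensordot(occ, maps, axes=1)

# Toy stand-in for ten calculated maps (A thru J) on a tiny 4x4x4 grid.
rng = np.random.default_rng(0)
toy_maps = rng.random((10, 4, 4, 4))
truth = combine_conformer_maps(toy_maps, [0.10] * 10)
```

With equal occupancies this is just the plain average of the ten maps. Unequal occupancies only change the weights.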
Before refinement, I added noise consistent with an I/sigma of 50
and cut the resolution at 2.0 A. Wilson B is 50:
CCtrue  Rwork%  Rfree%  fo-fc(sigma)  description
0.8943   9.05   10.60   5.9           stump at CB
0.9540   9.29   11.73   6.0           single conformer, zero occupancy
0.9471  10.35   15.04   5.1           single conformer, full occupancy, refmac5
0.9523   9.78   15.61   4.9           single conformer, full occupancy, phenix.refine
So, it would appear that the zero-occupancy choice "wins", but by
the narrowest of margins. Here CCtrue is the Pearson correlation
coefficient between the ground-truth right-answer electron density
and the 2fofc map resulting from the refinement. Rwork and Rfree
are the usual suspects, and fo-fc indicates the tallest peak in the
difference map. Refinement was with refmac unless otherwise
indicated. I think we often forget that both phenix and refmac
restrain B factor values, not just through bonds but through space,
and they use rather different algorithms. Refmac tries to make the
histogram of B factors "look right", whereas phenix allows steeper
gradients. I also ran all 10 correct rotamers separately and picked
the one with the best CCtrue to show above. If you instead sort on
Rfree (which you really shouldn't do), you get different bests, but
they are not much better (as low as 10.5%). So, the winner here
depends on how you score. CCtrue is the best score, but also
unfortunately unavailable for real data.
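For reference, a score like CCtrue can be sketched in a few lines of Python. This is a minimal illustration of a Pearson correlation coefficient between two maps sampled on the same grid, not the script used for the numbers above. The function name is hypothetical, and reading the maps from disk is left out.

```python
import numpy as np

def map_correlation(map_a, map_b):
    """Pearson correlation coefficient between two maps on the same grid."""
    a = np.asarray(map_a, dtype=float).ravel()
    b = np.asarray(map_b, dtype=float).ravel()
    if a.shape != b.shape:
        raise ValueError("maps must be sampled on the same grid")
    # Remove the mean so a constant offset does not affect the score.
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    if denom == 0:
        raise ValueError("a flat map has no defined correlation")
    return float((a * b).sum() / denom)

# Toy check: a rescaled and offset copy of a map correlates perfectly.
rng = np.random.default_rng(1)
truth_map = rng.random((8, 8, 8))
print(map_correlation(truth_map, 2.0 * truth_map + 3.0))
```

Because the mean is subtracted and the result is normalized, the score is insensitive to the overall scale and offset of either map. That matters when comparing a 2fofc map to a calculated one.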
It is perhaps interesting here that better CCtrue goes along with
worse Rfree. This is not what I usually see in experiments like
this. Rather, what I think is going on here is the system is
frustrated. We are trying to fit various square pegs into a round
hole, and none of them fit all that well.
In all cases here the largest difference peak was indicating another
place to put the Lys, so why not build into that screaming, 6-sigma
difference peak? Here is what happens when you do that:
CCtrue  Rwork%  Rfree%  fo-fc(sigma)  description
0.8943   9.05   10.60   5.9           stump at CB
0.9580   9.95   11.60   6.4           stump at CG
0.9585  10.20   12.29   6.2           stump at CG, all 10 confs
0.9543  10.61   12.24   5.3           stump at CD, all 10 confs
0.9383  10.69   14.64   4.1           stump at CE, all 10 confs
0.9476   9.66   13.48   4.6           all atoms, all 10 confs
0.9214   7.09   11.8    5.6           three conformers (worst of 120 combos)
0.9718   6.53    8.55   4.3           three conformers (best of 120 combos)
0.9710   7.17    9.44   6.1           two conformers (best of 45 combos)
0.9471  10.35   15.04   5.1           single conformer (best of 10 choices)
If I add one CG, the other two chi1 positions light up. So, I tried
building in all 10 true CG positions, and let the refinement decide
what to do with them. The clear indication was that a CD should be
added. After adding all the CDs, the difference peaks were weaker,
but still indicating more atoms were needed. Rwork and Rfree,
however, tell the opposite story. They get worse the more atoms you
add. CCtrue, on the other hand, was best when cutting everything
after CG. Why is that? Well, every time you add another atom you
fill in the difference density, but then that atom pushes back the
bulk solvent model that was filling in the density for the next
atom. The atom-to-solvent distance is roughly twice that of a
covalent bond. So again, square pegs and round holes.
Three conformers coming out as the winner may be because it is a
selective process with a noisy score. In the ground truth there are
10 conformers at equal occupancy, so no one triplet is really any
better than any other. However, one has a density shape that fits
better than other combos. My search over all possible quartets is
still running.
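The combination counts quoted above (10 singles, 45 pairs, 120 triplets) follow from choosing k conformers out of 10, and the quartet search covers 210 cases. A minimal sketch of such an exhaustive search in Python is below. The `score` callable is a hypothetical placeholder for "refine this subset and return a score such as CCtrue". The toy score used here is made up purely to show the mechanics.

```python
from itertools import combinations
from math import comb

CONFORMERS = "ABCDEFGHIJ"  # the ten conformers, A thru J

def best_subset(labels, size, score):
    """Return the subset of `size` labels with the highest score."""
    return max(combinations(labels, size), key=score)

# Number of refinements needed for each subset size.
for k in (1, 2, 3, 4):
    print(k, comb(len(CONFORMERS), k))  # 10, 45, 120, 210

# Toy score (an assumption for illustration only): how many members of
# an arbitrary "favoured" set the subset contains.
favoured = {"A", "C", "E"}
winner = best_subset(CONFORMERS, 3, lambda s: len(favoured & set(s)))
print(winner)
```

Picking the maximum of 120 noisy scores will always crown some triplet, even when no triplet is truly better than the others. That is the selection effect mentioned above.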
But what if we got the solvent "right"? Well, here is what that
looks like:
CCtrue  Rwork%  Rfree%  fo-fc(sigma)  description
0.9476   9.66   13.48   4.6           all atoms, all confs, refmac defaults
0.9696   6.15    8.88   3.7           all atoms, all confs, phenix.refine
0.9825   0.80    0.89   3.9           all atoms, all confs, true solvent
0.9824   0.92    1.26   7.3           true model, minus one H atom from ordered HIS side chain
You can see that the default solvent of phenix.refine fares better
than refmac here, but since I generated the ground-truth solvent with
phenix, it may have an unfair advantage. Nevertheless, providing the
"true solvent" here gives quite a striking drop in R factors. This is
not surprising since this was the last systematic error in this
ground truth. In all cases, I provided the true atomic positions at
the start of refinement, so there was no confusion about
strain-inducing local minima, such as which rotamer goes with which
main chain shift. And yes, you can provide arbitrary bulk solvent
maps to refmac5 using the "Fpart" feature. I've had good luck with
real data using bulk density derived from MD simulations.
What is more, once the R factors are this low I can remove just one
hydrogen atom and it comes back as a 7.3-sigma difference peak. This
corresponds to the protonation state of that His. This kind of
sensitivity is really attractive if you are looking for low-lying
features, such as partially-occupied ligands. Some may pooh-pooh R
factors as "cosmetic" features of structures, but they are, in fact,
nothing more or less than the % error between your model and your
data. This % error translates directly into the noise level of your
map. At 20% error there is no hope whatsoever of seeing 1-electron
changes, because hydrogen has only 17% of the electrons of a carbon.
But at 3-5% error, which is a typical experimental error in
crystallographic data, anything bigger than one electron is clear.
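To make the "% error" reading of R factors concrete, here is a minimal sketch in Python. It uses the conventional definition R = sum(|Fobs - k*Fcalc|) / sum(Fobs) with a single overall scale factor k. That scaling is a simplifying assumption: real refinement programs use more elaborate scaling. The function name and the toy amplitudes are hypothetical.

```python
import numpy as np

def r_factor(f_obs, f_calc):
    """Fractional disagreement between observed and calculated amplitudes."""
    f_obs = np.abs(np.asarray(f_obs))
    f_calc = np.abs(np.asarray(f_calc))
    # Single overall scale factor (assumption; real programs do more).
    scale = f_obs.sum() / f_calc.sum()
    return float(np.abs(f_obs - scale * f_calc).sum() / f_obs.sum())

# Hydrogen has 1 electron, carbon has 6.
HYDROGEN_VS_CARBON = 1 / 6

r = r_factor([100.0, 100.0], [90.0, 110.0])
print(f"R = {100 * r:.1f}%")
print(f"H / C electrons = {100 * HYDROGEN_VS_CARBON:.0f}%")
```

The toy amplitudes give R = 10%, which sits between the two regimes discussed above: below the 17% that one hydrogen represents relative to a carbon, but well above the 3-5% of typical experimental error.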
-James Holton
MAD Scientist
On 3/18/2023 2:10 PM, Nicholas Pearce wrote:
Not stupid, but essentially the same as modelling alt confs, though
it would probably give more overfitting. Alt confs can easily be
converted to an ensemble (if done properly…).
Thanks,
Nick
———
Nicholas Pearce
Assistant Professor in Bioinformatics & DDLS Fellow
Linköping University
Sweden
------------------------------------------------------------------------
*From:* CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> on behalf of
benjamin bax <ben.d.v....@gmail.com>
*Sent:* Saturday, March 18, 2023 10:07:26 PM
*To:* CCP4BB@JISCMAIL.AC.UK <CCP4BB@JISCMAIL.AC.UK>
*Subject:* Re: [ccp4bb] To Trim or Not to To Trim
Hi,
Probably a stupid question.
Could you multiply a, b and c cell dimensions by 2 or 3 (to give 8
or 27 structures) and restrain well defined parts of structure to
be ‘identical’ ? To give you a more NMR like chemically sensible
ensemble of structures?
Ben
> On 18 Mar 2023, at 12:04, Helen Ginn <ccp...@hginn.co.uk> wrote:
>
> Models for crystallography have two purposes: refinement and
interpretation. Here these two purposes are in conflict. Neither
purpose is handled well by either the trim or the not-trim scenario:
trimming results in a deficit for refinement and not trimming
results in a deficit for interpretation.
>
> Our computational tools are not “fixed” in the same way that the
standard amino acids are “fixed” or your government’s bureaucracy
pathways are “fixed”. They are open for debate and for adjustments.
This is a fine example where it may be more productive to discuss
the options for making changes to the model itself or its
representation, to better account for awkward situations such as
these. Otherwise we are left figuring out the best imperfect way to
use an imperfect tool (as all tools are, to varying degrees!),
which isn’t satisfying for enough people, enough of the time.
>
> I now appreciate the hypocrisy in the argument “do not trim, but
also don’t model disordered regions”, even though I’d be keen to
avoid trimming. This discussion has therefore softened my own
viewpoint.
>
> My refinement models (as implemented in Vagabond) do away with
the concept of B factors precisely for the anguish it causes here,
and refine a distribution of protein conformations which is
sampled to generate an ensemble. By describing the conformations
through the torsion angles that comprise the protein, modelling
flexibility of a disordered lysine is comparatively trivial, and
indeed modelling all possible conformations of a disordered loop
becomes feasible. Lysines end up looking like a frayed end of a
rope. Each conformation can produce its own solvent mask, which can
be summed together to produce a blurring of density that matches
what you would expect to see in the crystal.
>
> In my experience this doesn’t drop the R factors as much as you’d
assume, because blurred out protein density does look very much
like solvent, but it vastly improves the interpretability of the
model. This also better models the boundary between the atoms you
would trim and those you’d leave untrimmed, by avoiding such a
binary distinction. No fear of trimming and pushing those errors
unseen into the rest of the structure. No fear of leaving atoms in
with an inadequate B factor model that cannot capture the nature of
the disorder.
>
> Vagabond is undergoing a heavy rewrite though, and is not yet
ready for human consumption. Its first iteration worked on
single-dataset-single-model refinement, which handled disordered
side chains well enough, with no need to decide to exclude atoms.
The heart of the issue lies in main chain flexibility, and this
must be handled correctly, for reasons of interpretability and
elucidating the biological impact. This model isn’t perfect either,
and necessitates its own compromises - but will provide another
tool in the structural biology arsenal.
>
> —-
>
> Dr Helen Ginn
> Group leader, DESY
> Hamburg Advanced Research Centre for Bioorganic Chemistry (HARBOR)
> Luruper Chaussee 149
> 22607 Hamburg
------------------------------------------------------------------------
To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1