Update:
I've gotten some feedback asking for clarity on what I mean by
"tangled". I paste here a visual aid:
The protein chains in an ensemble model are like these ropes. If these
ropes are the same length as the distance from floor to ceiling, then
straight up-and-down is the global minimum in energy (left). The anchor
points are analogous to the rest of the protein structure, which is the
same in both diagrams. Imagine for a moment, however, after anchoring
the dangling rope ends to the floor you look up and see the ropes are
actually crossed (right). You got the end points right, but no amount of
pulling on the ropes (energy minimization) is going to get you from the
tangled structure to the global minimum. The tangled ropes are also
strained, because they are being forced to be a little longer than they
want to be. This strain in protein models manifests as geometry outliers,
and the automatic weighting in your refinement program responds to bad
geometry by relaxing the X-ray weight, which alleviates some of the
strain but increases your Rfree.
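To put rough numbers on the rope picture, here is a minimal sketch. It assumes each rope behaves like a harmonic spring whose rest length equals the floor-to-ceiling height. The function name, the spring constant `k`, and the dimensions are all made up for illustration; none of this comes from a refinement program.

```python
import math

def rope_strain(height, separation, k=1.0):
    """Return (stretched_length, harmonic_energy) for one rope.

    height      floor-to-ceiling distance, also taken as the rope's rest length
    separation  horizontal offset between the rope's ceiling and floor anchors
    k           hypothetical spring constant in arbitrary units
    """
    # A straight rope (separation 0) spans exactly `height`; a crossed rope
    # has to span the diagonal, which is always longer.
    stretched = math.hypot(height, separation)
    energy = 0.5 * k * (stretched - height) ** 2
    return stretched, energy

# Straight up-and-down: no stretch, no strain.
straight_len, straight_e = rope_strain(height=3.0, separation=0.0)

# Crossed: the floor anchor sits under the *other* rope's ceiling anchor.
crossed_len, crossed_e = rope_strain(height=3.0, separation=4.0)

print(f"straight: length {straight_len:.2f}, energy {straight_e:.2f}")
print(f"crossed:  length {crossed_len:.2f}, energy {crossed_e:.2f}")
```

In this toy the crossed rope is stuck with nonzero energy for as long as its end points stay where they are, which is the sense in which pulling on it cannot help.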
The goal of this challenge is to eliminate these tangles, and do it
efficiently. What we need is a topoisomerase! Something that can find
the source of strain and let the ropes pass through each other at the
appropriate place. I've always wanted one of those for the wires behind
my desk...
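As a toy illustration of what such a "topoisomerase" move might look like, the sketch below treats untangling as a discrete choice of which floor anchor belongs to which rope, and tries every assignment. The harmonic strain function, the anchor coordinates, and the brute-force search are all assumptions of mine for the sake of the picture. A real ensemble model has far too many possible swaps to enumerate this way.

```python
import itertools
import math

def total_strain(tops, bottoms, height, k=1.0):
    """Sum of harmonic strain over ropes pairing tops[i] with bottoms[i].

    tops, bottoms  horizontal positions of the ceiling and floor anchors
    height         floor-to-ceiling distance, also the rope rest length
    k              hypothetical spring constant in arbitrary units
    """
    return sum(
        0.5 * k * (math.hypot(height, b - t) - height) ** 2
        for t, b in zip(tops, bottoms)
    )

def untangle(tops, bottoms, height):
    """Try every assignment of floor anchors to ropes; keep the least strained."""
    best = min(
        itertools.permutations(bottoms),
        key=lambda p: total_strain(tops, p, height),
    )
    return list(best)

tops = [0.0, 4.0]        # ceiling anchors of rope A and rope B
tangled = [4.0, 0.0]     # floor anchors as found: the ropes are crossed
untangled = untangle(tops, tangled, height=3.0)

print("tangled strain:  ", total_strain(tops, tangled, 3.0))
print("untangled strain:", total_strain(tops, untangled, 3.0))
```

The point of the toy is that the fix is a swap of labels, not a small continuous shift, which is why a gradient-style minimizer starting from the crossed state never finds it.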
More details on the origins of tangling in ensemble models can be found
here:
https://bl831.als.lbl.gov/~jamesh/challenge/twoconf/#tangle
-James Holton
MAD Scientist
On 1/18/2024 4:33 PM, James Holton wrote:
Greetings Everybody,
I present to you a Challenge.
Structural biology would be far more powerful if we could get our models
out of local minima, and together, I believe we can find a way to
escape them.
tldr: I dare any one of you to build a model that scores better than
my "best.pdb" model below. That is probably impossible, so I also dare
you to approach or even match "best.pdb" by doing something more
clever than just copying it. Difficulty levels range from 0 to 11.
First one to match the best.pdb energy score and Rfree wins the
challenge, and I'd like you to be on my paper. You have nine months.
Details of the challenge, scoring system, test data, and available
starting points can be found here:
https://bl831.als.lbl.gov/~jamesh/challenge/twoconf/
Why am I doing this?
We all know that macromolecules adopt multiple conformations. That is
how they function. And yet, ensemble refinement still has a hard time
competing with conventional
single-conformer-with-a-few-split-side-chain models when it comes to
revealing correlated motions, or even just simultaneously satisfying
density data and chemical restraints. That is, ensembles still suffer
from the battle between R factors and geometry restraints. This is
because the ensemble member chains cannot pass through each other, and
get tangled. The tangling comes from the density, not the chemistry.
Refinement in refmac, shelxl, phenix, simulated annealing, qFit, and
even coot cannot untangle them.
The good news is: knowledge of chemistry, combined with R factors,
appears to be a powerful indicator of how near a model is to being
untangled. What is really exciting is that the genuine, underlying
ensemble cannot be tangled. The true ensemble _defines_ the density;
it is not being fit to it. The more untangled a model gets the closer
it comes to the true ensemble, with deviations from reasonable
chemistry becoming easier and easier to detect. In the end, when all
alternative hypotheses have been eliminated, the model must match the
truth.
Why can't we do this with real data? Because all ensemble models are
tangled. Let's get to untangling them, shall we?
To demonstrate, I have created a series of examples that are
progressively more difficult to solve, but the ground truth model and
density are the same in all cases. Build the right model, and it will
not only explain the data to within experimental error, and have the
best possible validation stats, but it will reveal the true,
underlying cooperative motion of the protein as well.
Unless, of course, you can prove me wrong?
-James Holton
MAD Scientist