Greetings Everybody,
I present to you a Challenge.
Structural biology would be far more powerful if we could get our models
out of local minima, and together, I believe we can find a way to escape
them.
tldr: I dare any one of you to build a model that scores better than my
"best.pdb" model below. That is probably impossible, so I also dare you
to approach or even match "best.pdb" by doing something more clever than
just copying it. Difficulty levels range from 0 to 11. First one to
match the best.pdb energy score and Rfree wins the challenge, and I'd
like you to be on my paper. You have nine months.
Details of the challenge, scoring system, test data, and available
starting points can be found here:
https://bl831.als.lbl.gov/~jamesh/challenge/twoconf/
Why am I doing this?
We all know that macromolecules adopt multiple conformations. That is
how they function. And yet, ensemble refinement still has a hard time
competing with conventional single-conformer-with-a-few-split-side-chain
models when it comes to revealing correlated motions, or even just
simultaneously satisfying density data and chemical restraints. That is,
ensembles still suffer from the battle between R factors and geometry
restraints. This is because the ensemble member chains cannot pass
through each other, so they get tangled. The tangling comes from the
density, not the chemistry. Refinement in refmac, shelxl, phenix,
simulated annealing, qFit, and even coot cannot untangle them.
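For anyone less familiar with the R factor side of that battle, here is a minimal sketch of the conventional definition, R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|), evaluated separately on the working and free reflections. The function names and the toy numbers are made up for illustration, Fcalc is assumed to be already on the same scale as Fobs, and this is not the scoring script used for the challenge (that one is described at the link above).

```python
def r_factor(f_obs, f_calc):
    """Conventional crystallographic R factor over one set of reflections.

    Assumes f_calc has already been scaled to f_obs (no scale factor is fit here).
    """
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def r_work_and_free(f_obs, f_calc, free_flags):
    """Split reflections by a boolean free flag and return (Rwork, Rfree)."""
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if free]
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in test], [c for _, c in test])
    return r_work, r_free


# Toy amplitudes, invented for illustration only.
f_obs = [100.0, 50.0, 80.0, 20.0]
f_calc = [90.0, 55.0, 80.0, 25.0]
free_flags = [False, False, False, True]

r_work, r_free = r_work_and_free(f_obs, f_calc, free_flags)
print(f"Rwork = {r_work:.4f}, Rfree = {r_free:.4f}")
```

A tangled ensemble can push these numbers down while the geometry gets worse, which is why R factors alone do not settle the question.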
The good news is: knowledge of chemistry, combined with R factors,
appears to be a powerful indicator of how near a model is to being
untangled. What is really exciting is that the genuine, underlying
ensemble cannot be tangled. The true ensemble _defines_ the density; it
is not being fit to it. The more untangled a model gets the closer it
comes to the true ensemble, with deviations from reasonable chemistry
becoming easier and easier to detect. In the end, when all alternative
hypotheses have been eliminated, the model must match the truth.
Why can't we do this with real data? Because all ensemble models are
tangled. Let's get to untangling them, shall we?
To demonstrate, I have created a series of examples that are
progressively more difficult to solve, but the ground truth model and
density are the same in all cases. Build the right model, and it will not
only explain the data to within experimental error, and have the best
possible validation stats, but it will reveal the true, underlying
cooperative motion of the protein as well.
Unless, of course, you can prove me wrong?
-James Holton
MAD Scientist