Re: [ccp4bb] Introducing the UNTANGLE Challenge

James Holton Tue, 01 Oct 2024 07:34:16 -0700

OK Folks, it has been nine months since I announced this challenge. Some excellent suggestions have been made, and some new tools created for this community, but the UNTANGLE Challenge is still not solved!


As additional incentive, I am now officially announcing cash prizes:
$1000 (USD) for the first/best solution to level 11.
$500 (USD) for the first/best solution to level 10.

The instructions, rules and data files otherwise remain the same:
https://bl831.als.lbl.gov/~jamesh/challenge/twoconf/

No cheating! (by that I mean, I'm not sending you $500 for just emailing me a copy of the provided best.pdb file. You have to show how you got there from the density).

Some good news: If you download the latest Phenix Suite, there are new tools that will help you:

% phenix.holton_geometry_validation

Calculates my wE score and is generally applicable beyond this challenge data (no, I did not name it, thanks Tom Terwilliger)


% phenix.create_alt_conf
    Makes and optimizes alternate conformer choices (can use a lot of CPU)

% phenix.refine new features: fit_altlocs_method=masking, include_altlocs=True and refine_oat=True for improved refinement results when using alt confs in protein and in solvent


One recent caveat:

As of Phenix version 1.21.2 (build 5419 and later) the ideal non-bond distances for potential hydrogen bonds have been updated. This will make my "weighted Energy" (wE) geometry scores worse than those from earlier versions. So, in fairness to those who put a lot of effort in so far, I will allow wE scores computed with earlier versions of phenix, even if they are built or refined with other programs.

Yes, I know this is the CCP4BB, but I have not had any reports of new developments on this front from the CCP4 team. Feel free to chime in here.

Thank you all who have tried your hand and made great new tools so far! The path to the ideal, underlying ensemble is clearly a difficult one, but it must exist. And it is a journey worth making if it reveals cooperative motions like those posited in the ground truth of this Challenge. Easily worth $1500.


-James Holton
MAD Scientist



On 1/21/2024 7:07 AM, Herbert J. Bernstein wrote:

Have you considered the impact of tunneling? Your rope crossings are not perfect barriers.


On Sat, Jan 20, 2024 at 6:09 PM James Holton <jmhol...@lbl.gov> wrote:

    Update:

    I've gotten some feedback asking for clarity on what I mean by
    "tangled". I paste here a visual aid:


    The protein chains in an ensemble model are like these ropes. If
    these ropes are the same length as the distance from floor to
    ceiling, then straight up-and-down is the global minimum in energy
    (left). The anchor points are analogous to the rest of the protein
    structure, which is the same in both diagrams. Imagine for a
    moment, however, after anchoring the dangling rope ends to the
    floor you look up and see the ropes are actually crossed (right).
    You got the end points right, but no amount of pulling on the
    ropes (energy minimization) is going to get you from the tangled
    structure to the global minimum. The tangled ropes are also
    strained, because they are being forced to be a little longer than
    they want to be. This strain in protein models manifests as
    geometry outliers and the automatic weighting in your refinement
    program responds to bad geometry by relaxing the x-ray weight,
    which alleviates some of the strain, but increases your Rfree.
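The strain in the rope picture can be put into numbers with a minimal sketch. Everything here is an illustrative assumption rather than anything from the challenge itself: the anchor spacing, the floor-to-ceiling height, the harmonic penalty, and the names `rope_length` and `strain_energy` are all hypothetical.

```python
import math

def rope_length(floor_x, ceiling_x, height):
    """Straight-line length of a rope from a floor anchor to a ceiling anchor."""
    return math.hypot(ceiling_x - floor_x, height)

def strain_energy(pairing, floor_xs, ceiling_xs, height, rest_length, k=1.0):
    """Harmonic penalty summed over ropes; pairing[i] is the ceiling anchor of rope i."""
    return sum(
        k * (rope_length(floor_xs[i], ceiling_xs[j], height) - rest_length) ** 2
        for i, j in enumerate(pairing)
    )

# Assumed toy geometry: two anchors 1 unit apart, ceiling 3 units up,
# and ropes whose relaxed length equals the floor-to-ceiling distance.
floor_xs = [0.0, 1.0]
ceiling_xs = [0.0, 1.0]
height = 3.0
rest_length = height

straight = strain_energy([0, 1], floor_xs, ceiling_xs, height, rest_length)
crossed = strain_energy([1, 0], floor_xs, ceiling_xs, height, rest_length)
print(f"straight: {straight:.4f}  crossed: {crossed:.4f}")
```

The end points are identical in both cases; only the pairing differs. Changing the pairing is a discrete move, so no continuous pulling on the ropes turns the crossed arrangement into the straight one.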

    The goal of this challenge is to eliminate these tangles, and do
    it efficiently. What we need is a topoisomerase! Something that
    can find the source of strain and let the ropes pass through each
    other at the appropriate place. I've always wanted one of those
    for the wires behind my desk...
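As a rough illustration of what such a "topoisomerase" move could look like, here is a toy sketch that exchanges the two conformer labels at individual sites and keeps the labeling with the least bond strain. It is not the method of any program mentioned in this thread: the 2D coordinates, the 1.5-unit ideal bond length, the exhaustive search, and the names `bond_energy`, `chain_energy` and `untangle` are all assumptions made for the example.

```python
import itertools
import math

def bond_energy(p, q, ideal=1.5, k=1.0):
    """Harmonic penalty for a bond between points p and q (assumed ideal length)."""
    return k * (math.dist(p, q) - ideal) ** 2

def chain_energy(sites, swaps):
    """Total bond strain of both conformers.

    sites: list of (pos_a, pos_b) pairs, one per consecutive atom.
    swaps: swaps[i] is True if the A/B labels are exchanged at site i.
    """
    conf_a = [b if s else a for (a, b), s in zip(sites, swaps)]
    conf_b = [a if s else b for (a, b), s in zip(sites, swaps)]
    total = 0.0
    for conf in (conf_a, conf_b):
        for p, q in zip(conf, conf[1:]):
            total += bond_energy(p, q)
    return total

def untangle(sites):
    """Try every label assignment (first site held fixed) and return the best one."""
    best = None
    for rest in itertools.product((False, True), repeat=len(sites) - 1):
        swaps = (False,) + rest
        energy = chain_energy(sites, swaps)
        if best is None or energy < best[0]:
            best = (energy, swaps)
    return best

# Toy chain: conformers run along y=0 and y=1, but the labels at the third
# site are exchanged, so each labeled chain has to cross over and back.
sites = [
    ((0.0, 0.0), (0.0, 1.0)),
    ((1.5, 0.0), (1.5, 1.0)),
    ((3.0, 1.0), (3.0, 0.0)),  # tangled site
    ((4.5, 0.0), (4.5, 1.0)),
]
tangled_energy = chain_energy(sites, (False,) * len(sites))
best_energy, best_swaps = untangle(sites)
print(tangled_energy, best_energy, best_swaps)
```

The atom positions never move here; only the labels change, which is the step that gradient-based minimization cannot take. The exhaustive search doubles in cost with every added site, so it only serves to make the idea concrete on a four-site chain.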

    More details on the origins of tangling in ensemble models can be
    found here:
    https://bl831.als.lbl.gov/~jamesh/challenge/twoconf/#tangle

    -James Holton
    MAD Scientist

    On 1/18/2024 4:33 PM, James Holton wrote:

    Greetings Everybody,

    I present to you a Challenge.

    Structural biology would be far more powerful if we can get our
    models out of local minima, and together, I believe we can find a
    way to escape them.

    tldr: I dare any one of you to build a model that scores better
    than my "best.pdb" model below. That is probably impossible, so I
    also dare you to approach or even match "best.pdb" by doing
    something more clever than just copying it. Difficulty levels
    range from 0 to 11. First one to match the best.pdb energy score
    and Rfree wins the challenge, and I'd like you to be on my paper.
    You have nine months.

    Details of the challenge, scoring system, test data, and
    available starting points can be found here:
    https://bl831.als.lbl.gov/~jamesh/challenge/twoconf/

    Why am I doing this?
    We all know that macromolecules adopt multiple conformations.
    That is how they function. And yet, ensemble refinement still has
    a hard time competing with conventional
    single-conformer-with-a-few-split-side-chain models when it comes
    to revealing correlated motions, or even just simultaneously
    satisfying density data and chemical restraints. That is,
    ensembles still suffer from the battle between R factors and
    geometry restraints. This is because the ensemble member chains
    cannot pass through each other, and get tangled. The tangling
    comes from the density, not the chemistry. Refinement in refmac,
    shelxl, phenix, simulated annealing, qFit, and even coot cannot
    untangle them.

    The good news is: knowledge of chemistry, combined with R
    factors, appears to be a powerful indicator of how near a model
    is to being untangled. What is really exciting is that the
    genuine, underlying ensemble cannot be tangled. The true ensemble
    _defines_ the density; it is not being fit to it. The more
    untangled a model gets the closer it comes to the true ensemble,
    with deviations from reasonable chemistry becoming easier and
    easier to detect. In the end, when all alternative hypotheses
    have been eliminated, the model must match the truth.

    Why can't we do this with real data? Because all ensemble models
    are tangled. Let's get to untangling them, shall we?

    To demonstrate, I have created a series of examples that are
    progressively more difficult to solve, but the ground truth model
    and density is the same in all cases. Build the right model, and
    it will not only explain the data to within experimental error,
    and have the best possible validation stats, but it will reveal
    the true, underlying cooperative motion of the protein as well.

    Unless, of course, you can prove me wrong?

    -James Holton
    MAD Scientist



    ------------------------------------------------------------------------

    To unsubscribe from the CCP4BB list, click the following link:
    https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1

