Hi guys,

Maybe this addresses a few of your questions. In 2019 we published a method to validate structures that is unbiased by the terms used in refinement and also works on predicted models:
Pražnikar, J., Tomić, M. & Turk, D. Validation and quality assessment of macromolecular structures using complex network analysis. Sci Rep 9, 1678 (2019). https://doi.org/10.1038/s41598-019-38658-9

best,
dusan


> On 18 Jan 2022, at 01:00, CCP4BB automatic digest system <lists...@jiscmail.ac.uk> wrote:
>
> There are 5 messages totaling 1446 lines in this issue.
>
> Topics of the day:
>
>   1. Validation of structure prediction (2)
>   2. Improved support for extended PDBx/mmCIF structure factor files (2)
>   3. Structural Biology Cryo-EM TT Asst. Professor at University of Nebraska
>
> ------------------------------
>
> Date: Mon, 17 Jan 2022 09:39:33 +0100
> From: Jan Dohnalek <dohnalek...@gmail.com>
> Subject: Re: Validation of structure prediction
>
> I think quite a bit of this "inconsistency" with protein structures comes
> from the fact that, with our larger globules, it is much more true that our
> model is an approximate time and space average of something that could have
> the ideal geometry. In other words, the way we are trying to represent the
> density is actually not that appropriate. The only "improvement" to this, I
> think, is the multiple-model approach.
>
> My 2 c.
>
> Jan
>
>
> On Sat, Jan 15, 2022 at 9:29 PM James Holton <jmhol...@lbl.gov> wrote:
>
>> On 1/13/2022 11:14 AM, Tristan Croll wrote:
>>
>> (please don't actually do this)
>>
>> Too late! I've been doing that for years. What happens, of course, is that
>> the "geometry" improves, but the R factors go through the roof. I expect
>> this comes as no surprise to anyone who has played with the "weight"
>> parameters in refinement, but maybe it should? What is it about our
>> knowledge of chemical bond lengths, angles, and radii that is inconsistent
>> with the electron density of macromolecules, but not small molecules? Why
>> do macro-models have a burning desire to leap away from the configuration
>> we know they adopt in reality? If you zoom in on those "bad clashes"
>> individually, they don't look like something that is supposed to happen.
>> There is a LOT of energy stored up in those little springs. I have a hard
>> time thinking that's for real. The molecule is no doubt doing something
>> else and we're just not capturing it properly. There is information to be
>> had here, a lot of information.
>>
>> This is why I too am looking for an all-encompassing "geometry score".
>> Right now I'm multiplying other scores together:
>>
>> score = (1+Clashscore)*sin(worst_omega)*1/(1+worst_rama)*1/(1+worst_rota)
>>         *Cbetadev*worst_nonbond*worst_bond*worst_angle*worst_dihedral*worst_chir*worst_plane
>>
>> where, for example, worst_rama is the "%score" given to the worst
>> Ramachandran angle by phenix.ramalyze, and worst_bond is the largest
>> "residual" reported among all the bonds in the structure by MolProbity or
>> phenix.geometry_minimization. For "worst_nonbond" I'm plugging the
>> observed and ideal distances into a Lennard-Jones 6-12 potential to
>> convert it into an "energy" that is always positive.
>>
>> With X-ray data in hand, I've been multiplying this whole thing by Rwork
>> and trying to find clever ways to minimize the product. Rfree is then, as
>> always, the cross-check.
>>
>> Or does someone have a better idea?
>>
>> -James Holton
>> MAD Scientist
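For concreteness, the multiplicative score above could be computed with a
small Python sketch along the following lines. This is not James's actual
script: it assumes the per-metric values have already been parsed out of
MolProbity / phenix.ramalyze / phenix.geometry_minimization output, treats
worst_omega as an angle in degrees, and shifts the Lennard-Jones 6-12 term up
by its well depth so that it is never negative (one reading of "always
positive"). All names are illustrative.

import math

def lj_nonbond_penalty(r_obs, r_ideal, eps=1.0):
    # 6-12 Lennard-Jones with its minimum (-eps) at r = r_ideal, shifted up
    # by eps so the returned "energy" is >= 0 for any observed distance.
    s = (r_ideal / r_obs) ** 6
    return eps * (s * s - 2.0 * s) + eps

def geometry_score(clashscore, worst_omega_deg, worst_rama, worst_rota,
                   cbetadev, worst_nonbond, worst_bond, worst_angle,
                   worst_dihedral, worst_chir, worst_plane):
    # Literal transcription of the multiplicative score quoted above;
    # worst_nonbond would be lj_nonbond_penalty(obs, ideal) for the worst
    # nonbonded contact, and the rest are the worst residuals / percentile
    # scores described in the text.
    return ((1.0 + clashscore)
            * math.sin(math.radians(worst_omega_deg))
            * 1.0 / (1.0 + worst_rama)
            * 1.0 / (1.0 + worst_rota)
            * cbetadev
            * worst_nonbond * worst_bond * worst_angle
            * worst_dihedral * worst_chir * worst_plane)

# With X-ray data in hand, the quantity being minimized (as described above)
# would then be something like geometry_score(...) * rwork.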
>> On 1/13/2022 11:14 AM, Tristan Croll wrote:
>>
>> Hard but not impossible - even when you *are* fitting to low-res density.
>> See https://twitter.com/crolltristan/status/1381258326223290373?s=21 for
>> an example - no Ramachandran outliers, 1.3% sidechain outliers, clashscore
>> of 2... yet multiple regions out of register by anywhere up to 15
>> residues! I never publicly named the structure (although I did share my
>> rebuilt model with the authors), but the videos and images in that thread
>> should be enough to illustrate the scale of the problem.
>>
>> And that was *with* a map to fit! Take away the map, run some MD energy
>> minimisation (perhaps with added Ramachandran and rotamer restraints), and
>> I think it would be easy to get your model to fool most "simple"
>> validation metrics (please don't actually do this). The upshot is that I
>> think validation of predicted models in the absence of at least
>> moderate-resolution experimental data is still a major challenge requiring
>> very careful thought.
>>
>> — Tristan
>>
>> On 13 Jan 2022, at 18:41, James Holton <jmhol...@lbl.gov> wrote:
>>
>> Agree with Pavel.
>>
>> Something I think worth adding is a reminder that the MolProbity score
>> only looks at bad clashes, Ramachandran outliers and rotamer outliers:
>>
>> MPscore = 0.426*ln(1+clashscore) + 0.33*ln(1+max(0, rota_out-1))
>>           + 0.25*ln(1+max(0, rama_iffy-2)) + 0.5
>>
>> It pays no attention whatsoever to twisted peptide bonds, C-beta
>> deviations and, for that matter, bond lengths and bond angles. If you
>> tweak your weights right you can get excellent MP scores but horrible
>> "geometry" in the traditional bonds-and-angles sense. The logic behind
>> this kind of validation is that nonbonded contacts and torsions are
>> normally much softer than bond and angle restraints, and are therefore
>> fertile ground for detecting problems. Thus far, I am not aware of any
>> "Grand Unified Score" that combines all geometric considerations, but
>> perhaps it is time for one?
>>
>> Tristan's trivial solution aside, it is actually very hard to make all the
>> "geometry" ideal for a real-world fold, and especially difficult to do so
>> without also screwing up the agreement with density (R factor). I would
>> argue that if you don't have an R factor then you should get one, but I am
>> interested in opinions about alternatives.
>>
>> That is: what if we could train an AI to predict Rfree by looking at the
>> coordinates?
>>
>> -James Holton
>> MAD Scientist
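Again, just to make the formula concrete, here is the MPscore line above
written as a small Python function. It follows the published MolProbity
combination of clashscore, rotamer-outlier percentage and "iffy"
(not-favoured) Ramachandran percentage; it is a sketch, not code taken from
MolProbity itself.

import math

def molprobity_score(clashscore, rota_out_pct, rama_iffy_pct):
    # MPscore as quoted above: clashscore is clashes per 1000 atoms,
    # rota_out_pct the percentage of rotamer outliers, and rama_iffy_pct the
    # percentage of residues outside the favoured Ramachandran region.
    return (0.426 * math.log(1 + clashscore)
            + 0.33 * math.log(1 + max(0, rota_out_pct - 1))
            + 0.25 * math.log(1 + max(0, rama_iffy_pct - 2))
            + 0.5)

# Plugging in the numbers from Tristan's example (clashscore 2, 1.3%
# sidechain outliers, no Ramachandran outliers; assuming the not-favoured
# Ramachandran fraction is also ~0) gives molprobity_score(2.0, 1.3, 0.0)
# ~ 1.05 -- an excellent-looking score despite the register errors he
# describes.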
>> On 12/21/2021 9:25 AM, Pavel Afonine wrote:
>>
>> Hi Reza,
>>
>> If you think about it this way - validation is making sure that the model
>> makes sense, the data make sense and the model-to-data fit makes sense -
>> then the answer to your question is obvious: in your case you do not have
>> experimental data (at least in the way we are used to thinking of it), so
>> of these three validation items you only have one. That, for example,
>> means you don't have to report things like R factors or completeness in
>> the high-resolution shell.
>>
>> Really, the geometry of an alpha helix does not depend on how you
>> determined it: using X-rays, cryo-EM or something else! So most (if not
>> all) model validation tools still apply.
>>
>> Pavel
>>
>> On Mon, Dec 20, 2021 at 8:10 AM Reza Khayat <rkha...@ccny.cuny.edu> wrote:
>>
>>> Hi,
>>>
>>> Can anyone suggest how to validate a predicted structure? Something
>>> similar to wwPDB validation, without the need for refinement statistics.
>>> I realize this is a strange question, given that the geometry of the
>>> model is anticipated to be fine if the structure was predicted by a
>>> server that minimizes the geometry to improve its statistics.
>>> Nonetheless, the journal has asked me for such a report. Thanks.
>>>
>>> Best wishes,
>>> Reza
>>>
>>> Reza Khayat, PhD
>>> Associate Professor
>>> City College of New York
>>> Department of Chemistry and Biochemistry
>>> New York, NY 10031
>
>
> --
> Jan Dohnalek, Ph.D.
> Institute of Biotechnology
> Academy of Sciences of the Czech Republic
> Biocev
> Prumyslova 595
> 252 50 Vestec near Prague
> Czech Republic
>
> Tel. +420 325 873 758
>
> ------------------------------
>
> Date: Mon, 17 Jan 2022 09:56:14 +0100
> From: "CAVARELLI Jean (VIE)" <jean.cavare...@unistra.fr>
> Subject: Re: Validation of structure prediction
>
> Maybe useful, if not already mentioned:
>
> Ten things I `hate' about refinement
> Pietro Roversi and Dale E. Tronrud
> https://journals.iucr.org/d/issues/2021/12/00/qt5008/index.html
>
> -------------------------
> Jean Cavarelli
> Professor of Structural Biology
> "Structural biology of epigenetic targets"
> Department of Integrated Structural Biology
> IGBMC, UMR7104 CNRS-UNISTRA, INSERM U 1258
> phone: +33 (0)3 69 48 52 74