Hi guys,

Maybe this addresses a few of your questions. In 2019 we published a method to validate structures that is unbiased by the terms used in refinement and also works on predicted models:
Pražnikar, J., Tomić, M. & Turk, D. Validation and quality assessment of macromolecular structures using complex network analysis. Sci Rep 9, 1678 (2019). https://doi.org/10.1038/s41598-019-38658-9

best,
dusan


> On 18 Jan 2022, at 01:00, CCP4BB automatic digest system <lists...@jiscmail.ac.uk> wrote:
>
> There are 5 messages totaling 1446 lines in this issue.
>
> Topics of the day:
>
>   1. Validation of structure prediction (2)
>   2. Improved support for extended PDBx/mmCIF structure factor files (2)
>   3. Structural Biology Cryo-EM TT Asst. Professor at University of Nebraska
>
> ------------------------------
>
> Date: Mon, 17 Jan 2022 09:39:33 +0100
> From: Jan Dohnalek <dohnalek...@gmail.com>
> Subject: Re: Validation of structure prediction
>
> I think quite a bit of this "inconsistency" with protein structures comes
> from the fact that, with our larger globules, it is much more true that our
> model is an approximate time and space average of something that could have
> the ideal geometry. In other words, the way we are trying to represent the
> density is actually not that appropriate. The only "improvement" to this, I
> think, is the multiple-model approach.
>
> My 2 c.
>
> Jan
>
>
> On Sat, Jan 15, 2022 at 9:29 PM James Holton <jmhol...@lbl.gov> wrote:
>
>> On 1/13/2022 11:14 AM, Tristan Croll wrote:
>>
>> (please don't actually do this)
>>
>> Too late! I've been doing that for years. What happens, of course, is that
>> the "geometry" improves, but the R factors go through the roof. I expect
>> this comes as no surprise to anyone who has played with the "weight"
>> parameters in refinement, but maybe it should? What is it about our
>> knowledge of chemical bond lengths, angles, and radii that is inconsistent
>> with the electron density of macromolecules, but not small molecules? Why
>> do macro-models have a burning desire to leap away from the configuration
>> we know they adopt in reality? If you zoom in on those "bad clashes"
>> individually, they don't look like something that is supposed to happen.
>> There is a LOT of energy stored up in those little springs. I have a hard
>> time thinking that's for real. The molecule is no doubt doing something
>> else and we're just not capturing it properly. There is information to be
>> had here, a lot of information.
>>
>> This is why I too am looking for an all-encompassing "geometry score".
>> Right now I'm multiplying other scores together:
>>
>> score = (1+Clashscore)*sin(worst_omega)*1/(1+worst_rama)*1/(1+worst_rota)
>>         *Cbetadev*worst_nonbond*worst_bond*worst_angle*worst_dihedral*worst_chir*worst_plane
>>
>> where, for example, worst_rama is the "%score" given to the worst
>> Ramachandran angle by phenix.ramalyze, and worst_bond is the largest
>> "residual" reported among all the bonds in the structure by MolProbity or
>> phenix.geometry_minimization. For "worst_nonbond" I'm plugging the
>> observed and ideal distances into a Lennard-Jones 6-12 potential to
>> convert it into an "energy" that is always positive.
>>
>> With X-ray data in hand, I've been multiplying this whole thing by Rwork
>> and trying to find clever ways to minimize the product. Rfree is then, as
>> always, the cross-check.
>>
>> Or does someone have a better idea?
>>
>> -James Holton
>> MAD Scientist
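For concreteness, the multiplicative score above could be computed with a
small Python sketch along the following lines. This is not James's actual
script: it assumes the per-metric values have already been parsed out of
MolProbity / phenix.ramalyze / phenix.geometry_minimization output, treats
worst_omega as an angle in degrees, and shifts the Lennard-Jones 6-12 term up
by its well depth so that it is never negative (one reading of "always
positive"). All names are illustrative.

import math

def lj_nonbond_penalty(r_obs, r_ideal, eps=1.0):
    # 6-12 Lennard-Jones with its minimum (-eps) at r = r_ideal, shifted up
    # by eps so the returned "energy" is >= 0 for any observed distance.
    s = (r_ideal / r_obs) ** 6
    return eps * (s * s - 2.0 * s) + eps

def geometry_score(clashscore, worst_omega_deg, worst_rama, worst_rota,
                   cbetadev, worst_nonbond, worst_bond, worst_angle,
                   worst_dihedral, worst_chir, worst_plane):
    # Literal transcription of the multiplicative score quoted above;
    # worst_nonbond would be lj_nonbond_penalty(obs, ideal) for the worst
    # nonbonded contact, and the rest are the worst residuals / percentile
    # scores described in the text.
    return ((1.0 + clashscore)
            * math.sin(math.radians(worst_omega_deg))
            * 1.0 / (1.0 + worst_rama)
            * 1.0 / (1.0 + worst_rota)
            * cbetadev
            * worst_nonbond * worst_bond * worst_angle
            * worst_dihedral * worst_chir * worst_plane)

# With X-ray data in hand, the quantity being minimized (as described above)
# would then be something like geometry_score(...) * rwork.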
>> On 1/13/2022 11:14 AM, Tristan Croll wrote:
>>
>> Hard but not impossible - even when you *are* fitting to low-res density.
>> See https://twitter.com/crolltristan/status/1381258326223290373?s=21 for
>> an example - no Ramachandran outliers, 1.3% sidechain outliers, clashscore
>> of 2... yet multiple regions out of register by anywhere up to 15
>> residues! I never publicly named the structure (although I did share my
>> rebuilt model with the authors), but the videos and images in that thread
>> should be enough to illustrate the scale of the problem.
>>
>> And that was *with* a map to fit! Take away the map, run some MD energy
>> minimisation (perhaps with added Ramachandran and rotamer restraints), and
>> I think it would be easy to get your model to fool most "simple"
>> validation metrics (please don't actually do this). The upshot is that I
>> think validation of predicted models in the absence of at least
>> moderate-resolution experimental data is still a major challenge requiring
>> very careful thought.
>>
>> — Tristan
>>
>> On 13 Jan 2022, at 18:41, James Holton <jmhol...@lbl.gov> wrote:
>>
>> Agree with Pavel.
>>
>> Something I think worth adding is a reminder that the MolProbity score
>> only looks at bad clashes, Ramachandran outliers and rotamer outliers:
>>
>> MPscore = 0.426*ln(1+clashscore) + 0.33*ln(1+max(0, rota_out-1))
>>           + 0.25*ln(1+max(0, rama_iffy-2)) + 0.5
>>
>> It pays no attention whatsoever to twisted peptide bonds, C-beta
>> deviations and, for that matter, bond lengths and bond angles. If you
>> tweak your weights right you can get excellent MP scores but horrible
>> "geometry" in the traditional bonds-and-angles sense. The logic behind
>> this kind of validation is that nonbonded contacts and torsions are
>> normally much softer than bond and angle restraints, and are therefore
>> fertile ground for detecting problems. Thus far, I am not aware of any
>> "Grand Unified Score" that combines all geometric considerations, but
>> perhaps it is time for one?
>>
>> Tristan's trivial solution aside, it is actually very hard to make all the
>> "geometry" ideal for a real-world fold, and especially difficult to do so
>> without also screwing up the agreement with density (R factor). I would
>> argue that if you don't have an R factor then you should get one, but I am
>> interested in opinions about alternatives.
>>
>> That is: what if we could train an AI to predict Rfree by looking at the
>> coordinates?
>>
>> -James Holton
>> MAD Scientist
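Again, just to make the formula concrete, here is the MPscore line above
written as a small Python function. It follows the published MolProbity
combination of clashscore, rotamer-outlier percentage and "iffy"
(not-favoured) Ramachandran percentage; it is a sketch, not code taken from
MolProbity itself.

import math

def molprobity_score(clashscore, rota_out_pct, rama_iffy_pct):
    # MPscore as quoted above: clashscore is clashes per 1000 atoms,
    # rota_out_pct the percentage of rotamer outliers, and rama_iffy_pct the
    # percentage of residues outside the favoured Ramachandran region.
    return (0.426 * math.log(1 + clashscore)
            + 0.33 * math.log(1 + max(0, rota_out_pct - 1))
            + 0.25 * math.log(1 + max(0, rama_iffy_pct - 2))
            + 0.5)

# Plugging in the numbers from Tristan's example (clashscore 2, 1.3%
# sidechain outliers, no Ramachandran outliers; assuming the not-favoured
# Ramachandran fraction is also ~0) gives molprobity_score(2.0, 1.3, 0.0)
# ~ 1.05 -- an excellent-looking score despite the register errors he
# describes.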
>> On 12/21/2021 9:25 AM, Pavel Afonine wrote:
>>
>> Hi Reza,
>>
>> If you think about it this way - validation is making sure that the model
>> makes sense, the data make sense and the model-to-data fit makes sense -
>> then the answer to your question is obvious: in your case you do not have
>> experimental data (at least in the way we are used to thinking of it), so
>> of these three validation items you only have one. That, for example,
>> means you don't have to report things like R factors or completeness in
>> the high-resolution shell.
>>
>> Really, the geometry of an alpha helix does not depend on how you
>> determined it: using X-rays, cryo-EM or something else! So most (if not
>> all) model validation tools still apply.
>>
>> Pavel
>>
>> On Mon, Dec 20, 2021 at 8:10 AM Reza Khayat <rkha...@ccny.cuny.edu> wrote:
>>
>>> Hi,
>>>
>>> Can anyone suggest how to validate a predicted structure? Something
>>> similar to wwPDB validation, without the need for refinement statistics.
>>> I realize this is a strange question, given that the geometry of the
>>> model is anticipated to be fine if the structure was predicted by a
>>> server that minimizes the geometry to improve its statistics.
>>> Nonetheless, the journal has asked me for such a report. Thanks.
>>>
>>> Best wishes,
>>> Reza
>>>
>>> Reza Khayat, PhD
>>> Associate Professor
>>> City College of New York
>>> Department of Chemistry and Biochemistry
>>> New York, NY 10031
>
>
> --
> Jan Dohnalek, Ph.D.
> Institute of Biotechnology
> Academy of Sciences of the Czech Republic
> Biocev
> Prumyslova 595
> 252 50 Vestec near Prague
> Czech Republic
>
> Tel. +420 325 873 758
>
> ------------------------------
>
> Date: Mon, 17 Jan 2022 09:56:14 +0100
> From: "CAVARELLI Jean (VIE)" <jean.cavare...@unistra.fr>
> Subject: Re: Validation of structure prediction
>
> Maybe useful, if not already mentioned:
>
> Ten things I `hate' about refinement
> Pietro Roversi and Dale E. Tronrud
> https://journals.iucr.org/d/issues/2021/12/00/qt5008/index.html
>
> -------------------------
> Jean Cavarelli
> Professor of Structural Biology
> "Structural biology of epigenetic targets"
> Department of Integrated Structural Biology
> IGBMC, UMR7104 CNRS-UNISTRA, INSERM U 1258
> phone: +33 (0)3 69 48 52 74