And there was Crambin to 0.48 Å (I’ll leave it to others to argue whether or not crambin is a protein, since it has “only” 46 amino acids) where (from memory, I haven’t read the paper for at least 20 years…) they modelled multiple water networks.
3NIR, for reference.

Harry

> On 20 Mar 2023, at 10:20, Eleanor Dodson
> <0000176a9d5ebad7-dmarc-requ...@jiscmail.ac.uk> wrote:
>
> Thank you for such a careful analysis of modelling a "true" structure. You
> should publish this, James.
>
> It shows, amongst other things, how R factors depend on our modelling of
> solvent which is not represented as individual atoms (and also, I think, on
> how the scales are derived between observation and calculation).
> Years ago someone refined vitamin B12 against high resolution (0.6A?) data.
> There is about 20-25% solvent volume, I think. It was clear in the maps that
> there were partially occupied networks of water which extended throughout the
> lattice. This is probably true for proteins as well, and must affect the
> conformations of surface sidechains?
> Eleanor
> The B12 reference ...
>
> Biophys J. 1986 Nov; 50(5): 967–980.
> doi: 10.1016/S0006-3495(86)83537-8
> PMCID: PMC1329821
> PMID: 3790697
> Water structure in vitamin B12 coenzyme crystals. II. Structural
> characteristics of the solvent networks.
>
> H Savage
>
> On Sun, 19 Mar 2023 at 19:37, James Holton <jmhol...@lbl.gov> wrote:
> They say one test is worth a thousand expert opinions, so I tried my hand at
> the former.
>
> The question is: what is the right way to treat disordered side chains?:
> a) omit atoms you cannot see
> b) build them, and set occupancy to zero
> c) build them, and "let the B factors take care of it"
> d) none of the above
>
> The answer, of course, is d).
>
> Oh, c'mon. Yes, I know one of a, b, or c is what you've been doing your whole
> life. I do it too. But, let's face it: none of these solutions are perfect.
> So, the real question is not which one is "right", but which is the least
> wrong?
>
> We all know what is really going on: the side chain is flapping around. No
> doubt it spends most of its time in energetically reasonable but nevertheless
> numerous conformations.
> There are 41 "Favorable" rotamers for Lys alone, and
> it doesn't take that many to spread the density thin enough to fall below the
> classical 1-sigma contour level. The atoms are still there, they are still
> contributing to the data, and they haven't gone far. So why don't we "just"
> model that? Already, I can hear the cries of "over-fitting!" and
> "observations/parameters!", "model bias!", and "think of the children!"
> Believe it or not, none of these are the major issue here. Allow me to
> demonstrate:
>
> Consider a simple case where we have a Lys side chain in ten conformers. I
> chose from popular rotamers, but evenly spread. That is, all 10 conformers
> have an occupancy of 0.10, and there is a 3-3-4 split of chi1 values between
> minus, plus and trans. This will give the maximum contrast of density
> between CB and CG. Let us further require that there is no strain in this
> ground-truth. No stretched bonds, no tortured angles, no clashes, etc. Real
> molecules don't occupy such high-energy states unless they absolutely have
> to. Let us further assume that the bulk solvent works the way phenix models
> it, which is a probe radius of 1.1 A for both ions and aliphatics and a
> shrink radius of 0.9. But, instead of running one phenix.fmodel job, I ran
> ten: one for each conformer (A thru J). To add some excitement, I moved the
> main chain ~0.2 A in a random direction for each conformer. I then took these
> ten calculated electron density maps (bulk solvent and all) and added them
> together to form the ground truth for the following trials. Before
> refinement, I added noise consistent with an I/sigma of 50 and cut the
> resolution at 2.0 A.
> Wilson B is 50:
>
> CCtrue  Rwork%  Rfree%  fo-fc(sigma)  description
> 0.8943    9.05   10.60  5.9           stump at CB
> 0.9540    9.29   11.73  6.0           single conformer, zero occupancy
> 0.9471   10.35   15.04  5.1           single conformer, full occupancy, refmac5
> 0.9523    9.78   15.61  4.9           single conformer, full occupancy, phenix.refine
>
> So, it would appear that the zero-occupancy choice "wins", but by the
> narrowest of margins. Here CCtrue is the Pearson correlation coefficient
> between the ground-truth right-answer electron density and the 2fofc map
> resulting from the refinement. Rwork and Rfree are the usual suspects, and
> fo-fc indicates the tallest peak in the difference map. Refinement was with
> refmac unless otherwise indicated. I think we often forget that both phenix
> and refmac restrain B factor values, not just through bonds but through
> space, and they use rather different algorithms. Refmac tries to make the
> histogram of B factors "look right", whereas phenix allows steeper gradients.
> I also ran all 10 correct rotamers separately and picked the one with the
> best CCtrue to show above. If you instead sort on Rfree (which you really
> shouldn't do), you get different bests, but they are not much better (as low
> as 10.5%). So, the winner here depends on how you score. CCtrue is the best
> score, but also unfortunately unavailable for real data.
>
> It is perhaps interesting here that better CCtrue goes along with worse
> Rfree. This is not what I usually see in experiments like this. Rather, what
> I think is going on here is the system is frustrated. We are trying to fit
> various square pegs into a round hole, and none of them fit all that well.
>
> In all cases here the largest difference peak was indicating another place to
> put the Lys, so why not build into that screaming, 6-sigma difference peak?
> Here is what happens when you do that:
>
> CCtrue  Rwork%  Rfree%  fo-fc(sigma)  description
> 0.8943    9.05   10.60  5.9           stump at CB
> 0.9580    9.95   11.60  6.4           stump at CG
> 0.9585   10.20   12.29  6.2           stump at CG, all 10 confs
> 0.9543   10.61   12.24  5.3           stump at CD, all 10 confs
> 0.9383   10.69   14.64  4.1           stump at CE, all 10 confs
> 0.9476    9.66   13.48  4.6           all atoms, all 10 confs
> 0.9214    7.09   11.8   5.6           three conformers (worst of 120 combos)
> 0.9718    6.53    8.55  4.3           three conformers (best of 120 combos)
> 0.9710    7.17    9.44  6.1           two conformers (best of 45 combos)
> 0.9471   10.35   15.04  5.1           single conformer (best of 10 choices)
>
> If I add one CG, the other two chi1 positions light up. So, I tried building
> in all 10 true CG positions, and let the refinement decide what to do with
> them. The clear indication was that a CD should be added. After adding all
> the CDs, the difference peaks were weaker, but still indicating more atoms
> were needed. Rwork and Rfree, however, tell the opposite story. They get
> worse the more atoms you add. CCtrue, on the other hand, was best when
> cutting everything after CG. Why is that? Well, every time you add another
> atom you fill in the difference density, but then that atom pushes back the
> bulk solvent model that was filling in the density for the next atom. The
> atom-to-solvent distance is roughly twice that of a covalent bond. So again,
> square pegs and round holes.
>
> Three conformers coming out as the winner may be because it is a selective
> process with a noisy score. In the ground truth there are 10 conformers at
> equal occupancy, so no one triplet is really any better than any other.
> However, one has a density shape that fits better than other combos. My
> search over all possible quartets is still running.
>
> But what if we got the solvent "right"?
> Well, here is what that looks like:
>
> CCtrue  Rwork%  Rfree%  fo-fc(sigma)  description
> 0.9476    9.66   13.48  4.6           all atoms, all confs, refmac defaults
> 0.9696    6.15    8.88  3.7           all atoms, all confs, phenix.refine
> 0.9825    0.80    0.89  3.9           all atoms, all confs, true solvent
> 0.9824    0.92    1.26  7.3           true model, minus one H atom from ordered HIS side chain
>
> You can see that the default solvent of phenix.refine fares better than
> refmac here, but since I generated the solvent with phenix refine it may have
> an unfair advantage. Nevertheless, providing the "true solvent" here gives
> quite a striking drop in R factors. This is not surprising since this was the
> last systematic error in this ground truth. In all cases, I provided the true
> atomic positions at the start of refinement, so there was no confusion about
> strain-inducing local minima, such as which rotamer goes with which main
> chain shift. And yes, you can provide arbitrary bulk solvent maps to refmac5
> using the "Fpart" feature. I've had good luck with real data using bulk
> density derived from MD simulations.
>
> What is more, once the R factors are this low I can remove just one hydrogen
> atom and it comes back as a 7.3-sigma difference peak. This corresponds to
> the protonation state of that His. This kind of sensitivity is really
> attractive if you are looking for low-lying features, such as
> partially-occupied ligands. Some may pooh-pooh R factors as "cosmetic"
> features of structures, but they are, in fact, nothing more or less than the
> % error between your model and your data. This % error translates directly
> into the noise level of your map. At 20% error there is no hope whatsoever
> of seeing 1-electron changes. This is because hydrogen is only 17% of a
> carbon. But at 3-5% error, which is a typical experimental error in
> crystallographic data, anything bigger than one electron is clear.
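
The two scores used in the tables above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not what refmac5 or phenix.refine actually do: CCtrue is taken as a plain Pearson correlation between two maps sampled on the same grid, and the R factor as sum|Fobs - k*Fcalc| / sum(Fobs) with a single overall scale k = sum(Fobs) / sum(Fcalc). The function names and the toy amplitudes are made up for the example; real programs use more elaborate scaling.

```python
import math

def pearson_cc(a, b):
    """Pearson correlation between two equally sampled maps (flat lists)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def r_factor(f_obs, f_calc):
    """R = sum|Fobs - k*Fcalc| / sum(Fobs), with one overall scale k (assumed)."""
    k = sum(f_obs) / sum(f_calc)
    return sum(abs(o - k * c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)

# Toy amplitudes, invented for illustration only.
f_obs = [100.0, 80.0, 60.0, 40.0]
f_calc = [95.0, 84.0, 57.0, 44.0]
print(f"R  = {100 * r_factor(f_obs, f_calc):.2f}%")
print(f"CC = {pearson_cc(f_obs, f_calc):.4f}")

# One electron (hydrogen) relative to six (carbon): the "17%" quoted above.
print(f"H/C electron ratio = {100 * 1 / 6:.0f}%")
```

Read this way, the argument in the message is that a one-electron feature is roughly 17% of a carbon, so it cannot stand out from a map whose overall error is 20%, but should at 3-5%.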
>
> -James Holton
> MAD Scientist
>
> On 3/18/2023 2:10 PM, Nicholas Pearce wrote:
>> Not stupid, but essentially the same as modelling alt confs, though would
>> probably give more overfitting. Alt confs can easily be converted to an
>> ensemble (if done properly…).
>>
>> Thanks,
>> Nick
>>
>> ---
>>
>> Nicholas Pearce
>> Assistant Professor in Bioinformatics & DDLS Fellow
>> Linköping University
>> Sweden
>>
>> From: CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> on behalf of benjamin bax
>> <ben.d.v....@gmail.com>
>> Sent: Saturday, March 18, 2023 10:07:26 PM
>> To: CCP4BB@JISCMAIL.AC.UK <CCP4BB@JISCMAIL.AC.UK>
>> Subject: Re: [ccp4bb] To Trim or Not to To Trim
>>
>> Hi,
>> Probably a stupid question.
>> Could you multiply a, b and c cell dimensions by 2 or 3 (to give 8 or 27
>> structures) and restrain well defined parts of structure to be ‘identical’?
>> To give you a more NMR-like, chemically sensible ensemble of structures?
>> Ben
>>
>> > On 18 Mar 2023, at 12:04, Helen Ginn <ccp...@hginn.co.uk> wrote:
>> >
>> > Models for crystallography have two purposes: refinement and
>> > interpretation. Here these two purposes are in conflict. Neither case is
>> > handled well by either trim or not trim scenario, but trimming results in
>> > a deficit for refinement and not-trimming results in a deficit for
>> > interpretation.
>> >
>> > Our computational tools are not “fixed” in the same way that the standard
>> > amino acids are “fixed” or your government’s bureaucracy pathways are
>> > “fixed”. They are open for debate and for adjustments. This is a fine
>> > example where it may be more productive to discuss the options for making
>> > changes to the model itself or its representation, to better account for
>> > awkward situations such as these. Otherwise we are left figuring out the
>> > best imperfect way to use an imperfect tool (as all tools are, to varying
>> > degrees!), which isn’t satisfying for enough people, enough of the time.
>> >
>> > I now appreciate the hypocrisy in the argument “do not trim, but also
>> > don’t model disordered regions”, even though I’d be keen to avoid
>> > trimming. This discussion has therefore softened my own viewpoint.
>> >
>> > My refinement models (as implemented in Vagabond) do away with the concept
>> > of B factors precisely for the anguish it causes here, and refine a
>> > distribution of protein conformations which is sampled to generate an
>> > ensemble. By describing the conformations through the torsion angles that
>> > comprise the protein, modelling flexibility of a disordered lysine is
>> > comparatively trivial, and indeed modelling all possible conformations of
>> > a disordered loop becomes feasible. Lysines end up looking like a frayed
>> > end of a rope. Each conformation can produce its own solvent mask, which
>> > can be summed together to produce a blurring of density that matches what
>> > you would expect to see in the crystal.
>> >
>> > In my experience this doesn’t drop the R factors as much as you’d assume,
>> > because blurred out protein density does look very much like solvent, but
>> > it vastly improves the interpretability of the model. This also better
>> > models the boundary between the atoms you would trim and those you’d leave
>> > untrimmed, by avoiding such a binary distinction. No fear of trimming and
>> > pushing those errors unseen into the rest of the structure. No fear of
>> > leaving atoms in with an inadequate B factor model that cannot capture the
>> > nature of the disorder.
>> >
>> > Vagabond is undergoing a heavy rewrite though, and is not yet ready for
>> > human consumption. Its first iteration worked on
>> > single-dataset-single-model refinement, which handled disordered side
>> > chains well enough, with no need to decide to exclude atoms.
>> > The heart of the issue lies in main chain flexibility, and this must be
>> > handled correctly, for reasons of interpretability and elucidating the
>> > biological impact. This model isn’t perfect either, and necessitates its
>> > own compromises - but will provide another tool in the structural biology
>> > arsenal.
>> >
>> > ---
>> >
>> > Dr Helen Ginn
>> > Group leader, DESY
>> > Hamburg Advanced Research Centre for Bioorganic Chemistry (HARBOR)
>> > Luruper Chaussee 149
>> > 22607 Hamburg

########################################################################

To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1

This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list hosted by www.jiscmail.ac.uk, terms & conditions are available at https://www.jiscmail.ac.uk/policyandsecurity/
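
The subset counts quoted in the thread (10 single choices, 45 pairs, 120 triplets) are simply binomial coefficients of the ten conformers, and the quartet search mentioned as still running would cover 210 combinations. A minimal sketch of such an exhaustive search follows. The `score` argument is a hypothetical stand-in for a full refinement run plus evaluation (CCtrue, Rfree, or similar), and `toy_score` is a placeholder invented purely so the example runs; neither comes from the original posts.

```python
from itertools import combinations
from math import comb

conformers = "ABCDEFGHIJ"  # ten conformers, labelled A thru J as in the thread

# Number of distinct subsets of each size.
for size in (1, 2, 3, 4):
    print(size, comb(len(conformers), size))

def best_subset(size, score):
    """Return the subset of `size` conformers with the highest score.

    `score` is a hypothetical callable standing in for refining a model
    built from that subset and evaluating the result."""
    return max(combinations(conformers, size), key=score)

def toy_score(subset):
    """Placeholder score for demonstration only: favours subsets whose
    labels are spread far apart. A real search would refine each subset."""
    idx = [conformers.index(c) for c in subset]
    return max(idx) - min(idx)

print(best_subset(3, toy_score))
```

With ten equally occupied conformers and a noisy score, picking the maximum over 120 trials is itself a selection step, which is the caveat raised in the thread about the "best of 120 combos" row.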