Whoops! A quick glance at the PDB entry indicates I must have been clairvoyant to have read it 20 years ago.
Harry

> On 20 Mar 2023, at 10:35, Harry Powell <hrp-ccp...@virginmedia.com> wrote:
>
> And there was Crambin to 0.48 Å (I’ll leave it to others to argue whether or not crambin is a protein, since it has “only” 46 amino acids) where (from memory, I haven’t read the paper for at least 20 years…) they modelled multiple water networks.
>
> 3NIR, for reference.
>
> Harry
>
>
>> On 20 Mar 2023, at 10:20, Eleanor Dodson <0000176a9d5ebad7-dmarc-requ...@jiscmail.ac.uk> wrote:
>>
>> Thank you for such a careful analysis of modelling a "true" structure. You should publish this, James.
>>
>> It shows, amongst other things, how R factors depend on our modelling of solvent, which is not represented as individual atoms (and also, I think, on how the scales between observation and calculation are derived).
>> Years ago someone refined vitamin B12 against high-resolution (0.6 Å?) data. There is about 20-25% solvent volume, I think. It was clear in the maps that there were partially occupied networks of water which extended throughout the lattice. This is probably true for proteins as well, and must affect the conformations of surface side chains?
>> Eleanor
>> The B12 reference ...
>>
>> Biophys J. 1986 Nov; 50(5): 967–980.
>> doi: 10.1016/S0006-3495(86)83537-8
>> PMCID: PMC1329821
>> PMID: 3790697
>> Water structure in vitamin B12 coenzyme crystals. II. Structural characteristics of the solvent networks.
>>
>> H Savage
>>
>> On Sun, 19 Mar 2023 at 19:37, James Holton <jmhol...@lbl.gov> wrote:
>> They say one test is worth a thousand expert opinions, so I tried my hand at the former.
>>
>> The question is: what is the right way to treat disordered side chains?
>> a) omit atoms you cannot see
>> b) build them, and set occupancy to zero
>> c) build them, and "let the B factors take care of it"
>> d) none of the above
>>
>> The answer, of course, is d).
>>
>> Oh, c'mon. Yes, I know one of a, b, or c is what you've been doing your whole life. I do it too.
>> But, let's face it: none of these solutions are perfect. So, the real question is not which one is "right", but which is the least wrong?
>>
>> We all know what is really going on: the side chain is flapping around. No doubt it spends most of its time in energetically reasonable but nevertheless numerous conformations. There are 41 "Favorable" rotamers for Lys alone, and it doesn't take that many to spread the density thin enough to fall below the classical 1-sigma contour level. The atoms are still there, they are still contributing to the data, and they haven't gone far. So why don't we "just" model that? Already, I can hear the cries of "over-fitting!", "observations/parameters!", "model bias!", and "think of the children!" Believe it or not, none of these are the major issue here. Allow me to demonstrate:
>>
>> Consider a simple case where we have a Lys side chain in ten conformers. I chose from popular rotamers, but evenly spread. That is, all 10 conformers have an occupancy of 0.10, and there is a 3-3-4 split of chi1 values between minus, plus and trans. This will give the maximum contrast of density between CB and CG. Let us further require that there is no strain in this ground truth: no stretched bonds, no tortured angles, no clashes, etc. Real molecules don't occupy such high-energy states unless they absolutely have to. Let us further assume that the bulk solvent works the way phenix models it, which is a probe radius of 1.1 Å for both ions and aliphatics and a shrink radius of 0.9 Å. But, instead of running one phenix.fmodel job, I ran ten: one for each conformer (A through J). To add some excitement, I moved the main chain ~0.2 Å in a random direction for each conformer. I then took these ten calculated electron density maps (bulk solvent and all) and added them together to form the ground truth for the following trials.
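A minimal Python sketch of the ground-truth construction described above: one density map per conformer, weighted by its occupancy and summed on a common grid. It assumes a point atom blurred only by an isotropic B factor (per-axis variance B/8π²), ignores bulk solvent, and uses hypothetical coordinates (a single terminal NZ at ten evenly spaced positions on a 2 Å circle), so it is not what phenix.fmodel computes. It only shows how spreading one atom over ten conformers at 0.10 occupancy lowers its peak height while leaving its electron count unchanged.

```python
import numpy as np


def atom_density(points, center, b_iso, n_electrons):
    """Point atom blurred by an isotropic B factor (per-axis variance B / 8 pi^2).

    The atom's own form factor is ignored, so this is only a rough stand-in.
    """
    var = b_iso / (8.0 * np.pi ** 2)
    r2 = np.sum((points - np.asarray(center)) ** 2, axis=-1)
    return n_electrons * (2.0 * np.pi * var) ** -1.5 * np.exp(-r2 / (2.0 * var))


def ensemble_map(points, conformers, occupancies, b_iso=50.0, n_electrons=7.0):
    """Occupancy-weighted sum of one map per conformer (no bulk solvent)."""
    total = np.zeros(points.shape[:-1])
    for atoms, occ in zip(conformers, occupancies):
        for center in atoms:
            total += occ * atom_density(points, center, b_iso, n_electrons)
    return total


# 7 A cube sampled at 0.1 A.
axis = np.linspace(-3.5, 3.5, 71)
points = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)

# Hypothetical: ten positions of a terminal NZ, evenly spaced on a 2 A circle.
angles = np.linspace(0.0, 2.0 * np.pi, 10, endpoint=False)
conformers = [[(2.0 * np.cos(a), 2.0 * np.sin(a), 0.0)] for a in angles]

single = ensemble_map(points, conformers[:1], [1.0])
spread = ensemble_map(points, conformers, [0.1] * 10)
print(f"ensemble peak / single-conformer peak: {spread.max() / single.max():.2f}")
```

At B = 50 Å² neighbouring positions overlap only partially, so the summed peak is a small fraction of the single-conformer peak: the "spread thin enough to fall below 1 sigma" effect in miniature.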
>> Before refinement, I added noise consistent with an I/sigma of 50 and cut the resolution at 2.0 Å. The Wilson B is 50 Å²:
>>
>> CCtrue  Rwork%  Rfree%  fo-fc(sigma)  description
>> 0.8943  9.05    10.60   5.9           stump at CB
>> 0.9540  9.29    11.73   6.0           single conformer, zero occupancy
>> 0.9471  10.35   15.04   5.1           single conformer, full occupancy, refmac5
>> 0.9523  9.78    15.61   4.9           single conformer, full occupancy, phenix.refine
>>
>> So, it would appear that the zero-occupancy choice "wins", but by the narrowest of margins. Here CCtrue is the Pearson correlation coefficient between the ground-truth right-answer electron density and the 2fofc map resulting from the refinement. Rwork and Rfree are the usual suspects, and fo-fc indicates the tallest peak in the difference map. Refinement was with refmac unless otherwise indicated. I think we often forget that both phenix and refmac restrain B factor values, not just through bonds but through space, and they use rather different algorithms. Refmac tries to make the histogram of B factors "look right", whereas phenix allows steeper gradients. I also ran all 10 correct rotamers separately and picked the one with the best CCtrue to show above. If you instead sort on Rfree (which you really shouldn't do), you get different bests, but they are not much better (as low as 10.5%). So, the winner here depends on how you score. CCtrue is the best score, but unfortunately it is unavailable for real data.
>>
>> It is perhaps interesting here that better CCtrue goes along with worse Rfree. This is not what I usually see in experiments like this. Rather, what I think is going on here is that the system is frustrated. We are trying to fit various square pegs into a round hole, and none of them fit all that well.
>>
>> In all cases here the largest difference peak was indicating another place to put the Lys, so why not build into that screaming, 6-sigma difference peak?
>> Here is what happens when you do that:
>>
>> CCtrue  Rwork%  Rfree%  fo-fc(sigma)  description
>> 0.8943  9.05    10.60   5.9           stump at CB
>> 0.9580  9.95    11.60   6.4           stump at CG
>> 0.9585  10.20   12.29   6.2           stump at CG, all 10 confs
>> 0.9543  10.61   12.24   5.3           stump at CD, all 10 confs
>> 0.9383  10.69   14.64   4.1           stump at CE, all 10 confs
>> 0.9476  9.66    13.48   4.6           all atoms, all 10 confs
>> 0.9214  7.09    11.8    5.6           three conformers (worst of 120 combos)
>> 0.9718  6.53    8.55    4.3           three conformers (best of 120 combos)
>> 0.9710  7.17    9.44    6.1           two conformers (best of 45 combos)
>> 0.9471  10.35   15.04   5.1           single conformer (best of 10 choices)
>>
>> If I add one CG, the other two chi1 positions light up. So, I tried building in all 10 true CG positions, and let the refinement decide what to do with them. The clear indication was that a CD should be added. After adding all the CDs, the difference peaks were weaker, but still indicating more atoms were needed. Rwork and Rfree, however, tell the opposite story: they get worse the more atoms you add. CCtrue, on the other hand, was best when cutting everything after CG. Why is that? Well, every time you add another atom you fill in the difference density, but then that atom pushes back the bulk solvent model that was filling in the density for the next atom. The atom-to-solvent distance is roughly twice that of a covalent bond. So again, square pegs and round holes.
>>
>> Three conformers coming out as the winner may be because it is a selective process with a noisy score. In the ground truth there are 10 conformers at equal occupancy, so no one triplet is really any better than any other. However, one has a density shape that fits better than the other combos. My search over all possible quartets is still running.
>>
>> But what if we got the solvent "right"?
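The 120 triplets and 45 pairs in the table are the binomial coefficients C(10,3) and C(10,2); the pending quartet search covers C(10,4) = 210 combinations. The sketch below enumerates k-subsets of the ten conformers with itertools.combinations and scores each with a hypothetical `noisy_rfree` function in which every subset is equally good and only refinement noise differs. That scoring function is an assumption standing in for real refinements. Averaged over repeated trials, the best score keeps improving as the number of candidates grows, which is the selection effect proposed above as one reason three conformers "win".

```python
import itertools
import math
import random


def noisy_rfree(subset, rng, true_rfree=12.0, noise=1.0):
    """Hypothetical score: every subset is equally good; only the noise differs."""
    return true_rfree + rng.gauss(0.0, noise)


def best_subset(n_conformers, k, score, rng):
    """Score every k-subset of conformers; return (best score, best subset, number tried)."""
    best_score, best, n_tried = math.inf, None, 0
    for subset in itertools.combinations(range(n_conformers), k):
        s = score(subset, rng)
        n_tried += 1
        if s < best_score:
            best_score, best = s, subset
    return best_score, best, n_tried


rng = random.Random(0)
mean_best = {}
for k in (1, 2, 3, 4):
    # Repeat the whole search many times to average out luck.
    trials = [best_subset(10, k, noisy_rfree, rng) for _ in range(200)]
    mean_best[k] = sum(t[0] for t in trials) / len(trials)
    print(f"k={k}: {math.comb(10, k):3d} subsets, mean best Rfree {mean_best[k]:.2f}%")
```

Nothing about the winning subset is special in this toy model; its apparent advantage comes purely from taking the minimum of more noisy numbers.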
>> Well, here is what improving the solvent model looks like:
>>
>> CCtrue  Rwork%  Rfree%  fo-fc(sigma)  description
>> 0.9476  9.66    13.48   4.6           all atoms, all confs, refmac defaults
>> 0.9696  6.15    8.88    3.7           all atoms, all confs, phenix.refine
>> 0.9825  0.80    0.89    3.9           all atoms, all confs, true solvent
>>
>> 0.9824  0.92    1.26    7.3           true model, minus one H atom from ordered HIS side chain
>>
>> You can see that the default solvent of phenix.refine fares better than refmac here, but since I generated the solvent with phenix it may have an unfair advantage. Nevertheless, providing the "true solvent" here produces quite a striking drop in R factors. This is not surprising, since this was the last systematic error in this ground truth. In all cases, I provided the true atomic positions at the start of refinement, so there was no confusion about strain-inducing local minima, such as which rotamer goes with which main chain shift. And yes, you can provide arbitrary bulk solvent maps to refmac5 using the "Fpart" feature. I've had good luck with real data using bulk density derived from MD simulations.
>>
>> What is more, once the R factors are this low I can remove just one hydrogen atom and it comes back as a 7.3-sigma difference peak. This corresponds to the protonation state of that His. This kind of sensitivity is really attractive if you are looking for low-lying features, such as partially occupied ligands. Some may pooh-pooh R factors as "cosmetic" features of structures, but they are, in fact, nothing more or less than the % error between your model and your data. This % error translates directly into the noise level of your map. At 20% error there is no hope whatsoever of seeing 1-electron changes, because a hydrogen has only 17% of the electrons of a carbon (1 versus 6). But at 3-5% error, which is typical of experimental error in crystallographic data, anything bigger than one electron is clear.
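The two scores used throughout can be written out as a short sketch. The R factor here is the conventional sum of | |Fo| - k|Fc| | over the sum of |Fo|, with one least-squares overall scale k (an assumption; refinement programs use resolution-dependent and anisotropic scaling plus a bulk-solvent term). CCtrue is the Pearson correlation between two maps on the same grid. The data are synthetic: calculated amplitudes with 4% multiplicative Gaussian noise, inside the 3-5% range of experimental error mentioned above. The last line is the electron ratio behind the 17% figure.

```python
import numpy as np


def r_factor(f_obs, f_calc):
    """R = sum| |Fo| - k|Fc| | / sum|Fo|, with one least-squares scale factor k."""
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_calc = np.abs(np.asarray(f_calc, dtype=float))
    k = np.sum(f_obs * f_calc) / np.sum(f_calc ** 2)  # minimises sum (Fo - k Fc)^2
    return float(np.sum(np.abs(f_obs - k * f_calc)) / np.sum(f_obs))


def cc_true(map_true, map_model):
    """Pearson correlation coefficient between two maps sampled on the same grid."""
    a = np.ravel(map_true) - np.mean(map_true)
    b = np.ravel(map_model) - np.mean(map_model)
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))


rng = np.random.default_rng(0)
f_calc = rng.rayleigh(scale=100.0, size=20_000)                   # synthetic |Fc|
f_obs = f_calc * (1.0 + 0.04 * rng.standard_normal(f_calc.size))  # 4% error

map_true = rng.standard_normal((16, 16, 16))                      # synthetic map
map_model = map_true + 0.3 * rng.standard_normal(map_true.shape)  # noisy copy

print(f"R with 4% noise: {r_factor(f_obs, f_calc):.1%}")
print(f"CCtrue of a noisy copy: {cc_true(map_true, map_model):.3f}")
print(f"H/C electron ratio: {1 / 6:.0%}")  # prints "H/C electron ratio: 17%"
```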
>>
>> -James Holton
>> MAD Scientist
>>
>>
>>
>> On 3/18/2023 2:10 PM, Nicholas Pearce wrote:
>>> Not stupid, but essentially the same as modelling alt confs, though it would probably give more overfitting. Alt confs can easily be converted to an ensemble (if done properly…).
>>>
>>> Thanks,
>>> Nick
>>>
>>> ———
>>>
>>> Nicholas Pearce
>>> Assistant Professor in Bioinformatics & DDLS Fellow
>>> Linköping University
>>> Sweden
>>>
>>> From: CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> on behalf of benjamin bax <ben.d.v....@gmail.com>
>>> Sent: Saturday, March 18, 2023 10:07:26 PM
>>> To: CCP4BB@JISCMAIL.AC.UK <CCP4BB@JISCMAIL.AC.UK>
>>> Subject: Re: [ccp4bb] To Trim or Not to To Trim
>>>
>>> Hi,
>>> Probably a stupid question.
>>> Could you multiply the a, b and c cell dimensions by 2 or 3 (to give 8 or 27 structures) and restrain well-defined parts of the structure to be ‘identical’, to give you a more NMR-like, chemically sensible ensemble of structures?
>>> Ben
>>>
>>>
>>>> On 18 Mar 2023, at 12:04, Helen Ginn <ccp...@hginn.co.uk> wrote:
>>>>
>>>> Models for crystallography have two purposes: refinement and interpretation. Here these two purposes are in conflict. Neither purpose is handled well by either the trim or the not-trim scenario, but trimming results in a deficit for refinement and not trimming results in a deficit for interpretation.
>>>>
>>>> Our computational tools are not “fixed” in the same way that the standard amino acids are “fixed” or your government’s bureaucracy pathways are “fixed”. They are open for debate and for adjustments. This is a fine example where it may be more productive to discuss the options for making changes to the model itself or its representation, to better account for awkward situations such as these.
>>>> Otherwise we are left figuring out the best imperfect way to use an imperfect tool (as all tools are, to varying degrees!), which isn’t satisfying for enough people, enough of the time.
>>>>
>>>> I now appreciate the hypocrisy in the argument “do not trim, but also don’t model disordered regions”, even though I’d be keen to avoid trimming. This discussion has therefore softened my own viewpoint.
>>>>
>>>> My refinement models (as implemented in Vagabond) do away with the concept of B factors precisely for the anguish it causes here, and refine a distribution of protein conformations which is sampled to generate an ensemble. By describing the conformations through the torsion angles that comprise the protein, modelling flexibility of a disordered lysine is comparatively trivial, and indeed modelling all possible conformations of a disordered loop becomes feasible. Lysines end up looking like a frayed end of a rope. Each conformation can produce its own solvent mask, and these masks can be summed together to produce a blurring of density that matches what you would expect to see in the crystal.
>>>>
>>>> In my experience this doesn’t drop the R factors as much as you’d assume, because blurred-out protein density does look very much like solvent, but it vastly improves the interpretability of the model. This also better models the boundary between the atoms you would trim and those you’d leave untrimmed, by avoiding such a binary distinction. No fear of trimming and pushing those errors unseen into the rest of the structure. No fear of leaving atoms in with an inadequate B factor model that cannot capture the nature of the disorder.
>>>>
>>>> Vagabond is undergoing a heavy rewrite though, and is not yet ready for human consumption.
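Helen's description of per-conformation solvent masks summed into a blurred boundary can be sketched with a simple flat bulk-solvent mask: grid points farther than r_vdw + r_probe from every atom are solvent, and the solvent region is then grown by r_shrink. The probe and shrink radii default to the 1.1 Å and 0.9 Å quoted earlier in the thread for phenix; the single 1.7 Å van der Waals radius, the cubic periodic cell and all coordinates are hypothetical. This is not Vagabond's or cctbx's actual code, only an illustration of how averaging binary masks over an ensemble yields fractional solvent where the conformers disagree.

```python
import itertools
import numpy as np


def flat_solvent_mask(atoms, cell_edge, n_grid, r_vdw=1.7, r_probe=1.1, r_shrink=0.9):
    """Binary mask (1 = bulk solvent) on an n_grid^3 grid in a cubic periodic cell."""
    spacing = cell_edge / n_grid
    axis = np.arange(n_grid) * spacing
    points = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    # Minimum-image distance from every grid point to its nearest atom.
    d_min = np.full((n_grid,) * 3, np.inf)
    for atom in atoms:
        delta = points - np.asarray(atom, dtype=float)
        delta -= cell_edge * np.round(delta / cell_edge)
        d_min = np.minimum(d_min, np.linalg.norm(delta, axis=-1))
    solvent = d_min > r_vdw + r_probe
    # Shrink step: any point within r_shrink of a solvent point also becomes solvent.
    reach = int(np.ceil(r_shrink / spacing))
    grown = solvent.copy()
    for shift in itertools.product(range(-reach, reach + 1), repeat=3):
        if np.linalg.norm(shift) * spacing <= r_shrink:
            grown |= np.roll(solvent, shift, axis=(0, 1, 2))
    return grown.astype(float)


def ensemble_solvent_mask(conformer_models, **mask_args):
    """Equal-weight average of per-conformer masks (fractional where they disagree)."""
    return np.mean([flat_solvent_mask(m, **mask_args) for m in conformer_models], axis=0)


# Hypothetical model: a fixed three-atom core plus one terminal atom in ten positions.
core = [(10.0, 10.0, 10.0), (11.5, 10.0, 10.0), (10.0, 11.5, 10.0)]
angles = np.linspace(0.0, 2.0 * np.pi, 10, endpoint=False)
models = [core + [(10.0 + 4.0 * np.cos(a), 10.0 + 4.0 * np.sin(a), 10.0)] for a in angles]

mask = ensemble_solvent_mask(models, cell_edge=20.0, n_grid=40)
fractional = np.count_nonzero((mask > 0.0) & (mask < 1.0))
print(f"grid points with fractional solvent: {fractional}")
```

Equal weights here correspond to ten conformers at 0.10 occupancy each; unequal occupancies would call for a weighted average instead.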
>>>> Vagabond's first iteration worked on single-dataset-single-model refinement, which handled disordered side chains well enough, with no need to decide to exclude atoms. The heart of the issue lies in main chain flexibility, and this must be handled correctly, for reasons of interpretability and elucidating the biological impact. This model isn’t perfect either, and necessitates its own compromises, but it will provide another tool in the structural biology arsenal.
>>>>
>>>> —-
>>>>
>>>> Dr Helen Ginn
>>>> Group leader, DESY
>>>> Hamburg Advanced Research Centre for Bioorganic Chemistry (HARBOR)
>>>> Luruper Chaussee 149
>>>> 22607 Hamburg
>>>>
>>>> ########################################################################
>>>>
>>>> To unsubscribe from the CCP4BB list, click the following link:
>>>> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.jiscmail.ac.uk%2Fcgi-bin%2FWA-JISC.exe%3FSUBED1%3DCCP4BB%26A%3D1&data=05%7C01%7Cnicholas.pearce%40LIU.SE%7Cb01b3fe2d210435bd43108db27f4d9fc%7C913f18ec7f264c5fa816784fe9a58edd%7C0%7C0%7C638147704962813881%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=qb0fv349eLSwriyNUYdYjYw7FvjshVdZcJ%2FfUO0L2UI%3D&reserved=0
>>>>
>>>> This message was issued to members of
>>>> https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.jiscmail.ac.uk%2FCCP4BB&data=05%7C01%7Cnicholas.pearce%40LIU.SE%7Cb01b3fe2d210435bd43108db27f4d9fc%7C913f18ec7f264c5fa816784fe9a58edd%7C0%7C0%7C638147704962813881%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=HClu5jptgnNShWKqRbtahao9debmn7YF2LDjS%2F53Ook%3D&reserved=0,
>>>> a mailing list hosted by
>>>> https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.jiscmail.ac.uk%2F&data=05%7C01%7Cnicholas.pearce%40LIU.SE%7Cb01b3fe2d210435bd43108db27f4d9fc%7C913f18ec7f264c5fa816784fe9a58edd%7C0%7C0%7C638147704962813881%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=aGjYsM25olFLtXkOd9XLOPMaLiafkInYWQgk%2BoT80YE%3D&reserved=0,
>>>> terms & conditions are available at
>>>> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.jiscmail.ac.uk%2Fpolicyandsecurity%2F&data=05%7C01%7Cnicholas.pearce%40LIU.SE%7Cb01b3fe2d210435bd43108db27f4d9fc%7C913f18ec7f264c5fa816784fe9a58edd%7C0%7C0%7C638147704962813881%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=KftyRhu9E%2F5FSnP%2B1dly6tUZc%2Bmg5x%2FWzoubM0RAZUI%3D&reserved=0