Re: [ccp4bb] To Trim or Not to To Trim

Harry Powell Mon, 20 Mar 2023 03:36:32 -0700

Whoops!

A quick glance at the PDB entry indicates I must have been clairvoyant to have 
read it 20 years ago.


Harry

> On 20 Mar 2023, at 10:35, Harry Powell <hrp-ccp...@virginmedia.com> wrote:
> 
> And there was Crambin to 0.48Å (I’ll leave it to others to argue whether or 
> not crambin is a protein, since it has “only” 46 amino acids) where (from 
> memory, I haven’t read the paper for at least 20 years…) they modelled 
> multiple water networks.
> 
> 3NIR, for reference.
> 
> Harry
> 
> 
>> On 20 Mar 2023, at 10:20, Eleanor Dodson 
>> <0000176a9d5ebad7-dmarc-requ...@jiscmail.ac.uk> wrote:
>> 
>> Thank you for such a careful analysis of modelling a "true" structure. You 
>> should publish this James.
>> 
>> It shows amongst other things, how R factors depend on our modelling of 
>> solvent which is not represented as individual atoms (And also I think on 
>> how the scales are derived between observation and calculation.)
>> Years ago someone refined vitamin B12 against high resolution (0.6A?) data. 
>> There is about 20-25% solvent volume, I think. It was clear in the maps that 
>> there were partially occupied networks of water which extended throughout 
>> the lattice. This is probably true for proteins as well, and must affect the 
>> conformations of surface sidechains? 
>> Eleanor
>> The B12 reference ...
>> 
>> Biophys J. 1986 Nov; 50(5): 967–980.
>> doi: 10.1016/S0006-3495(86)83537-8
>> PMCID: PMC1329821
>> PMID: 3790697
>> Water structure in vitamin B12 coenzyme crystals. II. Structural 
>> characteristics of the solvent networks.
>> 
>> H Savage
>> 
>> On Sun, 19 Mar 2023 at 19:37, James Holton <jmhol...@lbl.gov> wrote:
>> They say one test is worth a thousand expert opinions, so I tried my hand at 
>> the former.
>> 
>> The question is: what is the right way to treat disordered side chains?:
>> a) omit atoms you cannot see
>> b) build them, and set occupancy to zero
>> c) build them, and "let the B factors take care of it"
>> d) none of the above
>> 
>> The answer, of course, is d).
>> 
>> Oh, c'mon.  Yes, I know one of a,b, or c is what you've been doing your 
>> whole life. I do it too.  But, let's face it: none of these solutions are 
>> perfect.  So, the real question is not which one is "right", but which is 
>> the least wrong?  
>> 
>> We all know what is really going on: the side chain is flapping around. No 
>> doubt it spends most of its time in energetically reasonable but 
>> nevertheless numerous conformations.  There are 41 "Favorable" rotamers for 
>> Lys alone, and it doesn't take that many to spread the density thin enough 
>> to fall below the classical 1-sigma contour level. The atoms are still 
>> there, they are still contributing to the data, and they haven't gone far. 
>> So why don't we "just" model that?  Already, I can hear the cries of 
>> "over-fitting!" and "observations/parameters!", "model bias!", and "think of 
>> the children!"  Believe it or not, none of these are the major issue here. 
>> Allow me to demonstrate:
>> 
>> Consider a simple case where we have a Lys side chain in ten conformers. I 
>> chose from popular rotamers, but evenly spread. That is, all 10 conformers 
>> have an occupancy of 0.10, and there is a 3-3-4 split of chi1 values between 
>> minus, plus and trans.  This will give the maximum contrast of density 
>> between CB and CG.  Let us further require that there is no strain in this 
>> ground-truth. No stretched bonds, no tortured angles, no clashes, etc.  Real 
>> molecules don't occupy such high-energy states unless they absolutely have 
>> to.  Let us further assume that the bulk solvent works the way phenix models 
>> it, which is a probe radius of 1.1 A for both ions and aliphatics and a 
>> shrink radius of 0.9.  But, instead of running one phenix.fmodel job, I ran 
>> ten: one for each conformer (A thru J).  To add some excitement, I moved the 
>> main chain ~0.2 A in a random direction for each conformer. I then took 
>> these ten calculated electron density maps (bulk solvent and all) and added 
>> them together to form the ground truth for the following trials. Before 
>> refinement, I added noise consistent with an I/sigma of 50 and cut the 
>> resolution at 2.0 A. Wilson B is 50:
>> 
>> CCtrue   Rwork%  Rfree%   fo-fc(sigma)   description
>> 0.8943     9.05   10.60      5.9         stump at CB
>> 0.9540     9.29   11.73      6.0         single conformer, zero occupancy
>> 0.9471    10.35   15.04      5.1         single conformer, full occupancy, refmac5
>> 0.9523     9.78   15.61      4.9         single conformer, full occupancy, phenix.refine
>> 
>> So, it would appear that the zero-occupancy choice "wins", but by the 
>> narrowest of margins.  Here CCtrue is the Pearson correlation coefficient 
>> between the ground-truth right-answer electron density and the 2fofc map 
>> resulting from the refinement.  Rwork and Rfree are the usual suspects, and 
>> fo-fc indicates the tallest peak in the difference map. Refinement was with 
>> refmac unless otherwise indicated. I think we often forget that both phenix 
>> and refmac restrain B factor values, not just through bonds but through 
>> space, and they use rather different algorithms. Refmac tries to make the 
>> histogram of B factors "look right", whereas phenix allows steeper 
>> gradients. I also ran all 10 correct rotamers separately and picked the one 
>> with the best CCtrue to show above. If you instead sort on Rfree (which you 
>> really shouldn't do), you get different bests, but they are not much better 
>> (as low as 10.5%).  So, the winner here depends on how you score.  CCtrue is 
>> the best score, but also unfortunately unavailable for real data.
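
A minimal sketch of the CCtrue score as described above: the Pearson correlation coefficient between two maps sampled on the same grid. The function name map_cc and the random toy arrays are illustrative assumptions; real maps would be read from the refinement output rather than generated.

```python
import numpy as np

def map_cc(map_a, map_b):
    """Pearson correlation coefficient between two maps on the same grid."""
    a = np.asarray(map_a, dtype=float).ravel()
    b = np.asarray(map_b, dtype=float).ravel()
    a = a - a.mean()  # remove the mean so only the shape of the density counts
    b = b - b.mean()
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))

# Toy stand-ins (assumptions): a "ground truth" map and a noisier "refined" map
rng = np.random.default_rng(0)
truth = rng.normal(size=(8, 8, 8))
refined = truth + 0.3 * rng.normal(size=truth.shape)

print(f"CC = {map_cc(truth, refined):.4f}")
```

Because the mean is subtracted and the result is normalised, the score is insensitive to the overall scale and offset of either map, so it compares density shape only.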
>> 
>>  It is perhaps interesting here that better CCtrue goes along with worse 
>> Rfree. This is not what I usually see in experiments like this. Rather, what 
>> I think is going on here is the system is frustrated. We are trying to fit 
>> various square pegs into a round hole, and none of them fit all that well. 
>> 
>> In all cases here the largest difference peak was indicating another place 
>> to put the Lys, so why not build into that screaming, 6-sigma difference 
>> peak?  Here is what happens when you do that:
>> 
>> CCtrue   Rwork%  Rfree%   fo-fc(sigma)   description
>> 0.8943     9.05   10.60      5.9         stump at CB
>> 0.9580     9.95   11.60      6.4         stump at CG
>> 0.9585    10.20   12.29      6.2         stump at CG, all 10 confs
>> 0.9543    10.61   12.24      5.3         stump at CD, all 10 confs
>> 0.9383    10.69   14.64      4.1         stump at CE, all 10 confs
>> 0.9476     9.66   13.48      4.6         all atoms, all 10 confs
>> 0.9214     7.09   11.8       5.6         three conformers (worst of 120 combos)
>> 0.9718     6.53    8.55      4.3         three conformers (best of 120 combos)
>> 0.9710     7.17    9.44      6.1         two conformers (best of 45 combos)
>> 0.9471    10.35   15.04      5.1         single conformer (best of 10 choices)
>> 
>> If I add one CG, the other two chi1 positions light up.  So, I tried 
>> building in all 10 true CG positions, and let the refinement decide what to 
>> do with them. The clear indication was that a CD should be added. After 
>> adding all the CDs, the difference peaks were weaker, but still indicating 
>> more atoms were needed.  Rwork and Rfree, however, tell the opposite story.  
>> They get worse the more atoms you add.  CCtrue, on the other hand, was best 
>> when cutting everything after CG.  Why is that?  Well, every time you add 
>> another atom you fill in the difference density, but then that atom pushes 
>> back the bulk solvent model that was filling in the density for the next 
>> atom.  The atom-to-solvent distance is roughly twice that of a covalent 
>> bond.  So again, square pegs and round holes.  
>> 
>> Three conformers coming out as the winner may be because it is a selective 
>> process with a noisy score. In the ground truth there are 10 conformers at 
>> equal occupancy, so no one triplet is really any better than any other. 
>> However, one has a density shape that fits better than other combos. My 
>> search over all possible quartets is still running.
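
As a quick check on the size of these searches, the sketch below counts the subsets of the ten conformers (A thru J) of each size. It only enumerates labels; it does not run any refinement, and the variable names are illustrative.

```python
from itertools import combinations

conformers = "ABCDEFGHIJ"  # the ten conformers, A thru J

# Number of distinct subsets of each size that a brute-force search must refine
counts = {k: sum(1 for _ in combinations(conformers, k)) for k in (1, 2, 3, 4)}
for k, n in counts.items():
    print(f"{k} conformer(s): {n} combinations")
```

This reproduces the 10, 45 and 120 quoted in the table, and shows that the quartet search covers 210 combinations.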
>> 
>> But what if we got the solvent "right"?  Well, here is what that looks like:
>> 
>> CCtrue   Rwork%  Rfree%   fo-fc(sigma)   description
>> 0.9476     9.66   13.48      4.6         all atoms, all confs, refmac defaults
>> 0.9696     6.15    8.88      3.7         all atoms, all confs, phenix.refine
>> 0.9825     0.80    0.89      3.9         all atoms, all confs, true solvent
>> 0.9824     0.92    1.26      7.3         true model, minus one H atom from ordered HIS side chain
>> 
>> You can see that the default solvent of phenix.refine fares better than 
>> refmac here, but since I generated the solvent with phenix refine it may 
>> have an unfair advantage. Nevertheless, providing the "true solvent" here is 
>> quite a striking drop in R factors.  This is not surprising since this was 
>> the last systematic error in this ground truth.  In all cases, I provided 
>> the true atomic positions at the start of refinement, so there was no 
>> confusion about strain-inducing local minima, such as which rotamer goes 
>> with which main chain shift.  And yes, you can provide arbitrary bulk 
>> solvent maps to refmac5 using the "Fpart" feature.  I've had good luck with 
>> real data using bulk density derived from MD simulations.
>> 
>> What is more, once the R factors are this low I can remove just one hydrogen 
>> atom and it comes back as a 7.3-sigma difference peak. This corresponds to 
>> the protonation state of that His.  This kind of sensitivity is really 
>> attractive if you are looking for low-lying features, such as 
>> partially-occupied ligands.  Some may pooh-pooh R factors as "cosmetic" 
>> features of structures, but they are, in fact, nothing more or less than the 
>> % error between your model and your data.  This % error translates directly 
>> into the noise level of your map.  At 20% error there is no hope whatsoever 
>> of seeing 1-electron changes. This is because hydrogen is only 17% of a 
>> carbon.  But at 3-5% error, which is a typical experimental error in 
>> crystallographic data, anything bigger than one electron is clear.
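
The "% error" reading of the R factor can be sketched as below, using the conventional definition R = sum| |Fobs| - |Fcalc| | / sum|Fobs|. The amplitudes are made-up numbers for illustration, and the sketch assumes Fcalc is already on the same scale as Fobs, which real refinement programs have to determine. The last line is the electron-count arithmetic behind "hydrogen is only 17% of a carbon" (1 electron versus 6).

```python
import numpy as np

def r_factor(f_obs, f_calc):
    """Conventional crystallographic R factor, returned as a fraction."""
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_calc = np.abs(np.asarray(f_calc, dtype=float))
    return float(np.sum(np.abs(f_obs - f_calc)) / np.sum(f_obs))

# Hypothetical structure-factor amplitudes, assumed to share one scale
f_obs = np.array([100.0, 50.0, 25.0, 10.0])
f_calc = np.array([95.0, 52.0, 24.0, 11.0])
print(f"R = {100 * r_factor(f_obs, f_calc):.2f}%")

# One electron (H) relative to six (C)
h_over_c = 1 / 6
print(f"H/C electron ratio = {h_over_c:.0%}")
```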
>> 
>> -James Holton
>> MAD Scientist
>> 
>> 
>> 
>> On 3/18/2023 2:10 PM, Nicholas Pearce wrote:
>>> Not stupid, but essentially the same as modelling alt confs, though would 
>>> probably give more overfitting. Alt confs can easily be converted to an 
>>> ensemble (if done properly…). 
>>> 
>>> Thanks,
>>> Nick
>>> 
>>> ———
>>> 
>>> Nicholas Pearce
>>> Assistant Professor in Bioinformatics & DDLS Fellow
>>> Linköping University
>>> Sweden
>>> 
>>> From: CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> on behalf of benjamin bax 
>>> <ben.d.v....@gmail.com>
>>> Sent: Saturday, March 18, 2023 10:07:26 PM
>>> To: CCP4BB@JISCMAIL.AC.UK <CCP4BB@JISCMAIL.AC.UK>
>>> Subject: Re: [ccp4bb] To Trim or Not to To Trim
>>> 
>>> Hi,
>>> Probably a stupid question. 
>>> Could you multiply a, b and c cell dimensions by 2 or 3 (to give 8 or 27 
>>> structures) and restrain well defined parts of structure to be ‘identical’ 
>>> ? To give you a more NMR like chemically sensible ensemble of structures?
>>> Ben
>>> 
>>> 
>>>> On 18 Mar 2023, at 12:04, Helen Ginn <ccp...@hginn.co.uk> wrote:
>>>> 
>>>> Models for crystallography have two purposes: refinement and 
>>>> interpretation. Here these two purposes are in conflict. Neither case is 
>>>> handled well by either the trim or the not-trim scenario, but trimming results in 
>>>> a deficit for refinement and not-trimming results in a deficit for 
>>>> interpretation.
>>>> 
>>>> Our computational tools are not “fixed” in the same way that the standard 
>>>> amino acids are “fixed” or your government’s bureaucracy pathways are 
>>>> “fixed”. They are open for debate and for adjustments. This is a fine 
>>>> example where it may be more productive to discuss the options for making 
>>>> changes to the model itself or its representation, to better account for 
>>>> awkward situations such as these. Otherwise we are left figuring out the 
>>>> best imperfect way to use an imperfect tool (as all tools are, to varying 
>>>> degrees!), which isn’t satisfying for enough people, enough of the time.
>>>> 
>>>> I now appreciate the hypocrisy in the argument “do not trim, but also 
>>>> don’t model disordered regions”, even though I’d be keen to avoid 
>>>> trimming. This discussion has therefore softened my own viewpoint.
>>>> 
>>>> My refinement models (as implemented in Vagabond) do away with the concept 
>>>> of B factors precisely for the anguish it causes here, and refine a 
>>>> distribution of protein conformations which is sampled to generate an 
>>>> ensemble. By describing the conformations through the torsion angles that 
>>>> comprise the protein, modelling flexibility of a disordered lysine is 
>>>> comparatively trivial, and indeed modelling all possible conformations of 
>>>> a disordered loop becomes feasible. Lysines end up looking like a frayed 
>>>> end of a rope. Each conformation can produce its own solvent mask, which 
>>>> can be summed together to produce a blurring of density that matches what 
>>>> you would expect to see in the crystal.
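
The idea of summing one solvent mask per conformation can be illustrated with a toy one-dimensional example. This is only a sketch under simplifying assumptions (a 1-D grid, one atom per conformer, equal weights, a hypothetical 1.5 A exclusion radius); it is not Vagabond's implementation.

```python
import numpy as np

def solvent_mask(grid, atom_positions, radius):
    """1.0 where a grid point is farther than `radius` from every atom, else 0.0."""
    dist = np.abs(grid[:, None] - np.asarray(atom_positions, dtype=float)[None, :])
    return (dist.min(axis=1) > radius).astype(float)

grid = np.linspace(0.0, 10.0, 101)         # 1-D stand-in for a map grid, in Angstrom
conformers = [[3.0], [4.0], [5.0], [6.0]]  # hypothetical atom position in each conformer

# One binary mask per conformer, then an equal-weight average over the ensemble
masks = [solvent_mask(grid, atoms, radius=1.5) for atoms in conformers]
blurred = np.mean(masks, axis=0)

print(blurred[0], blurred[20], blurred[50])  # solvent fraction at x = 0, 2 and 5 A
```

Each individual mask is binary, but the average takes intermediate values wherever only some conformers exclude solvent, which gives the gradual protein-to-solvent boundary described above.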
>>>> 
>>>> In my experience this doesn’t drop the R factors as much as you’d assume, 
>>>> because blurred out protein density does look very much like solvent, but 
>>>> it vastly improves the interpretability of the model. This also better 
>>>> models the boundary between the atoms you would trim and those you’d leave 
>>>> untrimmed, by avoiding such a binary distinction. No fear of trimming and 
>>>> pushing those errors unseen into the rest of the structure. No fear of 
>>>> leaving atoms in with an inadequate B factor model that cannot capture the 
>>>> nature of the disorder.
>>>> 
>>>> Vagabond is undergoing a heavy rewrite though, and is not yet ready for 
>>>> human consumption. Its first iteration worked on 
>>>> single-dataset-single-model refinement, which handled disordered side 
>>>> chains well enough, with no need to decide to exclude atoms. The heart of 
>>>> the issue lies in main chain flexibility, and this must be handled 
>>>> correctly, for reasons of interpretability and elucidating the biological 
>>>> impact. This model isn’t perfect either, and necessitates its own 
>>>> compromises - but will provide another tool in the structural biology 
>>>> arsenal.
>>>> 
>>>> —-
>>>> 
>>>> Dr Helen Ginn
>>>> Group leader, DESY
>>>> Hamburg Advanced Research Centre for Bioorganic Chemistry (HARBOR)
>>>> Luruper Chaussee 149
>>>> 22607 Hamburg
>>>> 
>>>> ########################################################################
>>>> 
>>>> To unsubscribe from the CCP4BB list, click the following link:
>>>> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.jiscmail.ac.uk%2Fcgi-bin%2FWA-JISC.exe%3FSUBED1%3DCCP4BB%26A%3D1&data=05%7C01%7Cnicholas.pearce%40LIU.SE%7Cb01b3fe2d210435bd43108db27f4d9fc%7C913f18ec7f264c5fa816784fe9a58edd%7C0%7C0%7C638147704962813881%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=qb0fv349eLSwriyNUYdYjYw7FvjshVdZcJ%2FfUO0L2UI%3D&reserved=0
>>>> 
>>>> This message was issued to members of 
>>>> https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.jiscmail.ac.uk%2FCCP4BB&data=05%7C01%7Cnicholas.pearce%40LIU.SE%7Cb01b3fe2d210435bd43108db27f4d9fc%7C913f18ec7f264c5fa816784fe9a58edd%7C0%7C0%7C638147704962813881%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=HClu5jptgnNShWKqRbtahao9debmn7YF2LDjS%2F53Ook%3D&reserved=0,
>>>>  a mailing list hosted by 
>>>> https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.jiscmail.ac.uk%2F&data=05%7C01%7Cnicholas.pearce%40LIU.SE%7Cb01b3fe2d210435bd43108db27f4d9fc%7C913f18ec7f264c5fa816784fe9a58edd%7C0%7C0%7C638147704962813881%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=aGjYsM25olFLtXkOd9XLOPMaLiafkInYWQgk%2BoT80YE%3D&reserved=0,
>>>>  terms & conditions are available at 
>>>> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.jiscmail.ac.uk%2Fpolicyandsecurity%2F&data=05%7C01%7Cnicholas.pearce%40LIU.SE%7Cb01b3fe2d210435bd43108db27f4d9fc%7C913f18ec7f264c5fa816784fe9a58edd%7C0%7C0%7C638147704962813881%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=KftyRhu9E%2F5FSnP%2B1dly6tUZc%2Bmg5x%2FWzoubM0RAZUI%3D&reserved=0
>>> 
>> 
>> 
> 

