Re: [ccp4bb] [EXTERNAL] Re: [ccp4bb] How high a B factor is too high to assume a loop is in place, in the AlphaFold era?

Eleanor Dodson Fri, 02 Aug 2024 10:22:14 -0700

Cutting data back is a BAD THING. If the information is not provided, no
refinement program can use it.
Especially for B factor estimation, it is the high-resolution data that
indicate that AlphaHelix1 is better positioned than SurfaceLoop3.
E



On Fri, 2 Aug 2024 at 17:19, Reza Khayat <rkha...@ccny.cuny.edu> wrote:

> With regard to the image that Bohdan sent and Eleanor's statements, I'm
> curious whether the splitting of B-factors in Bohdan's image is due to the
> increased amount of data (which may reduce the uncertainty), to the
> narrower ensemble of structures within the crystal, or to both.
> What happens to the B-factors of a structure derived from a 1 Å data set
> if you truncate the data to 3 Å? In other words, you are reducing the
> amount of data but not affecting the ensemble of structures that defines
> the crystal. Perhaps I'm way off on this one....
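
A rough way to see how much such a cut throws away: the number of reflections inside a resolution sphere grows roughly as 1/d^3, so going from 1 Å to 3 Å keeps only about 1/27 of them. The following is a minimal sketch, assuming a hypothetical orthorhombic cell (the cell edges are invented for illustration) and counting every hkl without symmetry merging or systematic absences:

```python
import math

# Hypothetical orthorhombic cell edges in Å (illustration only).
CELL = (20.0, 25.0, 30.0)

def d_spacing(h, k, l, cell=CELL):
    """Interplanar spacing d (Å) of reflection hkl in an orthorhombic cell."""
    a, b, c = cell
    return 1.0 / math.sqrt((h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2)

def count_reflections(d_min, cell=CELL):
    """Count all hkl (except 000) with d >= d_min; no symmetry merging."""
    a, b, c = cell
    h_max, k_max, l_max = int(a / d_min), int(b / d_min), int(c / d_min)
    count = 0
    for h in range(-h_max, h_max + 1):
        for k in range(-k_max, k_max + 1):
            for l in range(-l_max, l_max + 1):
                if (h, k, l) == (0, 0, 0):
                    continue
                if d_spacing(h, k, l, cell) >= d_min:
                    count += 1
    return count

n_1A = count_reflections(1.0)
n_3A = count_reflections(3.0)
print(f"reflections to 1 A: {n_1A}")
print(f"reflections to 3 A: {n_3A}")
print(f"fraction kept after the cut: {n_3A / n_1A:.3f}")
```

The count says nothing about how the refined B factors respond, only that the cut leaves a few percent of the observations to determine the same number of parameters.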
>
> Best wishes,
> Reza
> ------------------------------
> *From:* CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> on behalf of John R
> Helliwell <jrhelliw...@gmail.com>
> *Sent:* 02 August 2024 11:47 AM
> *To:* CCP4BB@JISCMAIL.AC.UK <CCP4BB@JISCMAIL.AC.UK>
> *Subject:* [EXTERNAL] Re: [ccp4bb] How high a B factor is too high to
> assume a loop is in place, in the AlphaFold era?
>
> Dear Colleagues,
> I think this paper from 1979 is still very interesting:-
> Crystallographic studies of the dynamic properties of lysozyme
> <https://www.nature.com/articles/280563a0>
> Have a great weekend,
> John
>
> Emeritus Professor John R Helliwell DSc
>
>
>
>
> On 2 Aug 2024, at 16:29, Bohdan Schneider <bohdan.schnei...@gmail.com>
> wrote:
>
> Hello:
>
> yes, a great discussion! I second Eleanor's statement that B-factors of
> high-resolution structures do carry a message about atom flexibility. I
> attach a screenshot of a figure from our paper (Schneider et al.: Local
> dynamics of proteins and DNA evaluated from crystallographic B factors,
> Acta Cryst. (2014). D70, 2413–2419) that shows a clear resolution
> dependence of B factors at protein/protein interfaces for amino acids and
> waters. Our high-resolution group could not be restricted to structures
> below 1 Å, as Eleanor suggests, but even with a modest cutoff of 1.9 Å,
> comparison with the structures at 1.9-2.5 Å and 2.5-3.0 Å shows the effect
> clearly. We looked at several other groups of atoms (backbone/side chains
> in the protein core and at the protein surface, DNA phosphates/bases,
> waters at the interfaces or bound on the protein surface) and saw the same
> dependence.
>
> Best,
>
> Bohdan, bs.structbio.org
>
> On 2024-08-02 13:26, Eleanor Dodson wrote:
>
> All interesting points. (And good to see a reference to P.A. Machin,
> J.W. Campbell, M. Elder (Eds), "Refinement of Protein Structures", SERC
> Daresbury Laboratory, Warrington, UK (1980);
> for those who remember, a super exciting discussion over what was
> feasible for refinement, and how to do it!)
>
> My take: if a crystal diffracts to 1 Å we can be fairly sure of the
> accurate position of most of the atoms, see other conformations for
> some regions, and give realistic B values to most atoms.
>
> If the crystal only diffracts to 3 Å then the lattice is not perfect, and
> there must be multiple conformations for much of the molecule.
>
> There is not going to be sufficient experimental data to model this
> properly, so every parameter that assumes a single conformer (coordinate, B
> value, occupancy) is an approximation. Restraints help to some extent, but
> they impose prior knowledge and do not glean information from the
> experimental data.
>
> The "trash can" should indicate the degree of uncertainty, and interpreting
> that is a bit problematic. B values twice the overall B? Hmm, do NOT
> place too much faith in that part of the model. As crystallographers I
> think maybe we need to flag this better for trusting users of the
> information. Omitting that region? I am not sure. How do others model
> those floppy lysines? I usually make a sort of informed guess, but indeed
> giving a single conformation is not the truth, the whole truth, and nothing
> but the truth.
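
The "twice the overall B" rule of thumb mentioned above can be turned into a quick flagging pass over a model. This is only a sketch: the factor of 2 is the informal heuristic from this message, not a validated cutoff, and the `atom_line` helper and toy residues are invented so that the example is self-contained. The column positions follow the fixed-column PDB ATOM record (chain in column 22, residue number in 23-26, B factor in 61-66).

```python
from collections import defaultdict

def atom_line(serial, name, resname, chain, resseq, b_factor, occupancy=1.0):
    """Build a fixed-column PDB ATOM record (toy coordinates at the origin)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{occupancy:6.2f}{b_factor:6.2f}")

def flag_floppy_residues(pdb_lines, factor=2.0):
    """Return (overall mean B, residues whose mean B exceeds factor x overall)."""
    b_by_residue = defaultdict(list)
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            key = (line[21], int(line[22:26]), line[17:20].strip())
            b_by_residue[key].append(float(line[60:66]))  # columns 61-66
    all_b = [b for values in b_by_residue.values() for b in values]
    overall = sum(all_b) / len(all_b)
    flagged = sorted(key for key, values in b_by_residue.items()
                     if sum(values) / len(values) > factor * overall)
    return overall, flagged

# Toy model: four well-ordered residues and one floppy lysine.
toy_model = [
    atom_line(1, " CA", "ALA", "A", 10, 25.0),
    atom_line(2, " CA", "LEU", "A", 11, 25.0),
    atom_line(3, " CA", "GLU", "A", 12, 25.0),
    atom_line(4, " CA", "ALA", "A", 13, 30.0),
    atom_line(5, " CA", "LYS", "A", 40, 120.0),
]
overall, flagged = flag_floppy_residues(toy_model)
print(f"overall mean B = {overall:.1f}")
print("residues above 2x overall B:", flagged)
```

A flag like this only marks where the single-conformer model deserves less trust; it does not decide whether the region should be omitted.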
>
> On Fri, 2 Aug 2024 at 01:14, James Holton <jmhol...@lbl.gov> wrote:
>
>    I submit that modern B factor restraints make them much less trashy
>    than they were in the early days.  As Pavel points out the exact
>    strategies differ from program to program, but I don't think anybody
>    does unrestrained B factor refinement. Not by default.
>
>    Besides, all we are really doing is fitting Gaussian-shaped peaks to
>    the "curve" of the data.  These peaks have a width and a height. For
>    example, a carbon atom with B=20 has a peak density of 1.6 e-/A^3
>    and a full-width-at-half-max (FWHM) of 1.4 A.  That is it! That is
>    the model density being fit. If you increase to B=80 the peak drops
>    to 0.3 e-/A^3 and the FWHM increases to 2.6 A.  At the largest B you
>    can stuff into a PDB file (999.99), the peak height is 0.008 e-/A^3
>    and the "peak" is 8.45 A wide. Your disordered loop, however, is
>    probably not sampling from a symmetric Gaussian distribution like
>    that. This is the real problem with large B factors. They can fit
>    better than sharper B atoms, but that doesn't mean they fit well.
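
The trend in these numbers can be reproduced with a simple point-atom approximation. This sketch assumes all six carbon electrons sit at a point smeared only by the B factor, using B = 8*pi^2*<u^2>, so that the peak is Z*(4*pi/B)^(3/2) and sigma^2 = B/(8*pi^2). It ignores the intrinsic width of the carbon scattering factor, so at low B it gives a taller, narrower peak than the figures quoted above; the two only converge when B is large. The function names are made up for illustration.

```python
import math

def point_atom_peak(n_electrons, b_factor, occupancy=1.0):
    """Peak density (e/Å^3) of an isotropic Gaussian point atom."""
    return occupancy * n_electrons * (4.0 * math.pi / b_factor) ** 1.5

def point_atom_fwhm(b_factor):
    """Full width at half maximum (Å) of the same Gaussian."""
    sigma = math.sqrt(b_factor / (8.0 * math.pi ** 2))
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma

for b in (20.0, 80.0, 999.99):
    peak = point_atom_peak(6, b)   # carbon, Z = 6
    fwhm = point_atom_fwhm(b)
    print(f"B = {b:7.2f}  peak = {peak:.4f} e/A^3  FWHM = {fwhm:.2f} A")
```

At B = 999.99 this approximation gives roughly 0.0085 e/A^3 and 8.4 A, close to the quoted values, which is the regime where the atom's own shape no longer matters.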
>
>    Occupancy is easy because all it does is scale the height without
>    affecting the width.  So, an 0.5 occupancy atom model is half the
>    height of a full-occupancy one.  The width is unchanged.  B factors
>    impact both width and height because they must preserve the number
>    of electrons in the peak.  This is perhaps why they are often
>    confusing and mysterious.  We should also never forget that bulk
>    solvent gets excluded with exactly the same radii rules from every
>    modeled atom, regardless of B factor and occupancy.  So, the "change
>    in density" from adding or deleting an atom is a little more
>    complicated than adding or subtracting a Gaussian peak.
>
>    Nevertheless, if you want to fit peak height and width independently
>    (like we do in pretty much every other kind of curve fitting), then
>    you should refine occupancy and B factors at the same time.
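
The different roles of the two parameters can be checked numerically. This sketch assumes an isotropic Gaussian point atom (no intrinsic scattering-factor width, no bulk-solvent exclusion) and integrates its density over all space: changing B leaves the electron count alone, while occupancy scales it. The function names are hypothetical.

```python
import math

def density(r, n_electrons, b_factor, occupancy=1.0):
    """Density (e/Å^3) at distance r (Å) from an isotropic Gaussian point atom."""
    peak = occupancy * n_electrons * (4.0 * math.pi / b_factor) ** 1.5
    return peak * math.exp(-4.0 * math.pi ** 2 * r ** 2 / b_factor)

def integrated_electrons(n_electrons, b_factor, occupancy=1.0, steps=20000):
    """Integrate 4*pi*r^2*rho(r) with the midpoint rule out to 12 sigma."""
    sigma = math.sqrt(b_factor / (8.0 * math.pi ** 2))
    dr = 12.0 * sigma / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        rho = density(r, n_electrons, b_factor, occupancy)
        total += 4.0 * math.pi * r ** 2 * rho * dr
    return total

for b, occ in ((20.0, 1.0), (80.0, 1.0), (20.0, 0.5)):
    print(f"B = {b:5.1f}  occ = {occ:.1f}  "
          f"peak = {density(0.0, 6, b, occ):.3f} e/A^3  "
          f"electrons = {integrated_electrons(6, b, occ):.3f}")
```

Because height and width respond differently to the two parameters, a peak that is both low and narrow cannot be matched by B alone, which is the argument for refining both together.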
>
>    Over-fitting you say?  Hardly. Polynomials are easy to over-fit, but
>    not Gaussians. Observations/parameters is a useful guide for
>    polynomial fits, but in general the hallmark of over-fitting is that
>    the prediction passes exactly through all the observed points (and
>    not the cross-validation or "Rfree" points). I have never seen a
>    macromolecular refinement end up with Rwork = 0.  Have you?
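
For reference, Rwork and Rfree are the same residual, sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|), evaluated over the working and the cross-validation reflections respectively. The sketch below assumes Fcalc is already on the scale of Fobs and uses invented toy amplitudes and an invented free-set flag list:

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R: sum(| |Fo| - |Fc| |) / sum(|Fo|), Fc already scaled."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

def r_work_free(f_obs, f_calc, is_free):
    """Split reflections by the free flag and return (Rwork, Rfree)."""
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, is_free) if not free]
    free = [(o, c) for o, c, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_factor(*zip(*work))
    r_free = r_factor(*zip(*free))
    return r_work, r_free

# Toy amplitudes (illustration only); the last two reflections are "free".
f_obs = [10.0, 20.0, 30.0, 40.0, 15.0, 25.0]
f_calc = [11.0, 18.0, 33.0, 38.0, 19.0, 20.0]
is_free = [False, False, False, False, True, True]

r_work, r_free = r_work_free(f_obs, f_calc, is_free)
print(f"Rwork = {r_work:.3f}  Rfree = {r_free:.3f}")
```

In these terms the over-fitting signature described above would be Rwork heading to zero while Rfree stays put.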
>
>    At the end of the day, what we do with our models is look at their
>    parameters and try to extract the physically meaningful reality they
>    are trying to capture. Restraints are very helpful in preventing
>    many types of unrealistic situations, but ultimately it is up to you
>    to decide if the fitted model makes sense.
>
>    -James Holton
>    MAD Scientist
>
>    On 7/30/2024 11:30 AM, Ian Tickle wrote:
>
>    Obviously no refined parameters can ever be completely error-free,
>    it's just that for the co-ordinates we have very accurate
>    geometric restraints so that the relative uncertainty in the
>    refined co-ordinates is small (but try refining co-ordinates
>    without restraints!).  For the B factors we don't have accurate
>    estimates (if any) for their restraints so their relative
>    uncertainty after refinement is much greater.
>
>    -- Ian
>
>    On Tue, Jul 30, 2024 at 6:57 PM Oganesyan, Vaheh
>    <vaheh.oganes...@astrazeneca.com> wrote:
>
>        Yes, it is and I like the definition of shared “trash bin”. It
>        will have more physical meaning if we can separate those
>        contributions into separate bins.
>
>        Vaheh
>
>
>        *From:* Pavel Afonine <pafon...@gmail.com>
>        *Sent:* Tuesday, July 30, 2024 1:51 PM
>        *To:* Oganesyan, Vaheh <vaheh.oganes...@astrazeneca.com>
>        *Cc:* CCP4BB@jiscmail.ac.uk
>        *Subject:* Re: [ccp4bb] How high a B factor is too high to
>        assume a loop is in place, in the AlphaFold era?
>
>        Vaheh,
>
>        I think coordinates are no different from B factors,
>        occupancies, f', or f'' in this respect. Coordinates can play
>        their "trash bin" role by adjusting to the noise at the
>        expense of violated geometry (bonds, angles, planes, torsions,
>        etc.). As I mentioned in my previous email, their trash bin
>        capacity is much smaller (but definitely not zero!) because
>        the number and strength (confidence) of geometry restraints
>        are much greater than those of ADP restraints.
>
>        I agree that all refined parameters share this trash bin
>        capacity, but to varying degrees. Isn't this essentially what
>        we call the error on the refined parameter? All refined
>        parameters have their error bars, which we have referred to as
>        the "trash bin" in this thread.
>
>        Pavel
>
>
>        On Tue, Jul 30, 2024 at 10:09 AM Oganesyan, Vaheh
>        <vaheh.oganes...@astrazeneca.com> wrote:
>
>            Your point is taken, Pavel. However, regardless of resolution,
>            you define the coordinate of an atom as a geometric point
>            with no width. Although coordinates are “refineable”, they
>            have no capacity for “trash”. Their “trash” still goes
>            into the B-factor “trash bin”. At least this is how I see it.
>
>            Thank you.
>
>            *Vaheh Oganesyan, Ph.D.*
>            *R&D | Biologics Engineering*
>            One Medimmune Way, Gaithersburg, MD 20878
>            T:  301-398-5851
>            vaheh.oganes...@astrazeneca.com
>
>
>            *From:* Pavel Afonine <pafon...@gmail.com>
>            *Sent:* Tuesday, July 30, 2024 11:45 AM
>            *To:* Oganesyan, Vaheh <vaheh.oganes...@astrazeneca.com>
>            *Cc:* CCP4BB@jiscmail.ac.uk
>            *Subject:* Re: [ccp4bb] How high a B factor is too high to
>            assume a loop is in place, in the AlphaFold era?
>
>            From this perspective, all refinable atomic model
>            parameters can be viewed as trash bins, with the size of
>            these bins shrinking as the amount of prior information
>            (restraints) imposed on these parameters grows. For
>            example, coordinates have the most restraints and thus are
>            the smallest trash bins, while B factors have the fewest
>            restraints and thus are among the largest bins.
>
>            Pavel
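
The link between restraint strength and "bin size" can be illustrated with a one-parameter toy model. This is a sketch under strong assumptions (Gaussian errors, a single parameter, a single restraint target) and is not how any refinement program actually weights its restraints: the estimate is a precision-weighted mean of the observations and the restraint target, and the remaining uncertainty plays the role of the trash bin.

```python
import math

def restrained_estimate(observations, sigma_obs, target, sigma_restraint):
    """Weighted least-squares estimate of one parameter with one restraint.

    Returns (estimate, standard uncertainty of the estimate).
    """
    w_obs = 1.0 / sigma_obs ** 2
    # An infinitely loose restraint carries no weight at all.
    w_res = 0.0 if math.isinf(sigma_restraint) else 1.0 / sigma_restraint ** 2
    precision = len(observations) * w_obs + w_res
    estimate = (w_obs * sum(observations) + w_res * target) / precision
    return estimate, math.sqrt(1.0 / precision)

# Toy numbers (illustration only): two noisy observations of one parameter.
observations, sigma_obs, target = [1.0, 3.0], 2.0, 0.0

est_tight, sd_tight = restrained_estimate(observations, sigma_obs, target, 0.1)
est_loose, sd_loose = restrained_estimate(observations, sigma_obs, target, 5.0)
est_none, sd_none = restrained_estimate(observations, sigma_obs, target, math.inf)

print(f"tight restraint: {est_tight:.3f} +/- {sd_tight:.3f}")
print(f"loose restraint: {est_loose:.3f} +/- {sd_loose:.3f}")
print(f"no restraint:    {est_none:.3f} +/- {sd_none:.3f}")
```

The tight restraint pins the parameter near its target with a small error bar, as with geometry; the loose one leaves most of the noise in the parameter, as argued above for B factors.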
>
>
>            On Tue, Jul 30, 2024 at 8:25 AM Oganesyan, Vaheh
>            <vaheh.oganes...@astrazeneca.com> wrote:
>
>                Early in my crystallography life I was a postdoc with
>                Robert Huber in Munich. We had gatherings once a
>                week where, in a very informal way, we could ask and
>                answer questions. I remember my question about B
>                factors: how is it possible to have a high-resolution
>                structure and an average B-factor of 100 A^2? I think
>                it was Robert or Albrecht Messerschmidt who said that
>                the B-factor is a “trash can” that describes not only
>                loosely positioned atoms but also all the other
>                problems that you either created during processing or
>                harvesting, or that the crystal had from the beginning.
>
>                *Vaheh Oganesyan, Ph.D.*
>
>
>                *From:* CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK>
>                *On Behalf Of* James Holton
>                *Sent:* Tuesday, July 30, 2024 10:35 AM
>                *To:* CCP4BB@JISCMAIL.AC.UK
>                *Subject:* Re: [ccp4bb] How high a B factor is too
>                high to assume a loop is in place, in the AlphaFold era?
>
>                How high B factors can go depends on the refinement
>                program you are using.
>
>                In fact, my impression is that the division between
>                the "let the B factors blow up" and "delete the
>                unseen" camps is correlated to their preferred
>                refinement program. You see, phenix.refine is
>                relatively aggressive with B factor refinement, and
>                will allow "missing" atoms to attain very high B
>                factors. Refmac, on the other hand, has restraints
>                that try to make B factor distributions look like
>                those found in the PDB, and so tends to keep nearby B
>                factors similar. As a result, you may get "red
>                density" for disordered regions from refmac, inviting
>                you to delete the offending atoms, but not from
>                phenix, which will raise the B factor until the
>                density fits.
>
>                Then there are programs like VagaBond that don't
>                formally have B factors, but rather let an ensemble of
>                chains spread out in the loopy regions you are
>                concerned about.  This might be the way to go?
>
>                You can also do ensemble refinement in the latest
>                Amber. That is, you run an MD simulation of a unit
>                cell (or more) and gradually increase structure factor
>                restraints. This would probably result in the "fan" of
>                loops you have in mind?
>
>                -James Holton
>                MAD Scientist
>
>
>                On 7/28/2024 8:13 AM, Javier Gonzalez wrote:
>
>                    Dear CCP4bb,
>
>                    I'm refining the ~3 Å crystal structure of a big
>                    protein, largely composed of alpha helices
>                    connected by poorly-resolved loops.
>
>                    In the old pre-AlphaFold (AF) days I used to
>                    simply remove those loops/regions with too high B
>                    factors, because there was little to no density
>                    at 1 sigma in a 2Fo-Fc map.
>
>                    However, considering that the quality of a
>                    readily-computable AF model is comparable to a 3 Å
>                    experimental structure, and that the UniProt
>                    database is flooded with noodle-like AF models, I
>                    was considering depositing a combined model in the
>                    PDB.
>
>                    Once R/Rfree reach a minimum for the model
>                    truncated at the poorly resolved loops, I would
>                    build an augmented model with the missing regions
>                    taken from AF (provided they have an acceptable
>                    pLDDT value), assign them zero occupancy, and run
>                    only one cycle of refinement to calculate the
>                    formal refinement statistics.
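
The zero-occupancy step of this proposal amounts to rewriting one fixed-width field of the ATOM records (occupancy, columns 55-60) for the grafted residues. This is a minimal sketch only: the `atom_line` helper, the chain and the residue range are invented for illustration, and it leaves open whether such a model is acceptable for deposition, which is the question being asked.

```python
def atom_line(serial, name, resname, chain, resseq, b_factor, occupancy=1.0):
    """Build a fixed-column PDB ATOM record (toy coordinates at the origin)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{occupancy:6.2f}{b_factor:6.2f}")

def zero_occupancy(pdb_lines, chain, first_res, last_res):
    """Return a copy of pdb_lines with occupancy 0.00 for one residue range."""
    out = []
    for line in pdb_lines:
        if (line.startswith("ATOM") and line[21] == chain
                and first_res <= int(line[22:26]) <= last_res):
            # Columns 55-60 (0-based slice 54:60) hold the occupancy.
            line = line[:54] + f"{0.0:6.2f}" + line[60:]
        out.append(line)
    return out

# Toy model: residue 10 is experimentally built, 41-42 are a grafted loop.
model = [
    atom_line(1, " CA", "ALA", "A", 10, 30.0),
    atom_line(2, " CA", "GLY", "A", 41, 90.0),
    atom_line(3, " CA", "LYS", "A", 42, 95.0),
]
augmented = zero_occupancy(model, chain="A", first_res=41, last_res=42)
for line in augmented:
    print(line)
```

Only the occupancy field is touched, so the B factors and coordinates of the grafted residues are whatever the source model carried.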
>
>
>                    Would that be acceptable? Has anyone tried a
>                    similar approach?
>
>                    I'd rather do that instead of depositing a
>                    counterintuitive model with truncated regions that
>                    few people would find useful!!
>
>                    Thank you for your comments,
>
>                    Javier
>
>                    --
>                    Dr. Javier M. González
>                    Instituto de Bionanotecnología del NOA
>                    (INBIONATEC-CONICET)
>                    Universidad Nacional de Santiago del Estero (UNSE)
>                    RN9, Km 1125. Villa El Zanjón. (G4206XCP)
>                    Santiago del Estero. Argentina
>                    Tel: +54-(0385)-4238352
>
>
> ########################################################################
>
> To unsubscribe from the CCP4BB list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
>
> This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a
> mailing list hosted by www.jiscmail.ac.uk, terms & conditions are
> available at https://www.jiscmail.ac.uk/policyandsecurity/
> <Figure_2_Acta Cryst. (2014).D70,2413.png>
>
>
>

