In my experience the decision to model disordered atoms as missing or with high B factors has little to do with "correctness", but rather is most highly correlated with the user's favorite refinement program.  If you use phenix.refine it will, by default, tend to allow B factors to "blow up" and that tends to get rid of difference density. With refmac, however, the B factor restraints are implemented differently, using something called a K-L divergence, which I believe is based on PDB-wide statistics. This tends to restrain adjacent B factors in disordered side chains to be much more similar than you will get with phenix. You see a big red difference peak, and that inspires you to delete the offending atoms.

So, we get two classes of complaints:
1) This pdb has missing atoms, but when I add them back and refine with phenix I get better stats and there is no difference density. Silly depositors! They should have made their model more complete!
2) This pdb has atoms sticking out into nowhere, and when I refine in refmac I get big red blobs. When I delete them and re-refine I get better stats. Silly depositors! They should never have built those "non-existent" atoms!

This, in a nutshell, has been my experience.

At the risk of entering a semantics rant, I do feel I should point out that a B factor is nothing more than a width. It is not an "uncertainty" or an "error bar" any more than the ~25 nm width of the UV absorption peak of tryptophan at 280 nm is an "error bar" on how accurately you know to measure at 280 nm. The Trp absorption peak is broad, but that doesn't mean that sometimes the max is at 270 nm and sometimes it's at 290 nm. It is ALWAYS at 280, just weaker away from the middle of the peak. If you fit a curve to a Trp absorption spectrum you might use a Gaussian for this peak. The resulting best-fit curve will have a width, and a height, but will also reproducibly arrive at ~280 nm as the centroid. In the same way, when you fit a coordinate model to density data the "fit" (aka refinement) will arrive at a width, height and midpoint. These are analogous to B, occupancy/B^(3/2), and xyz coordinates. (Yes, the B factor reduces modeled peak height by a factor of B raised to the power of 1.5, because the B factor has dimensions of squared length and the number of electrons must be preserved in 3D space, but that's not important right now.) The "tails" of neighboring features (analogous to the peptide bond absorption at < 250 nm) are a source of error in determining the best parameters, but that is a systematic error, not a random one. But, I digress. Practically speaking, for highly disordered regions it is usually only the B factor restraints imposed by the refinement program that limit how high the B factor can go.
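
For anyone who wants to check the height-versus-width bookkeeping, here is a minimal Python sketch. It is not taken from any refinement program: it assumes a point atom (no intrinsic atomic width) smeared by an isotropic B factor into a 3D Gaussian, and the function names are made up for illustration. It shows that the peak height scales as occupancy/B^(3/2) while the integrated number of electrons stays at occupancy times the electron count.

```python
import math

def peak_height(b_factor, occupancy=1.0, electrons=1.0):
    """Peak density (electrons/A^3) of a point atom smeared by an isotropic B factor."""
    return occupancy * electrons * (4.0 * math.pi / b_factor) ** 1.5

def density(r, b_factor, occupancy=1.0, electrons=1.0):
    """Gaussian density at distance r (A) from the atom center."""
    height = peak_height(b_factor, occupancy, electrons)
    return height * math.exp(-4.0 * math.pi ** 2 * r ** 2 / b_factor)

def integrated_electrons(b_factor, occupancy=1.0, electrons=1.0, steps=20000):
    """Sum the density over spherical shells (midpoint rule) out to 10 sigma."""
    sigma = math.sqrt(b_factor / (8.0 * math.pi ** 2))  # B = 8*pi^2*sigma^2
    dr = 10.0 * sigma / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        total += 4.0 * math.pi * r ** 2 * density(r, b_factor, occupancy, electrons) * dr
    return total

# Hypothetical half-occupied nitrogen-like atom (7 electrons) at two B factors:
# the peak drops as B rises, the integrated electron count does not.
for b in (20.0, 80.0):
    print(b, peak_height(b, 0.5, 7.0), integrated_electrons(b, 0.5, 7.0))
```

Quadrupling B cuts the peak height by a factor of 4^1.5 = 8, which is why a disordered atom with a large B factor contributes so little density at its center.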

The largest representable B factor in the PDB format is 999.99, and that corresponds to a peak with full-width-half-max (FWHM) of ~8.4 Angstrom. That is to say: "this atom is wandering around here with average density of roughly Gaussian shape within a ~4 A radius, but centered >here<". For B=100, which many programs put as an upper limit, the density peak is ~2.7 A wide. The formula is: fwhm = sqrt(B*log(2))/pi, where log is the natural logarithm. That is if we disregard intrinsic atomic width, which is a good approximation for large B factors. If you want to include intrinsic atomic width, you can usually just add ~14 to the B factor (corresponding to fwhm=1 A). Or, of course, just use the tabulated atomic form factors (which is most accurate, but harder to wrap your head around).
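
The width formula is easy to tabulate. The Python sketch below uses only the formula quoted above and the rough "+14" allowance for intrinsic atomic width; the function and constant names are mine, chosen for illustration.

```python
import math

# Rough B-factor equivalent of the intrinsic atomic width (fwhm ~ 1 A), per the text.
INTRINSIC_B = 14.0

def fwhm_from_b(b_factor, include_intrinsic=False):
    """Full width at half maximum (A) of the Gaussian density implied by a B factor (A^2)."""
    if include_intrinsic:
        b_factor = b_factor + INTRINSIC_B
    return math.sqrt(b_factor * math.log(2.0)) / math.pi

# B values discussed in the text: intrinsic width, a common upper limit, the PDB format maximum.
for b in (14.0, 100.0, 999.99):
    print(f"B = {b:7.2f} A^2  ->  FWHM = {fwhm_from_b(b):.2f} A")
```

Because the width grows only as the square root of B, going from B=100 to B=999.99 roughly triples the width rather than multiplying it by ten.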

This "width" is all that is really being "said" by current B-factor based disorder models.  Are there better models? Yes there are. There is an "order parameter" gaining in popularity out there.  But, ultimately, a multi-conformer model is called for (IMHO). A floppy Lys or Arg side chain can easily traverse 7-8 A depending on the rotamer, and the resulting electron density of the "bouquet" of alternate locations is far from Gaussian. So, a simple B factor cannot really capture it. The bulk solvent also "fills in" when the side chain is elsewhere, making things flatter, but still more complicated. If you are worried about observations/parameters, don't be. Polynomials can give you an artificial Rwork=0 when observations=parameters, but Gaussian-based models don't. If they did, it would be easy to get Rwork=0 at 6 A resolution, where observations << parameters. To my knowledge, nobody has ever gotten Rwork=0 for a crystal structure. Not with reasonable geometry.

Hope this helps! The models we use are actually rather simple. And maybe that is the problem?

-James Holton
MAD Scientist


On 3/8/2025 2:59 AM, Italo Carugo Oliviero wrote:

If I may, I would attempt a very brief summary of this brilliant discussion.

It seems there is a general agreement on indicating, with a single parameter similar to AlphaFold's pLDDT, the reliability of the position of each atom in protein crystal structures. This would enhance the use of the PDB by non-crystallographers.

This parameter could be the positional standard error, a must in small-molecule crystallography, as described by John Helliwell for protein crystal structures. This parameter depends not only on individual atomic B-factors but also on the average B-factor.

However, it is not clear how to implement such a strategy for cryoEM structures.

P.S. Let's celebrate the *International Women's Day.*


On Thu 6 Mar 2025 at 13:54, John R Helliwell <jrhelliw...@gmail.com> wrote:

    Dear Matthew,
    I suggest that, if you are able to make the placements you mention,
    you do so, then calculate coordinate errors for the non-covalent
    pairs of atoms that might be interacting (H bonds or van der Waals),
    and proceed as cautiously as those uncertainty estimates on
    distances suggest your narrative should be. The Online DPI
    webserver (https://journals.iucr.org/paper?vg5015) offers one such
    way forward. Alternatively, multiple workflows refining models
    against the same diffraction dataset give you coordinate (and B
    factor) variances too.
    Also make your raw diffraction data available.
    Obviously I don’t know all the details, but as a referee who
    scrutinises articles, together with the to-be-released PDB files
    and the raw data if need be, it sounds as though I would be
    satisfied with yours. A further check: if you have replicates, you
    could document those as well for the reader, attaching those data
    to the article as supplementary evidence.
    Best wishes,
    John

    Emeritus Professor John R Helliwell DSc




    On 6 Mar 2025, at 11:20, Matthew Snee
    <matthew.sne...@manchester.ac.uk> wrote:

    

    I'm sorry to everyone for still banging on about this. I'm sure
    everyone is well and truly bored with it, but I consider my
    structure to be useful/interesting (at least to me) despite its
    strangeness and limitations.
    And I really don't want to end up in someone's "what not to do"
    slideshow at some conference or other!

    I'm trying to digest the literature about high B factors, but it
    mostly pertains to examples where researchers are building
    unobserved features into otherwise strong structures, the
    assertion being that they have no experimental evidence (imprecise
    or otherwise) to inform the position of these atoms.

    I believe mine is an edge case where the Scaling B factor and the
    refinement B factors are (inappropriately but possibly
    unavoidably) accounting for the large range of flexibility.
    The Wilson B factor therefore reaches over 100, and a whole
    mobile domain (which I estimate to have a local res more like 5
    Angstrom) has protein B factors around 200!
    I have of course verified that this domain is indeed present, and
    although it's challenging you can see features that weren't in
    the search model (PTMs etc.) even prior to refinement.

    I would argue that none of the B factors/ADPs are good
    representations of the true thermal motion/vibration of the
    atoms, even the ones that are close to or significantly below the
    Wilson B value.
    The detail is however perfectly adequate to model sidechains and
    other features for most of the structure, and there are useful
    distinctions between the real structure and the prediction (AF3).

    Pavel had some useful advice about not using B factors to
    describe things that should be described by occupancy (even if
    you end up having less than 100% occupancy for a single modeled
    conformer of a covalent feature which is "present" more or less
    100% of the time, like the glycan).
    The problem is that this principle would apply to whole domains,
    and actually the whole structure if what I think is correct.

    In a crystal with a great degree of flexibility (and a wide range
    in relative flexibility), you could argue that the presented
    conformation definitely has an occupancy of less than 1.0 I guess?

    I wouldn't be upset at the assertion that the B values are
    "wrong" and this structure should be excluded from any work
    studying B factors (or perhaps the opposite to improve the way
    disorder is accounted for), but it's the claim that the atomic
    coordinates have been modelled carelessly that I would like to avoid!

    Obviously, I will discuss this openly in the publication, and
    only rely on features that are unambiguous for my conclusions,
    but it would be good to know people's thoughts on what they would
    do or expect to see.

    Best

    Matthew.







    ------------------------------------------------------------------------
    *From:* Karplus, Andy <andy.karp...@oregonstate.edu>
    *Sent:* 06 March 2025 01:02
    *To:* Matthew Snee <matthew.sne...@manchester.ac.uk>;
    CCP4BB@JISCMAIL.AC.UK <CCP4BB@JISCMAIL.AC.UK>
    *Subject:* Re: [ccp4bb] IDS in PDB

    Hi Matthew.

    Your post reminds me of some work my student did earlier related
    to the question of what to consider as “too high a B-factor.”
    Back in 2003 a referee raised concerns about the “way too high”
    B-factors of a 2.6 Å resolution structure we had determined,
    because the average B-factor of the structure was about 85 Å^2.
    Even though the density for the modeled parts of the structure
    was quite clear, that was a red flag for the reviewer.

    In response we provided a variety of evidence to validate the
    structure, and then also briefly explored whether the structure
    having an average B-factor much higher than was then considered
    reasonable for a structure at 2.6 Å resolution could be at least
    in part related to our use of a synchrotron source for the data
    collection. Our thought was that the greater X-ray intensity
    could lead to data with notable signal out to 2.6 Å resolution
    for a crystal that perhaps in a lab setting might have only
    yielded data with notable signal out to a much lower resolution.

    To test this idea, we looked at PDB structures and generated a
    plot comparing the average B-factor for structures determined
    using a lab source versus a synchrotron source. There was a
    difference, with the distribution of structures based on synchrotron
    radiation notably shifted to higher average B-factors (extending
    up to ~100 Å^2 compared with ~70 Å^2). And crucial to remember
    when looking at this plot is that for many (perhaps most)
    structures determined at say 2.5 – 3.0 Å resolution, the
    resolution limit at which the structure was determined need not
    reflect the limit to which useful data could have been collected.
    For instance, for a protein crystal with a size and level of
    internal order that would allow for meaningful data to be
    collected out to 1 Å resolution, researchers looking at a series
    of ligand bound structures may choose to collect them all at 2.5
    Å resolution, knowing that that is much faster and provides
    enough information to answer the questions they are asking; and
    these structures would give very low refined B-factors compared
    with a structure from a large crystal with a level of internal
    order that truly leads to 2.5 Å as the limit of resolution to
    which useful data can be collected under the best circumstances.

    The analysis I’m referring to is in Figure 2D of the paper at
    doi:10.1016/S0022-2836(03)00347-4 .

    HTH, Andy

    Dr. P. Andrew Karplus (he, him, his)

    Distinguished Professor Emeritus of Biochemistry and Biophysics

    NIGMS GCE4All Research Center (http://gce4all.oregonstate.edu/)
    Director of Communications

    2133 ALS Building

    Oregon State University

    Corvallis, OR 97331

    ph. 541-737-3200

    andy.karp...@oregonstate.edu

    “Revealing how life works for the benefit of all!”

    http://biochem.oregonstate.edu

    *From: *CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> on behalf of
    Matthew Snee <matthew.sne...@manchester.ac.uk>
    *Date: *Wednesday, March 5, 2025 at 11:19 AM
    *To: *CCP4BB@JISCMAIL.AC.UK <CCP4BB@JISCMAIL.AC.UK>
    *Subject: *Re: [ccp4bb] IDS in PDB

    Thank you for those links.

    One of those papers seems to basically say that atoms that
    accumulate stratospheric B factors are "speculative", but I'm
    working on an example with a Wilson B factor of over 100 where
    some of the weaker features have enormous B values.

    The crystal is very atypical, very high solvent, and a very wide
    range of flexibility  where whole domains can move almost freely
    in solvent channels, but are still observed.

    The features that have obtained super high B factors are the
    glycans. The general shape of the core glycan can be seen
    clearly, but the conformation is not constrained by any contacts,
    so the density is extremely diffuse.

    The "answer" is obviously that there is an ensemble of
    conformations (for these features and also a whole domain), which
    would exist at low occupancy but would also accrue a lower B
    factor, but I think it is more appropriate to model a single
    favourable conformer that fits the density.

    This is certainly the first X-ray structure that I have ever worked
    on that was quite like this, so I'd like to hear people's opinions
    on how I should handle it. The values out of context would
    certainly raise eyebrows!

    I work quite a bit on EM and it appears really similar to some EM
    examples with resolution ranges between 2.8-6A.

    I am certain that the features with the very high values are much
    more than speculative, and the model is more accurate when
    complete, but the B factors are certainly not describing the
    relationship between the model and the data in a useful way!

    Best wishes

    Matthew.

    ------------------------------------------------------------------------

    *From:*CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> on behalf of
    Italo Carugo Oliviero <olivieroitalo.car...@unipv.it>
    *Sent:* 05 March 2025 16:21
    *To:* CCP4BB@JISCMAIL.AC.UK <CCP4BB@JISCMAIL.AC.UK>
    *Subject:* Re: [ccp4bb] IDS in PDB

    Just wanted to thank you for your remarkable contributions
    to this discussion.

    These are a couple of articles that dive into the issue of
    unusually large B-factors: BMC Bioinformatics 2018 19 61
    (https://doi.org/10.1186/s12859-018-2083-8)
    & Zeit. Krist. 2018 234 73-77
    (https://doi.org/10.1515/zkri-2018-2057).

    Best,

    Oliviero

    On Tue 4 Mar 2025 at 14:03, Frank von Delft
    <0000bcb385fe5582-dmarc-requ...@jiscmail.ac.uk> wrote:

        Interesting...

        Has this got onto the radar (or critical path) of the PDB's
        mmCIF working group (or whatever it's called?)

        I'm assuming that's where this would go to next, if the
        downstream developers are ever going to take it seriously.

        Frank


        On 04/03/2025 12:21, Alexandre Ourjoumtsev wrote:

            Dear all,

            Fully relevant to this discussion, you might have noted
            that a couple of years ago, we (Vladimir Lunin and myself)
            argued

            https://journals.iucr.org/m/issues/2022/06/00/tf5001/

            that, when describing an atomic model, each atom should
            have one more parameter, namely a local resolution with
            which it contributes to the map from which it has been
            identified - or, in other words, with which value its
            image should be calculated to reproduce the experimental
            map (and NOT the density / potential itself) as a sum of
            atomic contributions (different atoms may have different
            local resolution).

            Assigning a large local resolution value to an atom (rather
            than a large B-factor or a small occupancy) means exactly
            that the atom cannot be localized in this given map; in
            other words, the map from which this part of the model was
            constructed did not contain enough information.

            Naturally, the cif format presents no obstacle to
            completing the model description with a local resolution
            value associated with each individual atom. Moreover, even
            the good old PDB format has space for this; positions 67-72
            have been reserved :-)

            Going beyond the current discussion, as you perfectly
            know, both B-factors and the resolution cut-off blur atomic
            images; however, they do so in different ways (Ezra
            already mentioned Fourier ripples). Considering this new
            parameter allows one to distinguish these two effects and
            even to identify (fix?) some errors that occurred when using
            the current, conventional procedures: see, for example,
            Lunin et al. in Current Research in Structural Biology
            (2023):

            https://doi.org/10.1016/j.crstbi.2023.100102

            There are a couple more relevant articles, in Acta D
            and J. Appl. Cryst., and there are works in progress.

            I hope, this helps...

            Have a nice day

            Sacha Urzhumtsev

            
------------------------------------------------------------------------

            To unsubscribe from the CCP4BB list, click the following
            link:
            https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1



########################################################################

This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list 
hosted by www.jiscmail.ac.uk, terms & conditions are available at 
https://www.jiscmail.ac.uk/policyandsecurity/
