Well, of all the possible metrics you could use to assess data quality,
Rfree is probably the worst one. This is because it is a
cross-validation metric, and cross-validation doesn't work if you use
it as an optimization target. You can try, and might even make a
little headway, but then your free set is burnt. If you have a third set
of observations, as suggested for Rsleep
(doi:10.1107/S0907444907033458), then you have a chance at another round
of cross-validation. Crystallographers don't usually do this, but it has
become standard practice in machine learning (training=Rwork,
validation=Rfree and testing=Rsleep).
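The three-way split described above can be sketched as a random assignment of each reflection to a work, free or sleep set. This is a minimal illustration under assumptions: the 5% fractions, the string labels and the function name `assign_cv_flags` are hypothetical. Real programs store integer flag columns in the MTZ file, and some protocols choose test reflections in thin resolution shells rather than purely at random.

```python
import random

def assign_cv_flags(n_reflections, free_frac=0.05, sleep_frac=0.05, seed=0):
    """Label each reflection 'work' (Rwork), 'free' (Rfree) or 'sleep' (Rsleep).

    Hypothetical helper for illustration only. Work reflections drive
    refinement, free reflections are checked occasionally, and sleep
    reflections are looked at once, at the very end.
    """
    if free_frac + sleep_frac >= 1.0:
        raise ValueError("free + sleep fractions must leave some work reflections")
    rng = random.Random(seed)  # fixed seed so the flags are reproducible
    flags = []
    for _ in range(n_reflections):
        u = rng.random()
        if u < free_frac:
            flags.append("free")
        elif u < free_frac + sleep_frac:
            flags.append("sleep")
        else:
            flags.append("work")
    return flags
```

Whatever scheme is used, the flags must be fixed once and carried unchanged through every processing and refinement run; otherwise the free and sleep sets stop being independent controls.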
So, unless you have an Rsleep set, any time you contemplate doing a
bunch of random things and picking the best Rfree ... don't. Just
don't. There madness lies.
What happens after doing this is that you will initially be happy about your
lower Rfree, but everything you do after that will make it go up more
than it would have had you not performed your Rfree optimization. This
is because the changes in the data that made Rfree randomly better were
actually noise, and as the structure becomes more correct it will move
away from that noise. It's always better to optimize on something else,
and then check your Rfree as infrequently as possible. Remember it is
the control for your experiment. Never mix your positive control with
your sample.
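A toy simulation makes the selection effect concrete. It assumes a hypothetical model, not real refinement: every processing run is equally good (same true R value), and each run's measured Rfree and Rsleep differ from that value by independent Gaussian noise. The run picked for its lowest Rfree looks better than it really is, while its Rsleep, computed on reflections never used for the choice, falls back to the true value.

```python
import random
import statistics

def winners_curse(n_variants=50, true_r=0.25, noise=0.01, trials=2000, seed=1):
    """Simulate choosing the 'best' of many equally good runs by Rfree.

    Hypothetical model: each run's Rfree and Rsleep are true_r plus
    independent Gaussian noise of width `noise`.
    Returns (mean Rfree of the chosen run, mean Rsleep of the same run).
    """
    rng = random.Random(seed)
    chosen_free, chosen_sleep = [], []
    for _ in range(trials):
        rfree = [true_r + rng.gauss(0, noise) for _ in range(n_variants)]
        rsleep = [true_r + rng.gauss(0, noise) for _ in range(n_variants)]
        best = min(range(n_variants), key=rfree.__getitem__)  # pick lowest Rfree
        chosen_free.append(rfree[best])
        chosen_sleep.append(rsleep[best])
    return statistics.mean(chosen_free), statistics.mean(chosen_sleep)

free, sleep = winners_curse()
print(f"selected Rfree {free:.4f}, same run's Rsleep {sleep:.4f}")
```

The gap between the two numbers is pure noise that was "optimized"; the more runs you compare, the larger it gets.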
As for the best metric to assess data quality? Well, what are you doing
with the data? There are always compromises in data processing and
reduction that favor one application over another. If this is a "I just
want the structure" project, then score on the resolution where CC1/2
hits your favorite value. For some that is 0.5, others 0.3. I tend to
use 0.0 so I can cut it later without re-processing. Whatever you do,
just make it consistent.
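Scoring a run by the resolution where CC1/2 hits your chosen value needs one number from the per-shell table. A minimal sketch, assuming the shells are given as (d_min, CC1/2) pairs ordered from low to high resolution and that linear interpolation in 1/d² between the two bracketing shells is good enough; the function name and the interpolation choice are assumptions, and processing programs may estimate this differently (for example by fitting a curve).

```python
def cc_half_cutoff(shells, threshold=0.3):
    """Estimate the resolution (Å) at which CC1/2 first drops below `threshold`.

    shells: list of (d_min_angstrom, cc_half), ordered low -> high resolution.
    Interpolates linearly in 1/d^2 between the two shells bracketing the
    crossing. If CC1/2 never drops below the threshold, returns the d_min of
    the last shell (i.e. use all the data).
    """
    if not shells:
        raise ValueError("no shells given")
    if shells[0][1] < threshold:
        raise ValueError("CC1/2 is already below threshold in the first shell")
    for (d1, cc1), (d2, cc2) in zip(shells, shells[1:]):
        if cc1 >= threshold > cc2:
            s1, s2 = 1.0 / d1 ** 2, 1.0 / d2 ** 2
            frac = (cc1 - threshold) / (cc1 - cc2)  # 0 at shell 1, 1 at shell 2
            return (s1 + frac * (s2 - s1)) ** -0.5
    return shells[-1][0]
```

With a threshold of 0.0 the function returns the edge of the data unless CC1/2 actually goes negative, which matches processing everything and cutting later.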
If it's for anomalous, score on CCanom or, if that's too noisy, on
Imean/sigma in the lowest-angle or highest-intensity resolution bin.
This is because for anomalous you want to minimize relative error. The
end-all-be-all of anomalous signal strength is the phased anomalous
difference Fourier. You need phases to compute one, but if you have a
structure, just omit an anomalous scatterer of interest, refine to
convergence, and then measure the peak height at the position of the
omitted anomalous atom. Instructions for doing anomalous refinement in
refmac5 are here:
https://www2.mrc-lmb.cam.ac.uk/groups/murshudov/content/refmac/refmac_keywords.html
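Both anomalous scores from this paragraph can be sketched in a few lines. CCanom is the correlation of anomalous differences I(+) - I(-) between two random half-datasets; the omit-peak height is the map value at the omitted atom's position divided by the map's rms. Assumptions: the half-datasets are already merged to the same unique reflections in the same order, and the map is given as a nested list sampling one unit cell (a hypothetical format; a real map would be read from a CCP4/MRC file). The peak is taken at the nearest grid point without interpolation, and all function names are illustrative.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cc_anom(half1, half2):
    """Correlation of I(+) - I(-) between two half-datasets.

    half1, half2: lists of (I_plus, I_minus) for the same unique reflections.
    """
    return pearson([ip - im for ip, im in half1],
                   [ip - im for ip, im in half2])

def peak_height_sigma(grid, frac_xyz):
    """Map value nearest frac_xyz, in units of the map's rms deviation.

    grid: nested lists grid[i][j][k] covering one unit cell. Grid indices
    wrap around periodically, so fractional coordinates outside 0..1 work.
    """
    dims = (len(grid), len(grid[0]), len(grid[0][0]))
    values = [v for plane in grid for row in plane for v in row]
    mean = sum(values) / len(values)
    rms = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    i, j, k = (round(f * n) % n for f, n in zip(frac_xyz, dims))
    return (grid[i][j][k] - mean) / rms
```

Comparing processing runs by the omit-peak height only makes sense if everything else (model, refinement protocol, omitted atom) is held fixed between runs.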
If you're looking for a ligand you probably want isomorphism, and in
that case refining with a reference structure looking for low Rwork is
not a bad strategy. This will tend to select for crystals containing a
molecule that looks like the one you are refining. But be careful! If
the reference is an apo structure, your ligand-bound crystals will have
higher Rwork due to the very difference density you are looking for.
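Ranking crystals by Rwork against one reference comes down to computing R = Σ| |Fo| - k|Fc| | / Σ|Fo| for each data set against the same calculated amplitudes. A minimal sketch, assuming all data sets are indexed consistently and matched to the same reflections in the same order, and using a single overall least-squares scale k; real refinement programs also model bulk solvent and anisotropic scaling. The function names are hypothetical.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - k|Fc| |) / sum(|Fo|), with one overall scale k.

    k is the least-squares scale minimizing sum((|Fo| - k|Fc|)^2).
    Deliberately simplified: no bulk-solvent or anisotropic correction.
    """
    fo = [abs(f) for f in f_obs]
    fc = [abs(f) for f in f_calc]
    if len(fo) != len(fc) or not fo:
        raise ValueError("need matching, non-empty amplitude lists")
    k = sum(o * c for o, c in zip(fo, fc)) / sum(c * c for c in fc)
    return sum(abs(o - k * c) for o, c in zip(fo, fc)) / sum(fo)

def rank_by_rwork(datasets, f_calc):
    """Sort data set names from lowest to highest R against one reference.

    datasets: dict mapping a crystal name to its |Fo| list, matched
    reflection-by-reflection to f_calc.
    """
    return sorted(datasets, key=lambda name: r_factor(datasets[name], f_calc))
```

Keep the caveat above in mind when reading the ranking: against an apo reference, a crystal that really has the ligand bound is expected to land lower in the list, not higher.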
But if it's the same data just being processed in different ways, first
make a choice about what you are interested in, and then optimize on
that. Just don't optimize on Rfree!
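For the question below (many processing runs of one data set), the advice amounts to: pick the statistic that matches your goal first, then rank runs by it. A sketch under assumptions: the goal-to-metric table, its key names and the `best_run` helper are hypothetical placeholders, each run's statistics are assumed to be already collected into a dict, and the Rfree guard simply encodes the point of this message.

```python
# Hypothetical goal -> (statistic name, higher_is_better) table; the names are
# placeholders for whatever your processing and refinement logs report.
GOAL_METRICS = {
    "structure": ("d_at_cc_half_threshold", False),  # Å; smaller d is better
    "anomalous": ("cc_anom", True),
    "ligand": ("rwork_vs_reference", False),
}

def best_run(runs, metric, higher_is_better):
    """Return the name of the run with the best value of `metric`.

    runs: dict mapping run name -> dict of statistics for that run.
    Refuses to rank on Rfree, which should stay the control, not the target.
    """
    if "rfree" in metric.lower().replace("_", "").replace("-", ""):
        raise ValueError("Rfree is the control; pick another metric")
    pick = max if higher_is_better else min
    return pick(runs, key=lambda name: runs[name][metric])
```

Rfree can still be reported for the chosen run at the end, as a check rather than a selection criterion.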
-James Holton
MAD Scientist
On 10/27/2021 8:44 AM, Murpholino Peligro wrote:
Let's say I ran autoproc with different combinations of options for a
specific dataset, producing dozens of different (but not so different)
mtz files...
Then I ran phenix.refine with the same options for the same structure
but with all my mtz zoo
What would be the best metric to say "hey this combo works the best!"?
R-free?
Thanks
M. Peligro
------------------------------------------------------------------------
To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
########################################################################
This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list
hosted by www.jiscmail.ac.uk, terms & conditions are available at
https://www.jiscmail.ac.uk/policyandsecurity/