Re: [gmx-users] Very large fluctuations in dg/dl

Patrick Fuchs Wed, 09 May 2007 09:14:33 -0700

Hi John and David,

I am replying to you both. Thank you very much for your thorough answers, as well as for all the pointers you mentioned. I will have a look at these references. Free energy calculation is definitely a challenging issue!

Cheers,


Patrick

John D. Chodera wrote:

Hi Patrick,
I like your plots. They nicely demonstrate the difficulty of convergence of the estimate of <dg/dl>.
It looks like there may be some other oddities in the plot, such as switching between conformations or some other effect that has a long correlation time. In particular, at lambda = 0.55, there is a big switch about 500 ps in.
We (David Mobley and I) have found it helpful to examine the dg/dl timeseries plots directly to look for the potential presence of an initial non-equilibrated state (which would look like dg/dl spending time near some value A_1 and then switching to fluctuate about A_2 for the remainder of the simulation) or multiple states with a long correlation time (which would look like hops between fluctuations about different values). In this case, long correlation times may require you to run much longer.
One of the best diagnostic tools besides examination of the timeseries is to compute the correlation time and statistical inefficiency for each dg/dl timeseries. That Janke article I mentioned previously describes how to do this, and is available online here:
http://www.fz-juelich.de/nic-series/volume10/janke2.pdf
Also, there is a discussion of how to do so efficiently in my recent paper on the analysis of parallel tempering simulations using WHAM (see Sections 2.4 and 5.2):
http://www.dillgroup.ucsf.edu/~jchodera/pubs/pdf/replica-exchange-wham.pdf
The method I described for computing the uncertainties is termed "correlation analysis", as it relies on computation (and integration) of the autocorrelation function for the timeseries. This was first applied to molecular dynamics simulations by Bill Swope when he was with Hans Andersen:
W. C. Swope, H. C. Andersen, P. H. Berens, and K. R. Wilson. A computer simulation method for the calculation of equilibrium constants for the formation of physical clusters of molecules: Application to small water clusters. J. Chem. Phys., 76(1):637–649, 1982.
(This is actually the same paper where he introduces the velocity Verlet integrator -- it's a good read!)
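As a rough illustration of correlation analysis, the sketch below estimates the statistical inefficiency g by summing the normalized autocorrelation function of a timeseries. The function name and the choice to truncate the sum at the first non-positive autocorrelation value are assumptions made for this example; they are not taken from the papers cited in this thread, which discuss more careful and more efficient schemes.

```python
def statistical_inefficiency(x):
    """Estimate g = 1 + 2 * sum_t (1 - t/N) * C(t) for a 1-D timeseries.

    C(t) is the normalized autocorrelation at lag t. The sum is truncated
    at the first lag where C(t) <= 0 (an assumed, simple cutoff).
    """
    n = len(x)
    mean = sum(x) / n
    dx = [v - mean for v in x]
    var = sum(d * d for d in dx) / n
    if var == 0.0:
        return 1.0  # constant series: nothing to correlate
    g = 1.0
    for t in range(1, n - 1):
        # zip stops at the shorter sequence, giving the n - t lagged pairs
        c = sum(a * b for a, b in zip(dx, dx[t:])) / ((n - t) * var)
        if c <= 0.0:
            break
        g += 2.0 * c * (1.0 - t / n)
    return max(g, 1.0)
```

For uncorrelated samples this returns a value close to 1; strongly correlated data give larger values, meaning fewer effectively independent samples (roughly N / g).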
Block averaging should give equivalent uncertainty estimates to the correlation analysis method (to about an order of magnitude) if the block sizes are chosen appropriately, but it usually requires either calculation of the statistical inefficiency to determine the block size first, or application of an iterative method like that of Flyvbjerg (cited in my WHAM paper) to determine the statistical uncertainty from consideration of many block sizes. Wolfhard Janke's paper does a great job of discussing how various methods for estimating the statistical uncertainty compare.
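A minimal sketch of plain block averaging with a fixed number of blocks follows. The function name is hypothetical, and this is not claimed to reproduce what g_analyze -ee does or the iterative Flyvbjerg procedure; it only shows the basic idea of treating block means as approximately independent samples.

```python
import math

def block_average_error(x, n_blocks):
    """Standard error of the mean from n_blocks contiguous, equal-size blocks.

    Trailing samples that do not fill a whole block are ignored.
    """
    size = len(x) // n_blocks
    if n_blocks < 2 or size < 1:
        raise ValueError("need at least 2 blocks with at least 1 sample each")
    means = [sum(x[i * size:(i + 1) * size]) / size for i in range(n_blocks)]
    grand = sum(means) / n_blocks
    # Unbiased variance of the block means, then the error of their average
    var = sum((m - grand) ** 2 for m in means) / (n_blocks - 1)
    return math.sqrt(var / n_blocks)
```

The estimate is only meaningful when each block is much longer than the correlation time, which is why the block size has to be chosen with some knowledge of the statistical inefficiency.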
Finally, if there are unequilibrated regions of your dataset, there is now a method to automatically determine the boundary between unequilibrated and equilibrated regions:
W. Yang, R. Bitetti-Putzer, and M. Karplus. Free energy simulations: Use of reverse cumulative averaging to determine the equilibrated region and the time required for convergence. J. Chem. Phys., 120(6):2618–2628, 2004.
To address your observation that some authors use only a few hundred ps of simulation time, while your system seems to require at least 1 ns: Different systems have different correlation times and variances of dg/dl, both of which affect the statistical uncertainty. Some systems are much "easier" to converge than others. David Mobley has found that application of restraints in free energy calculations which are later removed can actually transform a "very hard" problem into an "easy" problem -- see the publication below. But the basic answer is that this is why it is extremely important to *always* compute the correlation time for whatever it is you are averaging -- this may be very different from system to system, or even from lambda value to lambda value.
D. L. Mobley, J. D. Chodera, and K. A. Dill, "Confine and Release: Obtaining Correct Binding Free Energies in the Presence of Protein Conformational Change", accepted, Journal of Chemical Theory and Computation.
http://www.dillgroup.ucsf.edu/~dmobley/papers/flex.pdf

Best of luck!

- John

--
John Chodera <[EMAIL PROTECTED]>             | Mobile    : 415 867-7384
Postdoctoral researcher, Pande lab            | Lab phone : 650.723.1097
Department of Chemistry, Stanford University  | Lab fax   : 650.724.4021
http://www.dillgroup.ucsf.edu/~jchodera


On May 8, 2007, at 3.20 AM, Patrick Fuchs wrote:
Hi John,
thanks a lot for your reply.
Indeed, the standard deviation I presented in my previous post is that of the dg/dl samples. I was just surprised by the fact that the std. dev. is always larger than the value itself (since I'm just starting with FE calculations, I had no expectation of what the behavior would be and needed a confirmation).
I was also surprised by the convergence of the mean. I put a plot for each lambda value at the URL:
http://condor.ebgm.jussieu.fr/~fuchs/download/convergence.png
(at each time step, I recalculate the mean from the beginning). My first observation was that 1 ns seemed to be a minimum for certain lambda values (e.g. lambda=0.70). I sometimes read in the literature that some authors used a few hundred ps, which seemed (to me) not sufficient for proper convergence.
Now, if we come back to the error estimate of the mean, I found (for lambda=0.00) 0.2 kJ/mol using block averaging (using the -ee option of g_analyze), which is reasonable I imagine (even if higher precisions have been described in the literature). I'm not sure whether this is the same way of calculating the uncertainty as the one you proposed.
Can you confirm?
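The convergence plot described above (the mean recalculated from the beginning at each time step) corresponds to a cumulative running mean. A short sketch, with a hypothetical function name:

```python
from itertools import accumulate

def running_mean(x):
    """Mean of x[0..i] for every index i, i.e. the average recomputed from the start."""
    return [total / (i + 1) for i, total in enumerate(accumulate(x))]
```

Plotting this list against time for each lambda value gives curves of the kind described in the message; a curve that is still drifting at the end of the run suggests the mean has not converged.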
I will have a look at the book you mentioned, thanks for the pointer.
Cheers,

Patrick

On Mon, 7 May 2007, John D. Chodera wrote:
Hi Patrick,
I find a reasonable DeltaGsol value of 8.6 kJ/mol for methane (compared
to 8.7 in Geerke & van Gunsteren, ChemPhysChem 2006, 7, 671–678) but
I get really huge fluctuations in the values of dg/dl:
lambda=0.00: 5.0 +/- 10.8 (mean +/- standard deviation)
lambda=0.05: 4.3 +/- 11.2
Is the standard deviation you quote here the standard deviation of the dg/dl samples, or the standard deviation of the mean?
If the former, then this behavior is totally expected: While the standard deviation of a random variable (your dg/dl samples) may be large, with enough sampling, we can get a very precise estimate of the mean. More sampling will not change the standard deviation of the dg/dl samples, but it will reduce the standard error in the mean, which is what we need for precise estimates of free energy differences.
The uncertainty in the estimate of <dg/dl> is given simply by

d<dg/dl> = sigma / sqrt(N / g)
where here, sigma is the standard deviation of your dg/dl samples, N is the number of data points you have collected, and g is something called the "statistical inefficiency", which can be estimated from the correlation time of your dg/dl samples. More information on this sort of analysis can be found in reference [1] below.
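To make the formula concrete, here is a small sketch that evaluates sigma / sqrt(N / g). The function name is hypothetical. The value sigma = 10.8 is the standard deviation quoted for lambda=0.00 in this thread, while N = 5000 and g = 2 are made-up numbers used only for illustration.

```python
import math

def standard_error_of_mean(sigma, n_samples, g):
    """Return sigma / sqrt(N / g), the uncertainty in the mean of N correlated samples."""
    if g < 1.0:
        raise ValueError("statistical inefficiency g must be >= 1")
    return sigma / math.sqrt(n_samples / g)

# sigma = 10.8 comes from the thread; N = 5000 and g = 2.0 are hypothetical.
err = standard_error_of_mean(10.8, 5000, 2.0)  # 10.8 / sqrt(2500) = 0.216
```

With these illustrative inputs the standard deviation of the samples is large, yet the uncertainty in the mean is about fifty times smaller, which is the point made above.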
Once you have the uncertainty in each estimate of <dg/dl>, you still have to combine these to get the uncertainty estimate for the integrated free energy difference using standard propagation of error. This depends on your choice of quadrature for TI. David Mobley has done a lot of this, and I'm sure would be willing to help if you had trouble figuring it out.
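As one possible illustration of this propagation of error, the sketch below assumes trapezoid-rule quadrature and statistically independent simulations at each lambda value. Both are assumptions for the example (the thread does not say which quadrature is used), and the function names are hypothetical.

```python
import math

def trapezoid_weights(lambdas):
    """Trapezoid-rule weight for each lambda point (points need not be evenly spaced)."""
    if len(lambdas) < 2:
        raise ValueError("need at least two lambda values")
    weights = [0.0] * len(lambdas)
    for i in range(len(lambdas) - 1):
        half = 0.5 * (lambdas[i + 1] - lambdas[i])
        weights[i] += half
        weights[i + 1] += half
    return weights

def ti_free_energy(lambdas, means, errors):
    """Return (DeltaG, uncertainty) from per-lambda <dg/dl> means and their standard errors.

    Assumes the runs at different lambda values are independent, so the
    weighted errors add in quadrature.
    """
    w = trapezoid_weights(lambdas)
    delta_g = sum(wi * m for wi, m in zip(w, means))
    sigma = math.sqrt(sum((wi * e) ** 2 for wi, e in zip(w, errors)))
    return delta_g, sigma
```

Because the free energy estimate is a weighted sum of the per-lambda means, its variance is the weighted sum of their variances; a different quadrature only changes the weights.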
Good luck!

- John
[1] W. Janke. Statistical analysis of simulations: Data correlations and error estimation. In J. Grotendorst, D. Marx, and A. Muramatsu, editors, Quantum Simulations of Complex Many-Body Systems: From Theory to Algorithms, volume 10, pages 423–445. John von Neumann Institute for Computing, 2002.
--
John Chodera <[EMAIL PROTECTED]>             | Mobile    : 415 867-7384
Postdoctoral researcher, Pande lab            | Lab phone : 650.723.1097
Department of Chemistry, Stanford University  | Lab fax   : 650.724.4021
http://www.dillgroup.ucsf.edu/~jchodera
--

Date: Mon, 07 May 2007 16:41:26 +0200
From: Patrick Fuchs <[EMAIL PROTECTED]>
Subject: [gmx-users] Very large fluctuations in dg/dl
To: gmx-users@gromacs.org
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset=windows-1252; format=flowed

Hi Gromacs users,
I have a few questions related to solvation free energy calculation via
thermodynamic integration.
I'm trying to reproduce some literature data (on e.g. methane,
methanol...) using the GROMOS G53a6 force field. I followed the tutorial
of David Mobley (thanks to him BTW), but I used the standard non bonded
options of the G53a6 force field (instead of OPLS). For each lambda
value I do a minimization, a 10 ps NVT followed by a 20 ps NPT
equilibration, and a 1 ns NVT production using the sd integrator. I used
21 lambda values (0.00, 0.05...1.00).
Here's my topology file:
----------------beginning of methane.top------------------------
; topology for a methane molecule

; include GROMOS53a6 force field
#include "ffG53a6.itp"

;;;;;;; begin methane definition ;;;;;;;
[ moleculetype ]
; Name           nrexcl
METH             3

[ atoms ]
;nr type resnr residue atom cgnr charge mass    typeB chargeB massB
1  CH4  1     METH    C1   0    0.0000 16.0430 DUM   0.0000  16.04300
;;;;;; end methane definition ;;;;;;;;

; include water topology
#ifdef FLEX_SPC
#include "flexspc.itp"
#else
#include "spc.itp"
#endif

[ system ]
; name
1 methane molecule in water

[ molecules ]
; name  number
METH    1
SOL     893
-----------------end of methane.top------------------------

And here is my mdp file for lambda=0:
---------------beginning of prod.mdp---------------------
title               = production NVT methane/water
cpp                 = /lib/cpp
; OPTIONS FOR BOND CONSTRAINTS
constraints         = all-bonds
; RUN CONTROL PARAMETERS
integrator          = sd
tinit               = 0
dt                  = 0.002
nsteps              = 500000 ; 1000 ps
; NUMBER OF STEPS FOR CENTER OF MASS MOTION REMOVAL
nstcomm             = 100
; OUTPUT CONTROL OPTIONS
nstxout                  = 500
nstvout                  = 500
nstfout                  = 0
nstlog                   = 500
nstenergy                = 100
nstxtcout                = 5000
xtc-precision            = 1000
; NON BONDED STUFF
ns_type             =  grid
nstlist                = 5
rlist                  = 0.8
coulombtype            = generalized-reaction-field
rcoulomb               = 1.4
rvdw                   = 1.4
epsilon_rf             = 54.0
;OPTIONS FOR TEMPERATURE COUPLING
tc_grps                  = system
tau_t                    = 0.1 ; inverse langevin friction cst
ref_t                    = 300
;OPTIONS FOR PRESSURE COUPLING
Pcoupl                   = no
tau_p                    = 0.5
compressibility          = 4.5e-5
ref_p                    = 1.0
; FREE ENERGY CONTROL STUFF
free_energy              = yes
init_lambda              = 0.00
delta_lambda             = 0
sc_alpha                 = 0.5
sc-power                 = 1.0
sc-sigma                 = 0.3
; VELOCITY GENERATION
gen_vel                  = yes
gen_temp                 = 300
gen_seed                 = -1
-----------------end of prod.mdp------------------------
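Since the same production settings are reused at each of the 21 lambda values, one way to avoid editing the file by hand is to generate the per-lambda mdp text from a template. The sketch below is only an illustration: the function names are hypothetical, and it assumes the template contains exactly one init_lambda line like the one shown above.

```python
import re

def lambda_schedule(n_points=21):
    """Evenly spaced lambda values from 0 to 1 inclusive (0.00, 0.05, ..., 1.00 for 21 points)."""
    return [i / (n_points - 1) for i in range(n_points)]

def mdp_for_lambda(template, lam):
    """Return the mdp template text with its init_lambda value replaced by lam."""
    new_text, count = re.subn(
        r"^(init_lambda\s*=\s*)\S+",
        lambda m: f"{m.group(1)}{lam:.2f}",
        template,
        flags=re.MULTILINE,
    )
    if count != 1:
        raise ValueError("expected exactly one init_lambda line in the template")
    return new_text
```

Looping over lambda_schedule() and writing mdp_for_lambda(template, lam) to one file per value would give a set of inputs that differ only in init_lambda.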
I find a reasonable DeltaGsol value of 8.6 kJ/mol for methane (compared
to 8.7 in Geerke & van Gunsteren, ChemPhysChem 2006, 7, 671–678) but
I get really huge fluctuations in the values of dg/dl:
lambda=0.00: 5.0 +/- 10.8 (mean +/- standard deviation)
lambda=0.05: 4.3 +/- 11.2
...
lambda=1.00: -0.3 +/- 4.0
Furthermore, each of these mean values is very slow to converge (1 ns
seems to be a minimum for certain lambda values...).
I can't get reasonable fluctuations even if I sample more. In addition,
there are very frequent warnings in the log file such as:
----
Large VCM(group rest):      0.01363,      0.00818,      0.01147,
ekin-cm:  3.09490e+00
----
Here are my questions:
1) Does anyone have an idea of what could be causing these [very] large
fluctuations? Does it come from my setup, or is this normal behavior?
2) Are these 'Large VCM(group rest)' warnings related to the use of sd
integrator (when I switch to md integrator, I no longer get these
warnings) ?
Thanks for your answer,

Patrick


--
_______________________________________________________
Patrick FUCHS
Equipe de Bioinformatique Genomique et Moleculaire
INSERM U726, Universite Paris 7
Case Courrier 7113
2, place Jussieu, 75251 Paris Cedex 05, FRANCE
Tel : +33 (0)1-44-27-77-16 - Fax : +33 (0)1-43-26-38-30
E-mail : [EMAIL PROTECTED]
Web Site: http://www.ebgm.jussieu.fr/~fuchs
