David van der Spoel wrote:
Dimitris Dellis wrote:
Justin A. Lemkul wrote:
Dimitris Dellis wrote:
Hi.
I ran exactly the same simulations with v3.3.3 and v4.0.3 on the
same 64-bit Q6600/DDR2-1066 machine, with gcc-4.3.2 and fftw-3.2.
I found that the performance of 4.0.3 is roughly 30% lower than
3.3.3 (30% more hours/ns) for the few systems I tried (512 molecules
of 5-15 sites, nstlist=10).
This happens with the single-precision serial and parallel (np=2,4,
openmpi 1.3) versions, and only when electrostatics (PME) are present.
With simple LJ potentials the performance is exactly the same.
Is there any speed comparison of 3.3.3 vs 4.0.3 available?
D.D.
Can you show us your .mdp file? What did grompp report about the
relative PME load? These topics have been discussed a few times;
you'll find lots of pointers on optimizing performance in the list
archive.
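As an aside on those tuning pointers: a common way to shift work off the
PME mesh is to scale rcoulomb and fourierspacing by the same factor, which
keeps the Ewald splitting accuracy roughly constant while moving more of the
Coulomb work into the real-space kernels. A minimal Python sketch (the factor
of 1.2 and the helper name are only illustrative, and rlist would also have
to be at least as large as the new rcoulomb):

# Sketch: trade PME mesh work for real-space work by scaling rcoulomb
# and fourierspacing by the same factor (Ewald accuracy stays roughly
# constant because ewald_rtol is unchanged). Factor 1.2 is illustrative.
def scale_pme_load(rcoulomb, fourierspacing, factor=1.2):
    """Return (rcoulomb, fourierspacing) with less work on the PME grid."""
    return rcoulomb * factor, fourierspacing * factor

rc, fsp = scale_pme_load(1.6, 0.12)     # values from the .mdp below
print("rcoulomb       = %.3f" % rc)     # 1.920
print("fourierspacing = %.3f" % fsp)    # 0.144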
try turning off optimize_fft
Performance is better with optimize_fft = no, but 3.3.3 is still
faster: v4.0.3 now takes 1.389 hr/ns (down from 1.665).
-Justin
Hi Justin,
These are from the small system, with no I/O and only 1k steps.
grompp.mdp
===========
integrator = md
dt = 0.0010
nsteps = 1000
nstxout = 0
nstvout = 0
nstlog = 1000
nstcomm = 10
nstenergy = 0
nstxtcout = 0
nstlist = 10
ns_type = grid
dispcorr = AllEnerPres
tcoupl = berendsen
tc-grps = System
ref_t = 293.15
gen_temp = 293.15
tau_t = 0.2
gen_vel = no
gen_seed = 123456
constraints = none
constraint_algorithm = shake
energygrps = System
rlist = 1.6
vdw-type = Cut-off
rvdw = 1.6
coulombtype = PME
fourierspacing = 0.12
pme_order = 4
ewald_rtol = 1.0e-5
optimize_fft = yes
rcoulomb = 1.6
Related 4.0.3 grompp output:
Estimate for the relative computational load of the PME mesh part: 0.19
4.0.3 mdrun serial timings (rows with near-zero cost omitted)
Computing:                M-Number       M-Flops   % Flops
Coul(T) + LJ            576.513824     31708.260      71.5
Outer nonbonded loop      8.489390        84.894       0.2
Calc Weights              6.006000       216.216       0.5
Spread Q Bspline        128.128000       256.256       0.6
Gather F Bspline        128.128000      1537.536       3.5
3D-FFT                 1088.769682      8710.157      19.6
Solve PME                18.531513      1186.017       2.7
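As a rough cross-check of the 0.19 estimate above: summing the PME-related
rows of this flop table and dividing by the total of the listed rows gives
the measured flop share of the mesh part. Rows with near-zero cost are
omitted from the log, and grompp's number is a pre-run heuristic, so the
two need not agree exactly. A minimal Python sketch using the 4.0.3 serial
numbers:

# Approximate PME flop share from the M-Flops column above; the total
# is a slight underestimate because near-zero rows are omitted.
mflops = {
    "Coul(T) + LJ":         31708.260,
    "Outer nonbonded loop":    84.894,
    "Calc Weights":           216.216,
    "Spread Q Bspline":       256.256,
    "Gather F Bspline":      1537.536,
    "3D-FFT":                8710.157,
    "Solve PME":             1186.017,
}
pme_rows = ["Calc Weights", "Spread Q Bspline",
            "Gather F Bspline", "3D-FFT", "Solve PME"]
pme_share = sum(mflops[k] for k in pme_rows) / sum(mflops.values())
print("PME flop share ~ %.2f" % pme_share)   # ~0.27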
Parallel 4.0.3, np=4
Average load imbalance: 5.2 %
Part of the total run time spent waiting due to load imbalance: 2.3 %
              (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:    96.086      7.380     14.414      1.665
3.3.3 mdrun serial timings
Computing:                M-Number        M-Flops   % Flops
Coul(T) + LJ            576.529632   31709.129760      72.0
Outer nonbonded loop      8.487860      84.878600       0.2
Spread Q Bspline        128.128000     256.256000       0.6
Gather F Bspline        128.128000    1537.536000       3.5
3D-FFT                 1088.769682    8710.157456      19.8
Solve PME                17.986469    1151.133984       2.6
Parallel 3.3.3, np=4
              (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:   144.132     12.556     21.600      1.111
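Taking the hour/ns column of the two np=4 Performance lines at face value,
the relative cost works out as in this small sketch of the arithmetic:

# Relative cost of 4.0.3 vs 3.3.3 at np=4, from the hour/ns column above.
hr_per_ns_403 = 1.665
hr_per_ns_333 = 1.111
extra = hr_per_ns_403 / hr_per_ns_333 - 1.0
print("4.0.3 needs %.0f%% more hours per ns than 3.3.3" % (100 * extra))  # ~50%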
D.D.