Hi gromacs users,

I have installed the latest version of gromacs (4.5.1) on an i7 980X (6 cores, or 12 with HT on; 3.3 GHz) with 12 GB of RAM and compiled its MPI version. I also compiled the GPU-accelerated version of gromacs. I then ran a 2 ns simulation of a small system (11042 atoms) to compare the performance of mdrun-gpu vs. mdrun_mpi. The results I got are below:
############################################

My *.mdp is:

constraints          = all-bonds
integrator           = md
dt                   = 0.002    ; ps !
nsteps               = 1000000  ; total 2000 ps.
nstlist              = 10
ns_type              = grid
coulombtype          = PME
rvdw                 = 0.9
rlist                = 0.9
rcoulomb             = 0.9
fourierspacing       = 0.10
pme_order            = 4
ewald_rtol           = 1e-5
vdwtype              = cut-off
pbc                  = xyz
epsilon_rf           = 0
comm_mode            = linear
nstxout              = 1000
nstvout              = 0
nstfout              = 0
nstxtcout            = 1000
nstlog               = 1000
nstenergy            = 1000
; Berendsen temperature coupling is on in four groups
tcoupl               = berendsen
tc-grps              = system
tau-t                = 0.1
ref-t                = 298
; Pressure coupling is on
Pcoupl               = berendsen
pcoupltype           = isotropic
tau_p                = 0.5
compressibility      = 4.5e-5
ref_p                = 1.0
; Generate velocities is on at 298 K.
gen_vel              = no

######################## RUNNING GROMACS ON GPU

mdrun-gpu -s topol.tpr -v >& out &

Here is a part of the md.log:

Started mdrun on node 0 Wed Oct 20 09:52:09 2010
.
.
.
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:            Nodes     Number     G-Cycles    Seconds     %
-----------------------------------------------------------------------
 Write traj.               1       1021      106.075       31.7     0.2
 Rest                      1                64125.577    19178.6    99.8
-----------------------------------------------------------------------
 Total                     1                64231.652    19210.3   100.0
-----------------------------------------------------------------------

                NODE (s)   Real (s)      (%)
       Time:    6381.840  19210.349     33.2
                        1h46:21
                (Mnbf/s)   (MFlops)   (ns/day)  (hour/ns)
 Performance:      0.000      0.001     27.077      0.886

Finished mdrun on node 0 Wed Oct 20 15:12:19 2010

######################## RUNNING GROMACS ON MPI

mpirun -np 6 mdrun_mpi -s topol.tpr -npme 3 -v >& out &

Here is a part of the md.log:

Started mdrun on node 0 Wed Oct 20 18:30:52 2010

     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:            Nodes     Number     G-Cycles    Seconds     %
-----------------------------------------------------------------------
 Domain decomp.            3     100001     1452.166      434.7     0.6
 DD comm. load             3      10001        0.745        0.2     0.0
 Send X to PME             3    1000001      249.003       74.5     0.1
 Comm. coord.              3    1000001      637.329      190.8     0.3
 Neighbor search           3     100001     8738.669     2616.0     3.5
 Force                     3    1000001    99210.202    29699.2    39.2
 Wait + Comm. F            3    1000001     3361.591     1006.3     1.3
 PME mesh                  3    1000001    66189.554    19814.2    26.2
 Wait + Comm. X/F          3               60294.513     8049.5    23.8
 Wait + Recv. PME F        3    1000001      801.897      240.1     0.3
 Write traj.               3       1015       33.464       10.0     0.0
 Update                    3    1000001     3295.820      986.6     1.3
 Constraints               3    1000001     6317.568     1891.2     2.5
 Comm. energies            3     100002       70.784       21.2     0.0
 Rest                      3                2314.844      693.0     0.9
-----------------------------------------------------------------------
 Total                     6              252968.148    75727.5   100.0
-----------------------------------------------------------------------
-----------------------------------------------------------------------
 PME redist. X/F           3    2000002     1945.551      582.4     0.8
 PME spread/gather         3    2000002    37219.607    11141.9    14.7
 PME 3D-FFT                3    2000002    21453.362     6422.2     8.5
 PME solve                 3    1000001     5551.056     1661.7     2.2
-----------------------------------------------------------------------

    Parallel run - timing based on wallclock.
                NODE (s)   Real (s)      (%)
       Time:   12621.257  12621.257    100.0
                        3h30:21
                (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
 Performance:    388.633     28.773     13.691      1.753

Finished mdrun on node 0 Wed Oct 20 22:01:14 2010

######################################

Comparing the performance values of the two simulations, I saw that in "numeric terms" the GPU run reported (for example) ~27 ns/day, while the MPI run reported approximately half of that (13.7 ns/day). However, when I compared the start/finish times of each simulation, the MPI run took 211 minutes while the GPU run took 320 minutes to finish (a quick cross-check of these numbers is in the P.S. below).

My questions are:

1. Why do the performance values show better results for the GPU?
2. Why was the GPU simulation 109 min slower than the run on 6 cores, given that my video card is a GTX 480 with 480 GPU cores? I was expecting the GPU to greatly accelerate the simulations.

Does anyone have some idea?

Thanks,
Renato
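P.S. As a sanity check, here is a rough calculation relating the reported ns/day values to the NODE (s) and Real (s) times in the log excerpts above. This is only a sketch in Python: the 2 ns of simulated time and all the seconds values are copied from the logs, and 86400 is simply the number of seconds per day.

# Cross-check of ns/day against the NODE (s) and Real (s) values quoted above
ns_simulated = 2.0           # both runs simulated 2 ns (1000000 steps * 0.002 ps)
seconds_per_day = 86400.0

# GPU run
gpu_node_s = 6381.840        # NODE (s) from the GPU md.log
gpu_real_s = 19210.349       # Real (s) from the GPU md.log (~320 min)
print(ns_simulated / gpu_node_s * seconds_per_day)   # ~27.08 ns/day, matches the Performance line
print(ns_simulated / gpu_real_s * seconds_per_day)   # ~9.0 ns/day based on real wall-clock time

# MPI run on 6 cores (here NODE (s) equals Real (s): 12621.257 s, ~210 min)
mpi_real_s = 12621.257
print(ns_simulated / mpi_real_s * seconds_per_day)   # ~13.69 ns/day, matches the Performance line

So the GPU run's ~27 ns/day seems to be computed from NODE (s), while in real wall-clock terms it comes out to roughly 9 ns/day.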