Hi gromacs users,

I have installed the latest version of GROMACS (4.5.1) on an i7 980X
(6 cores, or 12 with HT on; 3.3 GHz) with 12 GB of RAM and compiled its
MPI version. I also compiled the GPU-accelerated version of GROMACS.
Then I ran a 2 ns simulation of a small system (11042 atoms) to compare
the performance of mdrun-gpu vs mdrun_mpi. The results I got are below:

############################################
My *.mdp is:

constraints         =  all-bonds
integrator          =  md
dt                  =  0.002    ; ps !
nsteps              =  1000000  ; total 2000 ps.
nstlist             =  10
ns_type             =  grid
coulombtype    = PME
rvdw                = 0.9
rlist               = 0.9
rcoulomb            = 0.9
fourierspacing      = 0.10
pme_order           = 4
ewald_rtol          = 1e-5
vdwtype             =  cut-off
pbc                 =  xyz
epsilon_rf    =  0
comm_mode           =  linear
nstxout             =  1000
nstvout             =  0
nstfout             =  0
nstxtcout           =  1000
nstlog              =  1000
nstenergy           =  1000
; Berendsen temperature coupling is on
tcoupl              = berendsen
tc-grps             = system
tau-t               = 0.1
ref-t               = 298
; Pressure coupling is on
Pcoupl              =  berendsen
pcoupltype          =  isotropic
tau_p               =  0.5
compressibility     =  4.5e-5
ref_p               =  1.0
; Generate velocities is off.
gen_vel             =  no
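As a quick sanity check of what these settings imply, here is my own arithmetic (not GROMACS output; the little Python sketch below is only illustrative):

# Sanity check of the .mdp above -- my own arithmetic, not GROMACS output.
dt = 0.002           # time step in ps
nsteps = 1000000     # total MD steps
nstxtcout = 1000     # steps between compressed-trajectory frames

total_ps = dt * nsteps                # 2000 ps, i.e. the 2 ns simulated in each run
xtc_frames = nsteps // nstxtcout + 1  # roughly 1001 frames written to the .xtc
print(total_ps, "ps total, about", xtc_frames, "xtc frames")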

########################
RUNNING GROMACS ON GPU

mdrun-gpu -s topol.tpr -v >& out &

Here is a part of the md.log:

Started mdrun on node 0 Wed Oct 20 09:52:09 2010
.
.
.
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:         Nodes     Number     G-Cycles    Seconds      %
-----------------------------------------------------------------------
 Write traj.            1       1021       106.075       31.7    0.2
 Rest                   1                64125.577    19178.6   99.8
-----------------------------------------------------------------------
 Total                  1                64231.652    19210.3  100.0
-----------------------------------------------------------------------

                  NODE (s)    Real (s)      (%)
       Time:      6381.840   19210.349     33.2
                          1h46:21
                  (Mnbf/s)    (MFlops)   (ns/day)   (hour/ns)
Performance:         0.000       0.001     27.077       0.886

Finished mdrun on node 0 Wed Oct 20 15:12:19 2010

########################
RUNNING GROMACS ON MPI

mpirun -np 6 mdrun_mpi -s topol.tpr -npme 3 -v >& out &

Here is a part of the md.log:

Started mdrun on node 0 Wed Oct 20 18:30:52 2010

     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:         Nodes     Number     G-Cycles    Seconds      %
-----------------------------------------------------------------------
 Domain decomp.         3     100001      1452.166      434.7    0.6
 DD comm. load          3      10001         0.745        0.2    0.0
 Send X to PME          3    1000001       249.003       74.5    0.1
 Comm. coord.           3    1000001       637.329      190.8    0.3
 Neighbor search        3     100001      8738.669     2616.0    3.5
 Force                  3    1000001     99210.202    29699.2   39.2
 Wait + Comm. F         3    1000001      3361.591     1006.3    1.3
 PME mesh               3    1000001     66189.554    19814.2   26.2
 Wait + Comm. X/F       3                60294.513     8049.5   23.8
 Wait + Recv. PME F     3    1000001       801.897      240.1    0.3
 Write traj.            3       1015        33.464       10.0    0.0
 Update                 3    1000001      3295.820      986.6    1.3
 Constraints            3    1000001      6317.568     1891.2    2.5
 Comm. energies         3     100002        70.784       21.2    0.0
 Rest                   3                 2314.844      693.0    0.9
-----------------------------------------------------------------------
 Total                  6               252968.148    75727.5  100.0
-----------------------------------------------------------------------
-----------------------------------------------------------------------
 PME redist. X/F        3    2000002      1945.551      582.4    0.8
 PME spread/gather      3    2000002     37219.607    11141.9   14.7
 PME 3D-FFT             3    2000002     21453.362     6422.2    8.5
 PME solve              3    1000001      5551.056     1661.7    2.2
-----------------------------------------------------------------------

Parallel run - timing based on wallclock.

                  NODE (s)    Real (s)      (%)
       Time:     12621.257   12621.257    100.0
                          3h30:21
                  (Mnbf/s)    (GFlops)   (ns/day)   (hour/ns)
Performance:       388.633      28.773     13.691       1.753
Finished mdrun on node 0 Wed Oct 20 22:01:14 2010

######################################
Comparing the performance values for the two simulations, I see that in
"numeric terms" the GPU run reported ~27 ns/day, while the MPI run
reported approximately half of that (13.7 ns/day).
However, when I compared the times at which each simulation started and
finished, the MPI run took 211 minutes while the GPU run took 320
minutes to finish.
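To make the comparison concrete, here is how I converted the two wall-clock durations into an effective throughput (my own back-of-the-envelope Python sketch, using only the 2 ns run length and the start/finish times above; the ns_per_day helper is just for illustration):

# Effective throughput from measured wall-clock time -- my own arithmetic.
ns_simulated = 2.0   # both runs simulated 2 ns

def ns_per_day(wall_minutes):
    # ns/day given the measured wall-clock duration in minutes
    return ns_simulated / (wall_minutes / (60.0 * 24.0))

print("GPU run, 320 min: %.1f ns/day" % ns_per_day(320))  # about 9.0 ns/day
print("MPI run, 211 min: %.1f ns/day" % ns_per_day(211))  # about 13.6 ns/day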

My questions are:

1. Why do the performance values in the log show better results for the GPU?

2. Why was the GPU simulation 109 minutes slower than the run on 6
cores, given that my video card is a GTX 480 with 480 CUDA cores? I was
expecting the GPU to accelerate the simulations greatly.


Does anyone have any idea?

Thanks,

Renato