Hi Richard,

Thank you for the help, and sorry for the delay in my reply. I did some test runs with different parameters (e.g. removing PME) and was able to reach 20 ns/day, so I think 9-11 ns/day is about the maximum I can obtain with my settings.
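(In case it is useful for anyone searching the archive: by "removing PME" I mean a test run with the long-range electrostatics switched off, roughly along the lines of the sketch below. This is only a diagnostic to see how much time goes into the PME mesh, which GROMACS 4.6 computes on the CPU; reaction-field or plain cut-off electrostatics is not a replacement for PME in a production run of a solvated protein.)

    ; diagnostic-only change, not production settings
    coulombtype   = reaction-field   ; instead of PME; the non-bonded work then stays on the GPU
    rcoulomb      = 1.0              ; [nm] same cut-off as before
    epsilon_rf    = 0                ; 0 = infinite reaction-field dielectric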
Thank you again for your help.

cheers,
Fra

On Fri, 12 Jul 2013, at 03:41 PM, Richard Broadbent wrote:
>
> On 12/07/13 13:26, Francesco wrote:
> > Hi all,
> > I'm working with a 200K-atom system (protein + explicit water) and
> > after a while using a CPU cluster I had to switch to a GPU cluster.
> > I read both the "Acceleration and parallelization" and the GROMACS-GPU
> > documentation pages
> > (http://www.gromacs.org/Documentation/Acceleration_and_parallelization
> > and
> > http://www.gromacs.org/Documentation/Installation_Instructions_4.5/GROMACS-OpenMM)
> > but it's a bit confusing and I need help to check whether I have
> > understood it correctly. :)
> > I have 2 types of nodes:
> > 3 GPUs (NVIDIA Tesla M2090) and 2 CPUs with 6 cores each (Intel Xeon E5649 @ 2.53GHz)
> > 8 GPUs and 2 CPUs (6 cores each)
> >
> > 1) I can only have 1 MPI rank per GPU, meaning that with 3 GPUs I can
> > have 3 MPI ranks max.
> > 2) Because I have 12 cores, I can open 4 OpenMP threads per MPI rank,
> > because 4 x 3 = 12.
> >
> > Now, if I have a node with 8 GPUs, I can use 4 GPUs:
> > 4 MPI ranks and 3 OpenMP threads each.
> > Is that right?
> > Is it possible to use only 8 GPUs and 8 cores?
>
> You could set -ntomp 0 and set up MPI/thread-MPI to use 8 cores.
> However, a system that unbalanced (a huge amount of GPU power against
> comparatively little CPU power) is unlikely to get great performance.
>
> > Using GROMACS 4.6.2 and 144 CPU cores I reach 35 ns/day, while with 3
> > GPUs and 12 cores I get 9-11 ns/day.
>
> That slowdown is in line with what I got when I tried a similar CPU-GPU
> setup. That said, others might have some advice that will improve your
> performance.
>
> > The command that I use is:
> > mdrun -dlb yes -s input_50.tpr -deffnm 306s_50 -v
> > with the number of GPUs set via the submission script:
> > #BSUB -n 3
> >
> > I also tried to set -npme / -nt / -ntmpi / -ntomp, but nothing changes.
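For the archive: on the 3-GPU node the rank/thread/GPU mapping can also be spelled out explicitly instead of relying on the defaults. Something like the lines below should give one PP rank per GPU with four OpenMP threads each (3 x 4 = 12 cores); the mdrun_mpi name is just a placeholder for whatever the MPI-enabled binary is called on a given cluster.

    # thread-MPI build: 3 ranks, one GPU each, 4 OpenMP threads per rank
    mdrun -ntmpi 3 -ntomp 4 -gpu_id 012 -dlb yes -s input_50.tpr -deffnm 306s_50 -v

    # real MPI build started through the queue (binary name is an assumption)
    mpirun -np 3 mdrun_mpi -ntomp 4 -gpu_id 012 -dlb yes -s input_50.tpr -deffnm 306s_50 -v

    # 8 GPUs with only 8 cores, as asked above: one core per GPU
    # (likely CPU-bound, as Richard says)
    mdrun -ntmpi 8 -ntomp 1 -gpu_id 01234567 -dlb yes -s input_50.tpr -deffnm 306s_50 -v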
> >
> > The mdp file and some statistics follow:
> >
> > -------- START MDP --------
> >
> > title                = G6PD wt molecular dynamics (2bhl.pdb) - NPT MD
> >
> > ; Run parameters
> > integrator           = md          ; algorithm options
> > nsteps               = 25000000    ; maximum number of steps to perform [50 ns]
> > dt                   = 0.002       ; 2 fs = 0.002 ps
> >
> > ; Output control
> > nstxout              = 10000       ; [steps] freq to write coordinates to trajectory; the last coordinates are always written
> > nstvout              = 10000       ; [steps] freq to write velocities to trajectory; the last velocities are always written
> > nstlog               = 10000       ; [steps] freq to write energies to log file; the last energies are always written
> > nstenergy            = 10000       ; [steps] write energies to disk every nstenergy steps
> > nstxtcout            = 10000       ; [steps] freq to write coordinates to xtc trajectory
> > xtc_precision        = 1000        ; precision to write to xtc trajectory (1000 = default)
> > xtc_grps             = system      ; which coordinate group(s) to write to disk
> > energygrps           = system      ; which energy group(s) to write to disk
> >
> > ; Bond parameters
> > continuation         = yes         ; restarting from NPT
> > constraints          = all-bonds   ; bond types to replace by constraints
> > constraint_algorithm = lincs       ; holonomic constraints
> > lincs_iter           = 1           ; accuracy of LINCS
> > lincs_order          = 4           ; also related to accuracy
> > lincs_warnangle      = 30          ; [degrees] maximum angle that a bond can rotate before LINCS will complain
>
> That seems a little loose for constraints, but setting that up and
> checking that it conserves energy and preserves bond lengths is something
> you'll have to do yourself.
>
> Richard
>
> > ; Neighborsearching
> > ns_type              = grid        ; method of updating the neighbor list
> > cutoff-scheme        = Verlet
> > nstlist              = 10          ; [steps] frequency to update the neighbor list (10)
> > rlist                = 1.0         ; [nm] cut-off distance for the short-range neighbor list (1 default)
> > rcoulomb             = 1.0         ; [nm] long-range electrostatic cut-off
> > rvdw                 = 1.0         ; [nm] long-range Van der Waals cut-off
> >
> > ; Electrostatics
> > coulombtype          = PME         ; treatment of long-range electrostatic interactions
> > vdwtype              = cut-off     ; treatment of Van der Waals interactions
> >
> > ; Periodic boundary conditions
> > pbc                  = xyz
> >
> > ; Dispersion correction
> > DispCorr             = EnerPres    ; applying long-range dispersion corrections
> >
> > ; Ewald
> > fourierspacing       = 0.12        ; grid spacing for FFT - controls the highest magnitude of wave vectors (0.12)
> > pme_order            = 4           ; interpolation order for PME, 4 = cubic
> > ewald_rtol           = 1e-5        ; relative strength of Ewald-shifted potential at rcoulomb
> >
> > ; Temperature coupling
> > tcoupl               = nose-hoover ; temperature coupling with Nose-Hoover ensemble
> > tc_grps              = Protein Non-Protein
> > tau_t                = 0.4 0.4     ; [ps] time constant
> > ref_t                = 310 310     ; [K] reference temperature for coupling [310 K = 37 °C]
> >
> > ; Pressure coupling
> > pcoupl               = parrinello-rahman
> > pcoupltype           = isotropic   ; uniform scaling of box vectors
> > tau_p                = 2.0         ; [ps] time constant
> > ref_p                = 1.0         ; [bar] reference pressure for coupling
> > compressibility      = 4.5e-5      ; [bar^-1] isothermal compressibility of water
> > refcoord_scaling     = com         ; have a look at chapter 7 of the GROMACS documentation
> >
> > ; Velocity generation
> > gen_vel              = no          ; generate velocities in grompp according to a Maxwell distribution
> >
> > -------- END MDP --------
> >
> > -------- START STATISTICS --------
> >
> >      P P   -   P M E   L O A D   B A L A N C I N G
> >
> > PP/PME load balancing changed the cut-off and PME settings:
> >            particle-particle                   PME
> >             rcoulomb  rlist       grid         spacing   1/beta
> >    initial  1.000 nm  1.155 nm   100 128  96   0.120 nm  0.320 nm
> >    final    1.201 nm  1.356 nm    96 100  80   0.144 nm  0.385 nm
> >    cost-ratio            1.62                  0.62
> > (note that these numbers concern only part of the total PP and PME load)
> >
> >      M E G A - F L O P S   A C C O U N T I N G
> >
> > NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
> > RF=Reaction-Field    VdW=Van der Waals    QSTab=quadratic-spline table
> > W3=SPC/TIP3p    W4=TIP4p (single or pairs)
> > V&F=Potential and force    V=Potential only    F=Force only
> >
> >      D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
> >
> > av. #atoms communicated per step for force:  2 x 54749.0
> > av. #atoms communicated per step for LINCS:  2 x 5418.4
> >
> > Average load imbalance: 12.8 %
> > Part of the total run time spent waiting due to load imbalance: 1.4 %
> > Steps where the load balancing was limited by -rdd, -rcon and/or -dds: Y 0 %
> >
> >      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
> >
> > Computing:          Nodes  Th.     Count     Wall t (s)      G-Cycles      %
> > -----------------------------------------------------------------------------
> > Domain decomp.          3   4     625000      10388.307    315806.805    2.3
> > DD comm. load           3   4     625000        129.908      3949.232    0.0
> > DD comm. bounds         3   4     625001        267.204      8123.069    0.1
> > Neighbor search         3   4     625001       7756.651    235803.900    1.7
> > Launch GPU ops.         3   4   50000002       3376.764    102654.354    0.8
> > Comm. coord.            3   4   24375000      10651.213    323799.209    2.4
> > Force                   3   4   25000001      35732.146   1086265.102    8.0
> > Wait + Comm. F          3   4   25000001      19866.403    603943.033    4.5
> > PME mesh                3   4   25000001     235964.754   7173380.387   53.0
> > Wait GPU nonlocal       3   4   25000001      12055.970    366504.140    2.7
> > Wait GPU local          3   4   25000001        106.179      3227.866    0.0
> > NB X/F buffer ops.      3   4   98750002      10256.750    311807.459    2.3
> > Write traj.             3   4       2994        249.770      7593.073    0.1
> > Update                  3   4   25000001      33108.852   1006516.379    7.4
> > Constraints             3   4   25000001      51671.482   1570824.423   11.6
> > Comm. energies          3   4    2500001        463.135     14079.404    0.1
> > Rest                    3                     13290.037    404020.040    3.0
> > -----------------------------------------------------------------------------
> > Total                   3                    445335.526  13538297.876  100.0
> > -----------------------------------------------------------------------------
> > -----------------------------------------------------------------------------
> > PME redist. X/F         3   4   50000002      40747.165   1238722.760    9.1
> > PME spread/gather       3   4   50000002     122026.128   3709621.109   27.4
> > PME 3D-FFT              3   4   50000002      46613.023   1417046.140   10.5
> > PME 3D-FFT Comm.        3   4   50000002      20934.134    636402.285    4.7
> > PME solve               3   4   25000001       5465.690    166158.163    1.2
> > -----------------------------------------------------------------------------
> >
> >                 Core t (s)   Wall t (s)      (%)
> > Time:      5317976.200    445335.526   1194.2
> >                             5d03h42:15
> >                   (ns/day)    (hour/ns)
> > Performance:         9.701       2.474
> >
> > -------- END STATISTICS --------
> >
> > thank you very much for the help.
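A quick sanity check on the numbers above, and on where the time goes: 25,000,000 steps at 2 fs is 50 ns of simulation, and dividing that by the wall time reproduces the reported throughput. The cycle accounting also shows the PME mesh taking 53% of the run (broken down further into redistribution, spread/gather, 3D-FFT and solve), and in 4.6 that part runs on the CPU while the short-range non-bonded work is offloaded to the GPU, which is consistent with the CPU side being the limiting factor here.

    # simulated time: 25,000,000 steps x 0.002 ps = 50,000 ps = 50 ns
    # wall time: 445335.526 s = 5.154 days
    echo "scale=3; (25000000 * 0.002 / 1000) / (445335.526 / 86400)" | bc   # ~9.70 ns/day, matching the reported 9.701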
> > cheers,
> > Fra
>

--
Francesco Carbone
PhD student
Institute of Structural and Molecular Biology
UCL, London
fra.carbone...@ucl.ac.uk