Hi Richard,

Thank you for the help, and sorry for the delay in my reply. I did some test runs with different parameters (e.g. removing PME) and was able to reach 20 ns/day, so I think 9-11 ns/day is about the maximum I can obtain with my settings.
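(In case it is useful for anyone searching the archive: by "removing PME" I mean a test run with the long-range electrostatics switched off, roughly along the lines of the sketch below. This is only a diagnostic to see how much time goes into the PME mesh, which GROMACS 4.6 computes on the CPU; reaction-field or plain cut-off electrostatics is not a replacement for PME in a production run of a solvated protein.)

    ; diagnostic-only change, not production settings
    coulombtype   = reaction-field   ; instead of PME; the non-bonded work then stays on the GPU
    rcoulomb      = 1.0              ; [nm] same cut-off as before
    epsilon_rf    = 0                ; 0 = infinite reaction-field dielectric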
Thank you again for your help.

cheers,
Fra

On Fri, 12 Jul 2013, at 03:41 PM, Richard Broadbent wrote:
>
> On 12/07/13 13:26, Francesco wrote:
> > Hi all,
> > I'm working with a 200K-atom system (protein + explicit water) and
> > after a while using a CPU cluster I had to switch to a GPU cluster.
> > I read both the "Acceleration and parallelization" and the GROMACS-GPU
> > documentation pages
> > (http://www.gromacs.org/Documentation/Acceleration_and_parallelization
> > and
> > http://www.gromacs.org/Documentation/Installation_Instructions_4.5/GROMACS-OpenMM)
> > but it's a bit confusing and I need help to check whether I have
> > understood it correctly. :)
> > I have 2 types of nodes:
> > 3 GPUs (NVIDIA Tesla M2090) and 2 CPUs with 6 cores each (Intel Xeon E5649 @ 2.53GHz)
> > 8 GPUs and 2 CPUs (6 cores each)
> >
> > 1) I can only have 1 MPI rank per GPU, meaning that with 3 GPUs I can
> > have 3 MPI ranks max.
> > 2) Because I have 12 cores, I can open 4 OpenMP threads per MPI rank,
> > because 4 x 3 = 12.
> >
> > Now, if I have a node with 8 GPUs, I can use 4 GPUs:
> > 4 MPI ranks and 3 OpenMP threads each.
> > Is that right?
> > Is it possible to use only 8 GPUs and 8 cores?
>
> You could set -ntomp 0 and set up MPI/thread-MPI to use 8 cores.
> However, a system that unbalanced (a huge amount of GPU power against
> comparatively little CPU power) is unlikely to get great performance.
>
> > Using GROMACS 4.6.2 and 144 CPU cores I reach 35 ns/day, while with 3
> > GPUs and 12 cores I get 9-11 ns/day.
>
> That slowdown is in line with what I got when I tried a similar CPU-GPU
> setup. That said, others might have some advice that will improve your
> performance.
>
> > The command that I use is:
> > mdrun -dlb yes -s input_50.tpr -deffnm 306s_50 -v
> > with the number of GPUs set via the submission script:
> > #BSUB -n 3
> >
> > I also tried to set -npme / -nt / -ntmpi / -ntomp, but nothing changes.
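For the archive: on the 3-GPU node the rank/thread/GPU mapping can also be spelled out explicitly instead of relying on the defaults. Something like the lines below should give one PP rank per GPU with four OpenMP threads each (3 x 4 = 12 cores); the mdrun_mpi name is just a placeholder for whatever the MPI-enabled binary is called on a given cluster.

    # thread-MPI build: 3 ranks, one GPU each, 4 OpenMP threads per rank
    mdrun -ntmpi 3 -ntomp 4 -gpu_id 012 -dlb yes -s input_50.tpr -deffnm 306s_50 -v

    # real MPI build started through the queue (binary name is an assumption)
    mpirun -np 3 mdrun_mpi -ntomp 4 -gpu_id 012 -dlb yes -s input_50.tpr -deffnm 306s_50 -v

    # 8 GPUs with only 8 cores, as asked above: one core per GPU
    # (likely CPU-bound, as Richard says)
    mdrun -ntmpi 8 -ntomp 1 -gpu_id 01234567 -dlb yes -s input_50.tpr -deffnm 306s_50 -v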
> >
> > The mdp file and some statistics follow:
> >
> > -------- START MDP --------
> >
> > title                = G6PD wt molecular dynamics (2bhl.pdb) - NPT MD
> >
> > ; Run parameters
> > integrator           = md          ; algorithm options
> > nsteps               = 25000000    ; maximum number of steps to perform [50 ns]
> > dt                   = 0.002       ; 2 fs = 0.002 ps
> >
> > ; Output control
> > nstxout              = 10000       ; [steps] freq to write coordinates to trajectory; the last coordinates are always written
> > nstvout              = 10000       ; [steps] freq to write velocities to trajectory; the last velocities are always written
> > nstlog               = 10000       ; [steps] freq to write energies to log file; the last energies are always written
> > nstenergy            = 10000       ; [steps] write energies to disk every nstenergy steps
> > nstxtcout            = 10000       ; [steps] freq to write coordinates to xtc trajectory
> > xtc_precision        = 1000        ; precision to write to xtc trajectory (1000 = default)
> > xtc_grps             = system      ; which coordinate group(s) to write to disk
> > energygrps           = system      ; which energy group(s) to write to disk
> >
> > ; Bond parameters
> > continuation         = yes         ; restarting from NPT
> > constraints          = all-bonds   ; bond types to replace by constraints
> > constraint_algorithm = lincs       ; holonomic constraints
> > lincs_iter           = 1           ; accuracy of LINCS
> > lincs_order          = 4           ; also related to accuracy
> > lincs_warnangle      = 30          ; [degrees] maximum angle that a bond can rotate before LINCS will complain
>
> That seems a little loose for constraints, but setting that up and
> checking that it conserves energy and preserves bond lengths is something
> you'll have to do yourself.
>
> Richard
>
> > ; Neighborsearching
> > ns_type              = grid        ; method of updating the neighbor list
> > cutoff-scheme        = Verlet
> > nstlist              = 10          ; [steps] frequency to update the neighbor list (10)
> > rlist                = 1.0         ; [nm] cut-off distance for the short-range neighbor list (1 default)
> > rcoulomb             = 1.0         ; [nm] long-range electrostatic cut-off
> > rvdw                 = 1.0         ; [nm] long-range Van der Waals cut-off
> >
> > ; Electrostatics
> > coulombtype          = PME         ; treatment of long-range electrostatic interactions
> > vdwtype              = cut-off     ; treatment of Van der Waals interactions
> >
> > ; Periodic boundary conditions
> > pbc                  = xyz
> >
> > ; Dispersion correction
> > DispCorr             = EnerPres    ; applying long-range dispersion corrections
> >
> > ; Ewald
> > fourierspacing       = 0.12        ; grid spacing for FFT - controls the highest magnitude of wave vectors (0.12)
> > pme_order            = 4           ; interpolation order for PME, 4 = cubic
> > ewald_rtol           = 1e-5        ; relative strength of Ewald-shifted potential at rcoulomb
> >
> > ; Temperature coupling
> > tcoupl               = nose-hoover ; temperature coupling with Nose-Hoover ensemble
> > tc_grps              = Protein Non-Protein
> > tau_t                = 0.4 0.4     ; [ps] time constant
> > ref_t                = 310 310     ; [K] reference temperature for coupling [310 K = 37 °C]
> >
> > ; Pressure coupling
> > pcoupl               = parrinello-rahman
> > pcoupltype           = isotropic   ; uniform scaling of box vectors
> > tau_p                = 2.0         ; [ps] time constant
> > ref_p                = 1.0         ; [bar] reference pressure for coupling
> > compressibility      = 4.5e-5      ; [bar^-1] isothermal compressibility of water
> > refcoord_scaling     = com         ; have a look at chapter 7 of the GROMACS documentation
> >
> > ; Velocity generation
> > gen_vel              = no          ; generate velocities in grompp according to a Maxwell distribution
> >
> > -------- END MDP --------
> >
> > -------- START STATISTICS --------
> >
> >      P P   -   P M E   L O A D   B A L A N C I N G
> >
> > PP/PME load balancing changed the cut-off and PME settings:
> >            particle-particle                   PME
> >             rcoulomb  rlist       grid         spacing   1/beta
> >    initial  1.000 nm  1.155 nm   100 128  96   0.120 nm  0.320 nm
> >    final    1.201 nm  1.356 nm    96 100  80   0.144 nm  0.385 nm
> >    cost-ratio            1.62                  0.62
> > (note that these numbers concern only part of the total PP and PME load)
> >
> >      M E G A - F L O P S   A C C O U N T I N G
> >
> > NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
> > RF=Reaction-Field    VdW=Van der Waals    QSTab=quadratic-spline table
> > W3=SPC/TIP3p    W4=TIP4p (single or pairs)
> > V&F=Potential and force    V=Potential only    F=Force only
> >
> >      D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
> >
> > av. #atoms communicated per step for force:  2 x 54749.0
> > av. #atoms communicated per step for LINCS:  2 x 5418.4
> >
> > Average load imbalance: 12.8 %
> > Part of the total run time spent waiting due to load imbalance: 1.4 %
> > Steps where the load balancing was limited by -rdd, -rcon and/or -dds: Y 0 %
> >
> >      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
> >
> > Computing:          Nodes  Th.     Count     Wall t (s)      G-Cycles      %
> > -----------------------------------------------------------------------------
> > Domain decomp.          3   4     625000      10388.307    315806.805    2.3
> > DD comm. load           3   4     625000        129.908      3949.232    0.0
> > DD comm. bounds         3   4     625001        267.204      8123.069    0.1
> > Neighbor search         3   4     625001       7756.651    235803.900    1.7
> > Launch GPU ops.         3   4   50000002       3376.764    102654.354    0.8
> > Comm. coord.            3   4   24375000      10651.213    323799.209    2.4
> > Force                   3   4   25000001      35732.146   1086265.102    8.0
> > Wait + Comm. F          3   4   25000001      19866.403    603943.033    4.5
> > PME mesh                3   4   25000001     235964.754   7173380.387   53.0
> > Wait GPU nonlocal       3   4   25000001      12055.970    366504.140    2.7
> > Wait GPU local          3   4   25000001        106.179      3227.866    0.0
> > NB X/F buffer ops.      3   4   98750002      10256.750    311807.459    2.3
> > Write traj.             3   4       2994        249.770      7593.073    0.1
> > Update                  3   4   25000001      33108.852   1006516.379    7.4
> > Constraints             3   4   25000001      51671.482   1570824.423   11.6
> > Comm. energies          3   4    2500001        463.135     14079.404    0.1
> > Rest                    3                     13290.037    404020.040    3.0
> > -----------------------------------------------------------------------------
> > Total                   3                    445335.526  13538297.876  100.0
> > -----------------------------------------------------------------------------
> > -----------------------------------------------------------------------------
> > PME redist. X/F         3   4   50000002      40747.165   1238722.760    9.1
> > PME spread/gather       3   4   50000002     122026.128   3709621.109   27.4
> > PME 3D-FFT              3   4   50000002      46613.023   1417046.140   10.5
> > PME 3D-FFT Comm.        3   4   50000002      20934.134    636402.285    4.7
> > PME solve               3   4   25000001       5465.690    166158.163    1.2
> > -----------------------------------------------------------------------------
> >
> >                 Core t (s)   Wall t (s)      (%)
> > Time:      5317976.200    445335.526   1194.2
> >                             5d03h42:15
> >                   (ns/day)    (hour/ns)
> > Performance:         9.701       2.474
> >
> > -------- END STATISTICS --------
> >
> > thank you very much for the help.
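A quick sanity check on the numbers above, and on where the time goes: 25,000,000 steps at 2 fs is 50 ns of simulation, and dividing that by the wall time reproduces the reported throughput. The cycle accounting also shows the PME mesh taking 53% of the run (broken down further into redistribution, spread/gather, 3D-FFT and solve), and in 4.6 that part runs on the CPU while the short-range non-bonded work is offloaded to the GPU, which is consistent with the CPU side being the limiting factor here.

    # simulated time: 25,000,000 steps x 0.002 ps = 50,000 ps = 50 ns
    # wall time: 445335.526 s = 5.154 days
    echo "scale=3; (25000000 * 0.002 / 1000) / (445335.526 / 86400)" | bc   # ~9.70 ns/day, matching the reported 9.701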
> > cheers,
> > Fra
>

--
Francesco Carbone
PhD student
Institute of Structural and Molecular Biology
UCL, London
fra.carbone...@ucl.ac.uk