Hi all,

I'm working with a 200K-atom system (protein + explicit water). After running for a while on a CPU cluster I had to switch to a GPU cluster. I have read both the "Acceleration and parallelization" and the GROMACS-GPU documentation pages (http://www.gromacs.org/Documentation/Acceleration_and_parallelization and http://www.gromacs.org/Documentation/Installation_Instructions_4.5/GROMACS-OpenMM), but it is a bit confusing and I need help to check whether I have really understood correctly. :)

I have 2 types of nodes:
- 3 GPUs (NVIDIA Tesla M2090) and 2 CPUs with 6 cores each (Intel Xeon E5649 @ 2.53 GHz)
- 8 GPUs and 2 CPUs (6 cores each)

My understanding is:
1) I can only have 1 MPI rank per GPU, meaning that with 3 GPUs I can have at most 3 MPI ranks.
2) Because I have 12 cores, I can open 4 OpenMP threads per MPI rank, since 4 x 3 = 12.

Now, on a node with 8 GPUs, does that mean I can only use 4 of them (4 MPI ranks with 3 OpenMP threads each)? Is that right? And is it possible to use 8 GPUs and only 8 cores?
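To make my understanding concrete, these are the launch lines I *think* would match the two node types. The -gpu_id strings and the exact flag combinations are only my guess from the documentation, not something I have verified:

# 3-GPU node: 3 thread-MPI ranks, one per GPU, 4 OpenMP threads each (3 x 4 = 12 cores)
mdrun -ntmpi 3 -ntomp 4 -gpu_id 012 -dlb yes -s input_50.tpr -deffnm 306s_50 -v

# 8-GPU node, keeping 3 OpenMP threads per rank: only 4 GPUs usable (4 x 3 = 12 cores)
mdrun -ntmpi 4 -ntomp 3 -gpu_id 0123 -dlb yes -s input_50.tpr -deffnm 306s_50 -v

# 8-GPU node, all 8 GPUs with 1 core each (4 cores left idle) -- is this even sensible?
mdrun -ntmpi 8 -ntomp 1 -gpu_id 01234567 -dlb yes -s input_50.tpr -deffnm 306s_50 -v

Is this the right way to read the documentation?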
Using GROMACS 4.6.2 and 144 CPU cores I reach 35 ns/day, while with 3 GPUs and 12 cores I only get 9-11 ns/day. The command I use is:

mdrun -dlb yes -s input_50.tpr -deffnm 306s_50 -v

with the number of GPUs requested via the job script:

#BSUB -n 3

I also tried setting -npme / -nt / -ntmpi / -ntomp, but nothing changes (see my question about -npme right after the mdp file below).

The mdp file and some statistics follow:

-------- START MDP --------
title                = G6PD wt molecular dynamics (2bhl.pdb) - NPT MD

; Run parameters
integrator           = md          ; algorithm options
nsteps               = 25000000    ; maximum number of steps to perform [50 ns]
dt                   = 0.002       ; 2 fs = 0.002 ps

; Output control
nstxout              = 10000       ; [steps] freq to write coordinates to trajectory, the last coordinates are always written
nstvout              = 10000       ; [steps] freq to write velocities to trajectory, the last velocities are always written
nstlog               = 10000       ; [steps] freq to write energies to log file, the last energies are always written
nstenergy            = 10000       ; [steps] write energies to disk every nstenergy steps
nstxtcout            = 10000       ; [steps] freq to write coordinates to xtc trajectory
xtc_precision        = 1000        ; precision to write to xtc trajectory (1000 = default)
xtc_grps             = system      ; which coordinate group(s) to write to disk
energygrps           = system      ; or System / which energy group(s) to write to disk

; Bond parameters
continuation         = yes         ; restarting from npt
constraints          = all-bonds   ; bond types to replace by constraints
constraint_algorithm = lincs       ; holonomic constraints
lincs_iter           = 1           ; accuracy of LINCS
lincs_order          = 4           ; also related to accuracy
lincs_warnangle      = 30          ; [degrees] maximum angle that a bond can rotate before LINCS will complain

; Neighbor searching
ns_type              = grid        ; method of updating neighbor list
cutoff-scheme        = Verlet
nstlist              = 10          ; [steps] frequency to update neighbor list (10)
rlist                = 1.0         ; [nm] cut-off distance for the short-range neighbor list (1 default)
rcoulomb             = 1.0         ; [nm] long range electrostatic cut-off
rvdw                 = 1.0         ; [nm] long range Van der Waals cut-off

; Electrostatics
coulombtype          = PME         ; treatment of long range electrostatic interactions
vdwtype              = cut-off     ; treatment of Van der Waals interactions

; Periodic boundary conditions
pbc                  = xyz

; Dispersion correction
DispCorr             = EnerPres    ; applying long range dispersion corrections

; Ewald
fourierspacing       = 0.12        ; grid spacing for FFT - controls the highest magnitude of wave vectors (0.12)
pme_order            = 4           ; interpolation order for PME, 4 = cubic
ewald_rtol           = 1e-5        ; relative strength of Ewald-shifted potential at rcoulomb

; Temperature coupling
tcoupl               = nose-hoover ; temperature coupling with Nose-Hoover ensemble
tc_grps              = Protein Non-Protein
tau_t                = 0.4 0.4     ; [ps] time constant
ref_t                = 310 310     ; [K] reference temperature for coupling [310 K = ~37 degrees C]

; Pressure coupling
pcoupl               = parrinello-rahman
pcoupltype           = isotropic   ; uniform scaling of box vectors
tau_p                = 2.0         ; [ps] time constant
ref_p                = 1.0         ; [bar] reference pressure for coupling
compressibility      = 4.5e-5      ; [bar^-1] isothermal compressibility of water
refcoord_scaling     = com         ; have a look at GROMACS documentation 7.

; Velocity generation
gen_vel              = no          ; generate velocities in grompp according to a Maxwell distribution
-------- END MDP --------
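About -npme: would something like this be the correct way to dedicate one rank to PME on the 3-GPU nodes? This is only my reading of the docs; the split of 2 PP ranks + 1 PME rank and the -gpu_id mapping are guesses on my part:

# 2 PP ranks (one GPU each) + 1 PME-only rank, 4 OpenMP threads per rank
mdrun -ntmpi 3 -npme 1 -ntomp 4 -gpu_id 01 -dlb yes -s input_50.tpr -deffnm 306s_50 -v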
-------- START STATISTICS --------

 P P   -   P M E   L O A D   B A L A N C I N G

 PP/PME load balancing changed the cut-off and PME settings:
           particle-particle                   PME
            rcoulomb  rlist       grid         spacing   1/beta
   initial  1.000 nm  1.155 nm    100 128  96  0.120 nm  0.320 nm
   final    1.201 nm  1.356 nm     96 100  80  0.144 nm  0.385 nm
   cost-ratio          1.62            0.62
   (note that these numbers concern only part of the total PP and PME load)

 M E G A - F L O P S   A C C O U N T I N G

 NB=Group-cutoff nonbonded kernels   NxN=N-by-N cluster Verlet kernels
 RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
 W3=SPC/TIP3p  W4=TIP4p (single or pairs)
 V&F=Potential and force  V=Potential only  F=Force only

 D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S

 av. #atoms communicated per step for force:  2 x 54749.0
 av. #atoms communicated per step for LINCS:  2 x 5418.4

 Average load imbalance: 12.8 %
 Part of the total run time spent waiting due to load imbalance: 1.4 %
 Steps where the load balancing was limited by -rdd, -rcon and/or -dds: Y 0 %

 R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:          Nodes  Th.     Count   Wall t (s)     G-Cycles       %
-----------------------------------------------------------------------------
 Domain decomp.          3    4    625000    10388.307   315806.805     2.3
 DD comm. load           3    4    625000      129.908     3949.232     0.0
 DD comm. bounds         3    4    625001      267.204     8123.069     0.1
 Neighbor search         3    4    625001     7756.651   235803.900     1.7
 Launch GPU ops.         3    4  50000002     3376.764   102654.354     0.8
 Comm. coord.            3    4  24375000    10651.213   323799.209     2.4
 Force                   3    4  25000001    35732.146  1086265.102     8.0
 Wait + Comm. F          3    4  25000001    19866.403   603943.033     4.5
 PME mesh                3    4  25000001   235964.754  7173380.387    53.0
 Wait GPU nonlocal       3    4  25000001    12055.970   366504.140     2.7
 Wait GPU local          3    4  25000001      106.179     3227.866     0.0
 NB X/F buffer ops.      3    4  98750002    10256.750   311807.459     2.3
 Write traj.             3    4      2994      249.770     7593.073     0.1
 Update                  3    4  25000001    33108.852  1006516.379     7.4
 Constraints             3    4  25000001    51671.482  1570824.423    11.6
 Comm. energies          3    4   2500001      463.135    14079.404     0.1
 Rest                    3                   13290.037   404020.040     3.0
-----------------------------------------------------------------------------
 Total                   3                  445335.526 13538297.876   100.0
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
 PME redist. X/F         3    4  50000002    40747.165  1238722.760     9.1
 PME spread/gather       3    4  50000002   122026.128  3709621.109    27.4
 PME 3D-FFT              3    4  50000002    46613.023  1417046.140    10.5
 PME 3D-FFT Comm.        3    4  50000002    20934.134   636402.285     4.7
 PME solve               3    4  25000001     5465.690   166158.163     1.2
-----------------------------------------------------------------------------

                Core t (s)   Wall t (s)      (%)
       Time:   5317976.200   445335.526   1194.2
                          5d03h42:15
                  (ns/day)    (hour/ns)
 Performance:        9.701        2.474
-------- END STATISTICS --------

Thank you very much for the help.

Cheers,
Fra

--
Francesco Carbone
PhD student
Institute of Structural and Molecular Biology
UCL, London
fra.carbone...@ucl.ac.uk