Thank you, Carsten. I will surely try out the suggestions and get back to you.
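As a first step I will try the single-node baseline with Gromacs' built-in thread-MPI, as suggested. Assuming my mdrun binary was compiled with thread-MPI support (the output name below is arbitrary, and I would reuse the same 4icl.tpr), I plan to run something along these lines on one 12-core node:

    mdrun -nt 12 -s 4icl.tpr -pin on -deffnm singlenode_test

and then compare the ns/day reported at the end of md.log against the multi-node numbers quoted below, and also check what md.log says about thread pinning.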
On Thu, Sep 19, 2013 at 1:52 PM, Carsten Kutzner <ckut...@gwdg.de> wrote:
> Hi,
>
> make a scaling test and run on a single node only at first, so you can
> estimate what performance you can at most expect when going to more nodes.
>
> On a single node, you can also run with Gromacs' thread-MPI, thus
> eliminating the possibility that something with your MPI is wrong.
>
> There are lots of reasons why your parallel performance could be bad.
> Can you check that the Infiniband interconnect is actually being used and
> not the Ethernet? It could also be that a single process is still
> running on one of your cores and eating up CPU time. Or maybe the
> pinning of threads to cores is not correct (what does md.log say
> about that?).
>
> Just a few ideas.
>
> Good luck!
>
> Carsten
>
>
> On Sep 19, 2013, at 8:07 AM, ashutosh srivastava <ashu4...@gmail.com> wrote:
>
> > Hi
> >
> > I have been trying to run simulations on a cluster consisting of 24 nodes
> > with Intel(R) Xeon(R) CPU X5670 @ 2.93GHz. Each node has 12 processors,
> > and the nodes are connected via 1 Gbit Ethernet and an Infiniband
> > interconnect. The batch system is TORQUE. However, due to some issues
> > with the parallel queue, I have been running the simulations directly on
> > the cluster using mpdboot and mpirun.
> > Following is the mdp.out file that I am using for the simulation:
> >
> > ; VARIOUS PREPROCESSING OPTIONS
> > ; Preprocessor information: use cpp syntax.
> > ; e.g.: -I/home/joe/doe -I/home/mary/roe
> > include =
> > ; e.g.: -DPOSRES -DFLEXIBLE (note these variable names are case sensitive)
> > define = -DPOSRES
> >
> > ; RUN CONTROL PARAMETERS
> > integrator = md
> > ; Start time and timestep in ps
> > tinit = 0
> > dt = 0.002
> > nsteps = 250000
> > ; For exact run continuation or redoing part of a run
> > init-step = 0
> > ; Part index is updated automatically on checkpointing (keeps files separate)
> > simulation-part = 1
> > ; mode for center of mass motion removal
> > comm-mode = Linear
> > ; number of steps for center of mass motion removal
> > nstcomm = 100
> > ; group(s) for center of mass motion removal
> > comm-grps =
> >
> > ; LANGEVIN DYNAMICS OPTIONS
> > ; Friction coefficient (amu/ps) and random seed
> > bd-fric = 0
> > ld-seed = 1993
> >
> > ; ENERGY MINIMIZATION OPTIONS
> > ; Force tolerance and initial step-size
> > emtol = 10
> > emstep = 0.01
> > ; Max number of iterations in relax-shells
> > niter = 20
> > ; Step size (ps^2) for minimization of flexible constraints
> > fcstep = 0
> > ; Frequency of steepest descents steps when doing CG
> > nstcgsteep = 1000
> > nbfgscorr = 10
> >
> > ; TEST PARTICLE INSERTION OPTIONS
> > rtpi = 0.05
> >
> > ; OUTPUT CONTROL OPTIONS
> > ; Output frequency for coords (x), velocities (v) and forces (f)
> > nstxout = 100
> > nstvout = 100
> > nstfout = 0
> > ; Output frequency for energies to log file and energy file
> > nstlog = 100
> > nstcalcenergy = 100
> > nstenergy = 100
> > ; Output frequency and precision for .xtc file
> > nstxtcout = 0
> > xtc-precision = 1000
> > ; This selects the subset of atoms for the .xtc file. You can
> > ; select multiple groups. By default all atoms will be written.
> > xtc-grps =
> > ; Selection of energy groups
> > energygrps =
> >
> > ; NEIGHBORSEARCHING PARAMETERS
> > ; cut-off scheme (group: using charge groups, Verlet: particle based cut-offs)
> > cutoff-scheme = Group
> > ; nblist update frequency
> > nstlist = 5
> > ; ns algorithm (simple or grid)
> > ns_type = grid
> > ; Periodic boundary conditions: xyz, no, xy
> > pbc = xyz
> > periodic-molecules = no
> > ; Allowed energy drift due to the Verlet buffer in kJ/mol/ps per atom,
> > ; a value of -1 means: use rlist
> > verlet-buffer-drift = 0.005
> > ; nblist cut-off
> > rlist = 1.0
> > ; long-range cut-off for switched potentials
> > rlistlong = -1
> > nstcalclr = -1
> >
> > ; OPTIONS FOR ELECTROSTATICS AND VDW
> > ; Method for doing electrostatics
> > coulombtype = PME
> > coulomb-modifier = Potential-shift-Verlet
> > rcoulomb-switch = 0
> > rcoulomb = 1.0
> > ; Relative dielectric constant for the medium and the reaction field
> > epsilon-r = 1
> > epsilon-rf = 0
> > ; Method for doing Van der Waals
> > vdw-type = Cut-off
> > vdw-modifier = Potential-shift-Verlet
> > ; cut-off lengths
> > rvdw-switch = 0
> > rvdw = 1.0
> > ; Apply long range dispersion corrections for Energy and Pressure
> > DispCorr = EnerPres
> > ; Extension of the potential lookup tables beyond the cut-off
> > table-extension = 1
> > ; Separate tables between energy group pairs
> > energygrp-table =
> > ; Spacing for the PME/PPPM FFT grid
> > fourierspacing = 0.16
> > ; FFT grid size, when a value is 0 fourierspacing will be used
> > fourier-nx = 0
> > fourier-ny = 0
> > fourier-nz = 0
> > ; EWALD/PME/PPPM parameters
> > pme_order = 4
> > ewald-rtol = 1e-05
> > ewald-geometry = 3d
> > epsilon-surface = 0
> > optimize-fft = no
> >
> > ; IMPLICIT SOLVENT ALGORITHM
> > implicit-solvent = No
> >
> > ; GENERALIZED BORN ELECTROSTATICS
> > ; Algorithm for calculating Born radii
> > gb-algorithm = Still
> > ; Frequency of calculating the Born radii inside rlist
> > nstgbradii = 1
> > ; Cutoff for Born radii calculation; the contribution from atoms
> > ; between rlist and rgbradii is updated every nstlist steps
> > rgbradii = 1
> > ; Dielectric coefficient of the implicit solvent
> > gb-epsilon-solvent = 80
> > ; Salt concentration in M for Generalized Born models
> > gb-saltconc = 0
> > ; Scaling factors used in the OBC GB model. Default values are OBC(II)
> > gb-obc-alpha = 1
> > gb-obc-beta = 0.8
> > gb-obc-gamma = 4.85
> > gb-dielectric-offset = 0.009
> > sa-algorithm = Ace-approximation
> > ; Surface tension (kJ/mol/nm^2) for the SA (nonpolar surface) part of GBSA
> > ; The value -1 will set default value for Still/HCT/OBC GB-models.
> > sa-surface-tension = -1
> >
> > ; OPTIONS FOR WEAK COUPLING ALGORITHMS
> > ; Temperature coupling
> > tcoupl = V-rescale
> > nsttcouple = -1
> > nh-chain-length = 10
> > print-nose-hoover-chain-variables = no
> > ; Groups to couple separately
> > tc-grps = Protein Non-Protein
> > ; Time constant (ps) and reference temperature (K)
> > tau_t = 0.1 0.1
> > ref_t = 300 300
> > ; pressure coupling
> > pcoupl = no
> > pcoupltype = Isotropic
> > nstpcouple = -1
> > ; Time constant (ps), compressibility (1/bar) and reference P (bar)
> > tau-p = 1
> > compressibility =
> > ref-p =
> > ; Scaling of reference coordinates, No, All or COM
> > refcoord-scaling = No
> >
> > ; OPTIONS FOR QMMM calculations
> > QMMM = no
> > ; Groups treated Quantum Mechanically
> > QMMM-grps =
> > ; QM method
> > QMmethod =
> > ; QMMM scheme
> > QMMMscheme = normal
> > ; QM basisset
> > QMbasis =
> > ; QM charge
> > QMcharge =
> > ; QM multiplicity
> > QMmult =
> > ; Surface Hopping
> > SH =
> > ; CAS space options
> > CASorbitals =
> > CASelectrons =
> > SAon =
> > SAoff =
> > SAsteps =
> > ; Scale factor for MM charges
> > MMChargeScaleFactor = 1
> > ; Optimization of QM subsystem
> > bOPT =
> > bTS =
> >
> > ; SIMULATED ANNEALING
> > ; Type of annealing for each temperature group (no/single/periodic)
> > annealing =
> > ; Number of time points to use for specifying annealing in each group
> > annealing-npoints =
> > ; List of times at the annealing points for each group
> > annealing-time =
> > ; Temp. at each annealing point, for each group.
> > annealing-temp =
> >
> > ; GENERATE VELOCITIES FOR STARTUP RUN
> > gen_vel = yes
> > gen_temp = 300
> > gen_seed = -1
> >
> > ; OPTIONS FOR BONDS
> > constraints = all-bonds
> > ; Type of constraint algorithm
> > constraint_algorithm = lincs
> > ; Do not constrain the start configuration
> > continuation = no
> > ; Use successive overrelaxation to reduce the number of shake iterations
> > Shake-SOR = no
> > ; Relative tolerance of shake
> > shake-tol = 0.0001
> > ; Highest order in the expansion of the constraint coupling matrix
> > lincs_order = 4
> > ; Number of iterations in the final step of LINCS. 1 is fine for
> > ; normal simulations, but use 2 to conserve energy in NVE runs.
> > ; For energy minimization with constraints it should be 4 to 8.
> > lincs_iter = 1
> > ; Lincs will write a warning to the stderr if in one step a bond
> > ; rotates over more degrees than
> > lincs-warnangle = 30
> > ; Convert harmonic bonds to morse potentials
> > morse = no
> >
> > ; ENERGY GROUP EXCLUSIONS
> > ; Pairs of energy groups for which all non-bonded interactions are excluded
> > energygrp-excl =
> >
> > ; WALLS
> > ; Number of walls, type, atom types, densities and box-z scale factor for Ewald
> > nwall = 0
> > wall-type = 9-3
> > wall-r-linpot = -1
> > wall-atomtype =
> > wall-density =
> > wall-ewald-zfac = 3
> >
> > ; COM PULLING
> > ; Pull type: no, umbrella, constraint or constant-force
> > pull = no
> >
> > ; ENFORCED ROTATION
> > ; Enforced rotation: No or Yes
> > rotation = no
> >
> > ; NMR refinement stuff
> > ; Distance restraints type: No, Simple or Ensemble
> > disre = No
> > ; Force weighting of pairs in one distance restraint: Conservative or Equal
> > disre-weighting = Conservative
> > ; Use sqrt of the time averaged times the instantaneous violation
> > disre-mixed = no
> > disre-fc = 1000
> > disre-tau = 0
> > ; Output frequency for pair distances to energy file
> > nstdisreout = 100
> > ; Orientation restraints: No or Yes
> > orire = no
> > ; Orientation restraints force constant and tau for time averaging
> > orire-fc = 0
> > orire-tau = 0
> > orire-fitgrp =
> > ; Output frequency for trace(SD) and S to energy file
> > nstorireout = 100
> >
> > ; Free energy variables
> > free-energy = no
> > couple-moltype =
> > couple-lambda0 = vdw-q
> > couple-lambda1 = vdw-q
> > couple-intramol = no
> > init-lambda = -1
> > init-lambda-state = -1
> > delta-lambda = 0
> > nstdhdl = 50
> > fep-lambdas =
> > mass-lambdas =
> > coul-lambdas =
> > vdw-lambdas =
> > bonded-lambdas =
> > restraint-lambdas =
> > temperature-lambdas =
> > calc-lambda-neighbors = 1
> > init-lambda-weights =
> > dhdl-print-energy = no
> > sc-alpha = 0
> > sc-power = 1
> > sc-r-power = 6
> > sc-sigma = 0.3
> > sc-coul = no
> > separate-dhdl-file = yes
> > dhdl-derivatives = yes
> > dh_hist_size = 0
> > dh_hist_spacing = 0.1
> >
> > ; Non-equilibrium MD stuff
> > acc-grps =
> > accelerate =
> > freezegrps =
> > freezedim =
> > cos-acceleration = 0
> > deform =
> >
> > ; simulated tempering variables
> > simulated-tempering = no
> > simulated-tempering-scaling = geometric
> > sim-temp-low = 300
> > sim-temp-high = 300
> >
> > ; Electric fields
> > ; Format is number of terms (int) and for all terms an amplitude (real)
> > ; and a phase angle (real)
> > E-x =
> > E-xt =
> > E-y =
> > E-yt =
> > E-z =
> > E-zt =
> >
> > ; AdResS parameters
> > adress = no
> >
> > ; User defined thingies
> > user1-grps =
> > user2-grps =
> > userint1 = 0
> > userint2 = 0
> > userint3 = 0
> > userint4 = 0
> > userreal1 = 0
> > userreal2 = 0
> > userreal3 = 0
> > userreal4 = 0
> >
> > The system has 250853 atoms. I used g_tune_pme in order to check the
> > performance with different numbers of processors.
> > Following are the perf.out files for 48 and 160 processors, respectively:
> >
> > Summary of successful runs:
> >  Line   tpr  PME nodes  Gcycles Av.  Std.dev.  ns/day  PME/f  DD grid
> >     0     0          8      181.713     7.698   0.952  1.334    8  5  1
> >     1     0          6      156.720     4.086   1.104  1.420    6  7  1
> >     2     0          4      196.320    16.161   0.885  0.916    4 11  1
> >     3     0          3      195.312     1.127   0.886  0.840    3  5  3
> >     4     0          0      370.539    12.942   0.468      -    8  6  1
> >     5     0     -1( 8)      185.688     0.839   0.932  1.322    8  5  1
> >     6     1          8      185.651    14.798   0.934  1.294    8  5  1
> >     7     1          6      155.970     3.320   1.110  1.157    6  7  1
> >     8     1          4      177.021    15.459   0.980  1.005    4 11  1
> >     9     1          3      190.704    22.673   0.914  0.931    3  5  3
> >    10     1          0      293.676     5.460   0.589      -    8  6  1
> >    11     1     -1( 8)      188.978     3.686   0.915  1.266    8  5  1
> >    12     2          8      210.631    17.457   0.824  1.176    8  5  1
> >    13     2          6      171.926    10.462   1.008  1.186    6  7  1
> >    14     2          4      200.015     6.696   0.865  0.839    4 11  1
> >    15     2          3      215.013     5.881   0.804  0.863    3  5  3
> >    16     2          0      298.363     7.187   0.580      -    8  6  1
> >    17     2     -1( 8)      208.821    34.409   0.840  1.088    8  5  1
> >
> > ------------------------------------------------------------
> > Best performance was achieved with 6 PME nodes (see line 7)
> > Optimized PME settings:
> >    New Coulomb radius: 1.100000 nm (was 1.000000 nm)
> >    New Van der Waals radius: 1.100000 nm (was 1.000000 nm)
> >    New Fourier grid xyz: 80 80 80 (was 96 96 96)
> > Please use this command line to launch the simulation:
> >
> > mpirun -np 48 mdrun_mpi -npme 6 -s tuned.tpr -pin on
> >
> >
> > Summary of successful runs:
> >  Line   tpr  PME nodes  Gcycles Av.  Std.dev.  ns/day  PME/f  DD grid
> >     0     0         25      283.628     2.191   0.610  1.749    5  9  3
> >     1     0         20      240.888     9.132   0.719  1.618    5  4  7
> >     2     0         16      166.570     0.394   1.038  1.239    8  6  3
> >     3     0          0      435.389     3.399   0.397      -   10  8  2
> >     4     0    -1( 20)      237.623     6.298   0.729  1.406    5  4  7
> >     5     1         25      286.990     1.662   0.603  1.813    5  9  3
> >     6     1         20      235.818     0.754   0.734  1.495    5  4  7
> >     7     1         16      167.888     3.028   1.030  1.256    8  6  3
> >     8     1          0      284.264     3.775   0.609      -    8  5  4
> >     9     1    -1( 16)      167.858     1.924   1.030  1.303    8  6  3
> >    10     2         25      298.637     1.660   0.579  1.696    5  9  3
> >    11     2         20      281.647     1.074   0.614  1.296    5  4  7
> >    12     2         16      184.012     4.022   0.941  1.244    8  6  3
> >    13     2          0      304.658     0.793   0.568      -    8  5  4
> >    14     2    -1( 16)      183.084     2.203   0.945  1.188    8  6  3
> >
> > ------------------------------------------------------------
> > Best performance was achieved with 16 PME nodes (see line 2)
> > and original PME settings.
> > Please use this command line to launch the simulation:
> >
> > mpirun -np 160 /data1/shashi/localbin/gromacs/bin/mdrun_mpi -npme 16 -s 4icl.tpr -pin on
> >
> > Both of these outcomes (1.110 ns/day and 1.038 ns/day) are lower than
> > what I get on my workstation with a Xeon W3550 @ 3.07 GHz using 8
> > threads (1.431 ns/day) for a similar system.
> > The bench.log file generated by g_tune_pme shows very high load
> > imbalance (>60%-100%). I have tried several combinations of -np and
> > -npme, but the performance is always in this range.
> > Can someone please tell me what I am doing wrong, or how I can decrease
> > the simulation time?
> > --
> > Regards
> > Ashutosh Srivastava
>
> --
> Dr. Carsten Kutzner
> Max Planck Institute for Biophysical Chemistry
> Theoretical and Computational Biophysics
> Am Fassberg 11, 37077 Goettingen, Germany
> Tel. +49-551-2012313, Fax: +49-551-2012302
> http://www.mpibpc.mpg.de/grubmueller/kutzner
> http://www.mpibpc.mpg.de/grubmueller/sppexa

--
Regards
Ashutosh Srivastava
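P.S. For the interconnect, stray-process, and pinning checks, I am planning to run something like the following directly on the compute nodes (assuming the usual diagnostic tools are installed there; ibstat is part of the Infiniband diagnostics utilities):

    ibstat                  # check that the IB ports are up and active
    top                     # look for leftover processes eating CPU time
    grep -i pin md.log      # see what mdrun reports about thread/core pinning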