I am running some annealing trials on a Cray XT4, and although the throughput is impressive, I have severe difficulties with the stability of the code. For my relatively small system of ~7500 atoms, the engine typically crashes after ~500k steps.
I am using the bleeding-edge CVS version: mdrun.c (1.141), the newest one after Erik L.'s recent patch of the PME code. I configure and compile on the compute nodes exclusively (not the frontend), and the only compiler warnings I get are of the type:

warning: Using 'getpwuid' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking

After compiling, though, the code executes and runs for ~20 minutes, producing sound data before stalling. The error logs are very short and quite uninformative.

PBS .o:

Application 159316 exit codes: 137
Application 159316 exit signals: Killed
Application 159316 resources: utime 0, stime 0
--------------------------------------------------
Begin PBS Epilogue hexagon.bccs.uib.no
Date: Mon Sep 29 12:32:54 CEST 2008
Job ID: 65643.nid00003
Username: bjornss
Group: bjornss
Job Name: pmf_hydanneal_heatup_400K
Session: 10156
Limits: walltime=05:00:00
Resources: cput=00:00:00,mem=4940kb,vmem=22144kb,walltime=00:20:31
Queue: batch
Account: fysisk
Base login-node: login5
End PBS Epilogue Mon Sep 29 12:32:54 CEST 2008

PBS .err:

_pmii_daemon(SIGCHLD): PE 0 exit signal Killed
[NID 702]Apid 159316: initiated application termination.

As proper electrostatics is crucial to my modeling, I am using PME, which comprises a large part of my calculation cost: 35-50%. In the most extreme case, I use the following startup script, run.pbs:

#!/bin/bash
#PBS -A fysisk
#PBS -N pmf_hydanneal_heatup_400K
#PBS -o pmf_hydanneal.o
#PBS -e pmf.hydanneal.err
#PBS -l walltime=5:00:00,mppwidth=40,mppnppn=4
cd /work/bjornss/pmf/structII/hydrate_annealing/heatup_400K
source $HOME/gmx_latest_290908/bin/GMXRC
aprun -n 40 parmdrun -s topol.tpr -maxh 5 -npme 20
exit $?

Now, apart from a significant reduction in the system dipole moment, there are no large changes in the system, nor significant translations of the molecules in the box.

I enclose the md.log and my parameter file. The run topology (topol.tpr) can be found at http://drop.io/mdanneal; anyone who wants to try to replicate the crash on their local cluster is welcome to. If the error persists after such trials, I am willing to file a bug on Bugzilla. If more information is needed, I will try to provide it upon request.

Regards, and thanks for bothering,

--
---------------------
Bjørn Steen Saethre
PhD student
Theoretical and Energy Physics Unit
Institute of Physics and Technology
Allegt. 41
N-5020 Bergen
Norway
Tel (office): +47 55582869
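PS: In case it is useful for continuing a killed run rather than redoing it from step 0, something like the following resubmission script should pick up from the checkpoint file that GROMACS 4.0 writes (I assume the default name state.cpt here; the paths and mdrun flags are the same as in run.pbs above):

#!/bin/bash
# Sketch of a restart job: identical to run.pbs except that mdrun is told to
# continue from the last checkpoint. state.cpt is assumed to be the default
# checkpoint name; the job/output names below are only illustrative.
#PBS -A fysisk
#PBS -N pmf_hydanneal_heatup_400K_restart
#PBS -o pmf_hydanneal_restart.o
#PBS -e pmf_hydanneal_restart.err
#PBS -l walltime=5:00:00,mppwidth=40,mppnppn=4
cd /work/bjornss/pmf/structII/hydrate_annealing/heatup_400K
source $HOME/gmx_latest_290908/bin/GMXRC
# -cpi reads the checkpoint and continues; -s, -maxh and -npme are unchanged
aprun -n 40 parmdrun -s topol.tpr -cpi state.cpt -maxh 5 -npme 20
exit $?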
Log file opened on Mon Sep 29 14:03:14 2008
Host: nid01054  pid: 8315  nodeid: 0  nnodes: 40
The Gromacs distribution was built Mon Sep 29 13:25:26 CEST 2008 by
[EMAIL PROTECTED] (Linux 2.6.16.54-0.2.5-ss x86_64)

                     :-)  G  R  O  M  A  C  S  (-:

              Groningen Machine for Chemical Simulation

                          VERSION 4.0_rc1

Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2008, The GROMACS development team,
check out http://www.gromacs.org for more information.

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.

                          :-)  parmdrun  (-:

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and
H. J. C. Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------

parameters of the run:
   integrator           = md
   nsteps               = 2000000
   init_step            = 0
   ns_type              = Grid
   nstlist              = 5
   ndelta               = 2
   nstcomm              = 1
   comm_mode            = Linear
   nstcheckpoint        = 1000
   nstlog               = 100000
   nstxout              = 200000
   nstvout              = 200000
   nstfout              = 200000
   nstenergy            = 100
   nstxtcout            = 1000
   init_t               = 0
   delta_t              = 0.001
   xtcprec              = 1000
   nkx                  = 60
   nky                  = 40
   nkz                  = 40
   pme_order            = 6
   ewald_rtol           = 1e-05
   ewald_geometry       = 0
   epsilon_surface      = 0
   optimize_fft         = TRUE
   ePBC                 = xyz
   bPeriodicMols        = FALSE
   bContinuation        = FALSE
   bShakeSOR            = FALSE
   etc                  = Berendsen
   epc                  = No
   epctype              = Isotropic
   tau_p                = 1
   ref_p (3x3):
      ref_p[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
      ref_p[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
      ref_p[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
   compress (3x3):
      compress[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
      compress[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
      compress[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
   refcoord_scaling     = No
   posres_com (3):
      posres_com[0]= 0.00000e+00
      posres_com[1]= 0.00000e+00
      posres_com[2]= 0.00000e+00
   posres_comB (3):
      posres_comB[0]= 0.00000e+00
      posres_comB[1]= 0.00000e+00
      posres_comB[2]= 0.00000e+00
   andersen_seed        = 815131
   rlist                = 0.9
   rtpi                 = 0.05
   coulombtype          = PME
   rcoulomb_switch      = 0
   rcoulomb             = 0.9
   vdwtype              = Cut-off
   rvdw_switch          = 0
   rvdw                 = 0.9
   epsilon_r            = 1
   epsilon_rf           = 1
   tabext               = 1
   implicit_solvent     = No
   gb_algorithm         = Still
   gb_epsilon_solvent   = 80
   nstgbradii           = 1
   rgbradii             = 2
   gb_saltconc          = 0
   gb_obc_alpha         = 1
   gb_obc_beta          = 0.8
   gb_obc_gamma         = 4.85
   sa_surface_tension   = 2.092
   DispCorr             = Ener
   free_energy          = no
   init_lambda          = 0
   sc_alpha             = 0
   sc_power             = 0
   sc_sigma             = 0.3
   delta_lambda         = 0
   nwall                = 0
   wall_type            = 9-3
   wall_atomtype[0]     = -1
   wall_atomtype[1]     = -1
   wall_density[0]      = 0
   wall_density[1]      = 0
   wall_ewald_zfac      = 3
   pull                 = no
   disre                = No
   disre_weighting      = Conservative
   disre_mixed          = FALSE
   dr_fc                = 1000
   dr_tau               = 0
   nstdisreout          = 100
   orires_fc            = 0
   orires_tau           = 0
   nstorireout          = 100
   dihre-fc             = 1000
   em_stepsize          = 0.01
   em_tol               = 10
   niter                = 20
   fc_stepsize          = 0
   nstcgsteep           = 1000
   nbfgscorr            = 10
   ConstAlg             = Lincs
   shake_tol            = 1e-04
   lincs_order          = 6
   lincs_warnangle      = 30
   lincs_iter           = 2
   bd_fric              = 0
   ld_seed              = 1993
   cos_accel            = 0
   deform (3x3):
      deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
      deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
      deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
   userint1             = 0
   userint2             = 0
   userint3             = 0
   userint4             = 0
   userreal1            = 0
   userreal2            = 0
   userreal3            = 0
   userreal4            = 0
grpopts:
   nrdf:    12957
   ref_t:   400
   tau_t:   0.5
   anneal:  No
   ann_npoints: 0
   acc:     0 0 0
   nfreeze: N N N
   energygrp_flags[ 0]: 0
   efield-x:
      n = 0
   efield-xt:
      n = 0
   efield-y:
      n = 0
   efield-yt:
      n = 0
   efield-z:
      n = 0
   efield-zt:
      n = 0
   bQMMM                = FALSE
   QMconstraints        = 0
   QMMMscheme           = 0
   scalefactor          = 1
qm_opts:
   ngQM                 = 0

Initializing Domain Decomposition on 40 nodes
Dynamic load balancing: auto
Will sort the charge groups at every domain (re)decomposition
Initial maximum inter charge-group distances:
    two-body bonded interactions: 0.377 nm
  multi-body bonded interactions: 0.377 nm
Minimum cell size due to bonded interactions: 0.414 nm
Using 20 separate PME nodes
Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
Optimizing the DD grid for 20 cells with a minimum initial size of 0.518 nm
The maximum allowed number of cells is: X 12 Y 8 Z 8
Domain decomposition grid 5 x 4 x 1, separate PME nodes 20
Interleaving PP and PME nodes
This is a particle-particle only node
Domain decomposition nodeid 0, coordinates 0 0 0

Using two step summing over 10 groups of on average 2.0 processes

Table routines are used for coulomb: TRUE
Table routines are used for vdw: FALSE
Will do PME sum in reciprocal space.

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essman, L. Perela, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------

Using a Gaussian width (1/beta) of 0.288146 nm for Ewald
Cut-off's:   NS: 0.9   Coulomb: 0.9   LJ: 0.9
System total charge: -0.000
Generated table with 950 data points for Ewald.
Tabscale = 500 points/nm
Generated table with 950 data points for LJ6.
Tabscale = 500 points/nm
Generated table with 950 data points for LJ12.
Tabscale = 500 points/nm
Enabling TIP4p water optimization for 1632 molecules.

Configuring nonbonded kernels...
Testing x86_64 SSE support... present.

Removing pbc first time

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Miyamoto and P. A. Kollman
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
Water Models
J. Comp. Chem. 13 (1992) pp. 952-962
-------- -------- --- Thank You --- -------- --------

Linking all bonded interactions to atoms
There are 3744 inter charge-group exclusions,
will use an extra communication step for exclusion forces for PME

The initial number of communication pulses is: X 1 Y 1
The initial domain decomposition cell size is: X 1.24 nm Y 1.04 nm

The maximum allowed distance for charge groups involved in interactions is:
                 non-bonded interactions           0.900 nm
            two-body bonded interactions  (-rdd)   0.900 nm
          multi-body bonded interactions  (-rdd)   0.900 nm

When dynamic load balancing gets turned on, these settings will change to:
The maximum number of communication pulses is: X 2 Y 2
The minimum size for domain decomposition cells is 0.707 nm
The requested allowed shrink of DD cells (option -dds) is: 0.80
The allowed shrink of domain decomposition cells is: X 0.57 Y 0.68
The maximum allowed distance for charge groups involved in interactions is:
                 non-bonded interactions           0.900 nm
            two-body bonded interactions  (-rdd)   0.900 nm
          multi-body bonded interactions  (-rdd)   0.707 nm

Making 2D domain decomposition grid 5 x 4 x 1, home cell index 0 0 0

Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
  0:  rest

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, J. P. M. Postma, A. DiNola and J. R. Haak
Molecular dynamics with coupling to an external bath
J. Chem. Phys. 81 (1984) pp. 3684-3690
-------- -------- --- Thank You --- -------- --------
title                 = heatup 400K structII - propan - tip4p/ice(rigid) - PME
cpp                   = /lib/cpp
integrator            = md
define                = -DPOSRES
include               = -I/home/fi/bjornss/mytop

; Run control
dt                    = 0.001
nsteps                = 2000000
nstxout               = 200000
nstvout               = 200000
nstfout               = 200000
nstenergy             = 100
nstlog                = 100000
nstxtcout             = 1000

; Electrostatics / neighbour searching
nstlist               = 5
ns_type               = grid
rlist                 = 0.9
coulombtype           = PME
ewald_geometry        = 3d
rcoulomb              = 0.9
vdw-type              = Cut-off
rvdw                  = 0.9
optimize_fft          = yes
fourier_nx            = 60
fourier_ny            = 40
fourier_nz            = 40
pme_order             = 6

; Boundary conditions / constraints etc.
pbc                   = xyz
DispCorr              = Ener
constraints           = hbonds
constraint_algorithm  = lincs
lincs_iter            = 2
lincs_order           = 6
;nwall                = 0
;walltype             = 9-3
;wall_r_linpot        = -10
;wall_atomtype        = opls_113 opls_113
;wall_density         = 4.6 4.6
;wall_ewald_zfac      = 2.4

; Temperature and pressure generation and coupling
gen_vel               = no
;gen_temp             = 350
;gen_seed             = -1
tcoupl                = berendsen
tc_grps               = System
tau_t                 = 0.5
ref_t                 = 400
pcoupl                = no
;pcoupltype           = isotropic
;tau_p                = 2
;ref_p                = 10
;compressibility      = 5e-6
unconstrained-start   = no
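For reference, here is a quick back-of-the-envelope check of the PME grid spacing implied by the settings above. The box lengths are only estimates taken from the initial DD cell sizes in md.log (X ~ 5 x 1.24 nm, Y ~ 4 x 1.04 nm); Z is simply assumed to be close to Y, since md.log allows the same maximum cell count in both directions:

#!/bin/bash
# Rough estimate only (my own sketch, not GROMACS output): reconstruct the box
# from the initial DD cell sizes reported in md.log and divide by the PME grid.
awk 'BEGIN {
    bx = 5 * 1.24; by = 4 * 1.04; bz = by   # estimated box lengths (nm); bz assumed ~ by
    nx = 60; ny = 40; nz = 40               # fourier_nx/ny/nz from the mdp above
    printf "box  ~ %.2f x %.2f x %.2f nm\n", bx, by, bz
    printf "grid ~ %.3f x %.3f x %.3f nm per cell\n", bx/nx, by/ny, bz/nz
}'

That gives a spacing of roughly 0.10 nm, which together with pme_order = 6 is a fairly fine mesh and fits with PME accounting for 35-50% of the run time here.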