I am running some annealing trials on a Cray XT4. Although the
throughput is impressive, I am having severe difficulties with the
stability of the code.
For my relatively small system of ~7500 atoms, the engine typically
crashes after ~500k steps.

I am using the bleeding-edge CVS version: mdrun.c (1.141), the newest
one after Erik L.'s recent patch of the PME code.

I configure and compile on the compute nodes exclusively (not the
frontend) and the only compiler warning(s) I get are of the type:

"warning: Using 'getpwuid' in statically linked applications requires 
at runtime the shared libraries from the glibc version used for linking"

After compiling, though, the code executes and runs for ~20 min,
producing sound data before stalling.
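
For a rough sense of the throughput, the figures above can be combined
with the walltime from the PBS epilogue further down (00:20:31) and the
1 fs time step from my parameter file. This is only a back-of-the-envelope
sketch: the ~500k step count is approximate, and the variable names are
made up for illustration.

```python
# Rough throughput estimate from figures quoted in this post.
steps_before_crash = 500_000   # approximate ("~500k steps")
dt_ps = 0.001                  # delta_t from the parameter file, in ps
walltime_s = 20 * 60 + 31      # walltime=00:20:31 from the PBS epilogue

simulated_ps = steps_before_crash * dt_ps
steps_per_second = steps_before_crash / walltime_s
ns_per_day = steps_per_second * dt_ps * 86_400 / 1_000

print(f"simulated time before crash: {simulated_ps:.0f} ps")
print(f"steps per second:            {steps_per_second:.0f}")
print(f"approximate throughput:      {ns_per_day:.1f} ns/day")
```

So the crash typically comes after roughly half a nanosecond of
simulated time.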

The error logs are very short and quite uninformative.

PBS .o: 
Application 159316 exit codes: 137
Application 159316 exit signals: Killed
Application 159316 resources: utime 0, stime 0
--------------------------------------------------
Begin PBS Epilogue hexagon.bccs.uib.no
Date:             Mon Sep 29 12:32:54 CEST 2008
Job ID:           65643.nid00003
Username:         bjornss
Group:            bjornss
Job Name:         pmf_hydanneal_heatup_400K
Session:          10156
Limits:           walltime=05:00:00
Resources:        cput=00:00:00,mem=4940kb,vmem=22144kb,walltime=00:20:31
Queue:            batch
Account:          fysisk
Base login-node:  login5
End PBS Epilogue  Mon Sep 29 12:32:54 CEST 2008

PBS .err:
_pmii_daemon(SIGCHLD): PE 0 exit signal Killed
[NID 702]Apid 159316: initiated application termination.
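
For what it is worth, exit code 137 fits the common shell convention of
128 + signal number, which gives signal 9 (SIGKILL) and matches the
"Killed" lines above. A minimal sketch of that decoding follows; the
helper name describe_exit_code is my own, and the convention itself is an
assumption about how the launcher reports the status. It does not say who
sent the signal or why.

```python
import signal

def describe_exit_code(code):
    """Decode an exit status that follows the 128 + signal convention.

    Returns (signal_number, signal_name), or (None, None) if the code
    does not look like a signal-terminated status.
    """
    if code <= 128:
        return None, None
    signum = code - 128
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # Signal number not known on this platform.
        name = "unknown"
    return signum, name

signum, name = describe_exit_code(137)
print(f"exit code 137 -> signal {signum} ({name})")
```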

As proper electrostatics are crucial to my modeling, I am using PME,
which accounts for a large part of my calculation cost: 35-50%.
In the most extreme case, I use the following startup script:

run.pbs:

#!/bin/bash
#PBS -A fysisk
#PBS -N pmf_hydanneal_heatup_400K
#PBS -o pmf_hydanneal.o
#PBS -e pmf.hydanneal.err
#PBS -l walltime=5:00:00,mppwidth=40,mppnppn=4

cd /work/bjornss/pmf/structII/hydrate_annealing/heatup_400K
source $HOME/gmx_latest_290908/bin/GMXRC

aprun -n 40 parmdrun -s topol.tpr -maxh 5 -npme 20
exit $?
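
To make the process layout explicit, the small sketch below works out
how the 40 processes requested in run.pbs are divided. The numbers come
from the script and from the domain decomposition lines in the attached
md.log; the variable names are only illustrative.

```python
# Process layout implied by run.pbs and reported in md.log.
total_ranks = 40       # aprun -n 40 (mppwidth=40)
ranks_per_node = 4     # mppnppn=4
npme = 20              # -npme 20
dd_grid = (5, 4, 1)    # "Domain decomposition grid 5 x 4 x 1" in md.log

compute_nodes = total_ranks // ranks_per_node
pp_ranks = total_ranks - npme
dd_cells = dd_grid[0] * dd_grid[1] * dd_grid[2]
pme_fraction = npme / total_ranks

# The particle-particle ranks should exactly fill the DD grid.
assert dd_cells == pp_ranks

print(f"compute nodes:     {compute_nodes}")
print(f"PP ranks:          {pp_ranks}")
print(f"PME ranks:         {npme} ({pme_fraction:.0%} of all ranks)")
```

Half of the processes are therefore dedicated to PME in this extreme
case. The 10 compute nodes also seem consistent with the "two step
summing over 10 groups of on average 2.0 processes" line in md.log.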


Now, apart from a significant reduction in the system dipole moment,
there are no large changes in the system, nor significant translations
of the molecules in the box.

I enclose the md.log and my parameter file. The run-topology (topol.tpr)
can be found at:

http://drop.io/mdanneal

If anyone wants to try to replicate the crash on their local cluster,
they are welcome to.
If the error persists after such trials, I am willing to file a bug on
Bugzilla.


If more information is needed, I will try to provide it upon request.


Regards and thanks for bothering

-- 
---------------------
Bjørn Steen Saethre 
PhD-student
Theoretical and Energy Physics Unit
Institute of Physics and Technology
Allegt, 41
N-5020 Bergen
Norway

Tel(office) +47 55582869 


Log file opened on Mon Sep 29 14:03:14 2008
Host: nid01054  pid: 8315  nodeid: 0  nnodes:  40
The Gromacs distribution was built Mon Sep 29 13:25:26 CEST 2008 by
[EMAIL PROTECTED] (Linux 2.6.16.54-0.2.5-ss x86_64)


                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                           :-)  VERSION 4.0_rc1  (-:


      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
             Copyright (c) 2001-2008, The GROMACS development team,
            check out http://www.gromacs.org for more information.

         This program is free software; you can redistribute it and/or
          modify it under the terms of the GNU General Public License
         as published by the Free Software Foundation; either version 2
             of the License, or (at your option) any later version.

                               :-)  parmdrun  (-:


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------

parameters of the run:
   integrator           = md
   nsteps               = 2000000
   init_step            = 0
   ns_type              = Grid
   nstlist              = 5
   ndelta               = 2
   nstcomm              = 1
   comm_mode            = Linear
   nstcheckpoint        = 1000
   nstlog               = 100000
   nstxout              = 200000
   nstvout              = 200000
   nstfout              = 200000
   nstenergy            = 100
   nstxtcout            = 1000
   init_t               = 0
   delta_t              = 0.001
   xtcprec              = 1000
   nkx                  = 60
   nky                  = 40
   nkz                  = 40
   pme_order            = 6
   ewald_rtol           = 1e-05
   ewald_geometry       = 0
   epsilon_surface      = 0
   optimize_fft         = TRUE
   ePBC                 = xyz
   bPeriodicMols        = FALSE
   bContinuation        = FALSE
   bShakeSOR            = FALSE
   etc                  = Berendsen
   epc                  = No
   epctype              = Isotropic
   tau_p                = 1
   ref_p (3x3):
      ref_p[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      ref_p[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      ref_p[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   compress (3x3):
      compress[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      compress[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      compress[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   refcoord_scaling     = No
   posres_com (3):
      posres_com[0]= 0.00000e+00
      posres_com[1]= 0.00000e+00
      posres_com[2]= 0.00000e+00
   posres_comB (3):
      posres_comB[0]= 0.00000e+00
      posres_comB[1]= 0.00000e+00
      posres_comB[2]= 0.00000e+00
   andersen_seed        = 815131
   rlist                = 0.9
   rtpi                 = 0.05
   coulombtype          = PME
   rcoulomb_switch      = 0
   rcoulomb             = 0.9
   vdwtype              = Cut-off
   rvdw_switch          = 0
   rvdw                 = 0.9
   epsilon_r            = 1
   epsilon_rf           = 1
   tabext               = 1
   implicit_solvent     = No
   gb_algorithm         = Still
   gb_epsilon_solvent   = 80
   nstgbradii           = 1
   rgbradii             = 2
   gb_saltconc          = 0
   gb_obc_alpha         = 1
   gb_obc_beta          = 0.8
   gb_obc_gamma         = 4.85
   sa_surface_tension   = 2.092
   DispCorr             = Ener
   free_energy          = no
   init_lambda          = 0
   sc_alpha             = 0
   sc_power             = 0
   sc_sigma             = 0.3
   delta_lambda         = 0
   nwall                = 0
   wall_type            = 9-3
   wall_atomtype[0]     = -1
   wall_atomtype[1]     = -1
   wall_density[0]      = 0
   wall_density[1]      = 0
   wall_ewald_zfac      = 3
   pull                 = no
   disre                = No
   disre_weighting      = Conservative
   disre_mixed          = FALSE
   dr_fc                = 1000
   dr_tau               = 0
   nstdisreout          = 100
   orires_fc            = 0
   orires_tau           = 0
   nstorireout          = 100
   dihre-fc             = 1000
   em_stepsize          = 0.01
   em_tol               = 10
   niter                = 20
   fc_stepsize          = 0
   nstcgsteep           = 1000
   nbfgscorr            = 10
   ConstAlg             = Lincs
   shake_tol            = 1e-04
   lincs_order          = 6
   lincs_warnangle      = 30
   lincs_iter           = 2
   bd_fric              = 0
   ld_seed              = 1993
   cos_accel            = 0
   deform (3x3):
      deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   userint1             = 0
   userint2             = 0
   userint3             = 0
   userint4             = 0
   userreal1            = 0
   userreal2            = 0
   userreal3            = 0
   userreal4            = 0
grpopts:
   nrdf:       12957
   ref_t:         400
   tau_t:         0.5
anneal:          No
ann_npoints:           0
   acc:	           0           0           0
   nfreeze:           N           N           N
   energygrp_flags[  0]: 0
   efield-x:
      n = 0
   efield-xt:
      n = 0
   efield-y:
      n = 0
   efield-yt:
      n = 0
   efield-z:
      n = 0
   efield-zt:
      n = 0
   bQMMM                = FALSE
   QMconstraints        = 0
   QMMMscheme           = 0
   scalefactor          = 1
qm_opts:
   ngQM                 = 0

Initializing Domain Decomposition on 40 nodes
Dynamic load balancing: auto
Will sort the charge groups at every domain (re)decomposition
Initial maximum inter charge-group distances:
    two-body bonded interactions: 0.377 nm
  multi-body bonded interactions: 0.377 nm
Minimum cell size due to bonded interactions: 0.414 nm
Using 20 separate PME nodes
Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
Optimizing the DD grid for 20 cells with a minimum initial size of 0.518 nm
The maximum allowed number of cells is: X 12 Y 8 Z 8
Domain decomposition grid 5 x 4 x 1, separate PME nodes 20
Interleaving PP and PME nodes
This is a particle-particle only node

Domain decomposition nodeid 0, coordinates 0 0 0

Using two step summing over 10 groups of on average 2.0 processes

Table routines are used for coulomb: TRUE
Table routines are used for vdw:     FALSE
Will do PME sum in reciprocal space.

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essman, L. Perela, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen 
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------

Using a Gaussian width (1/beta) of 0.288146 nm for Ewald
Cut-off's:   NS: 0.9   Coulomb: 0.9   LJ: 0.9
System total charge: -0.000
Generated table with 950 data points for Ewald.
Tabscale = 500 points/nm
Generated table with 950 data points for LJ6.
Tabscale = 500 points/nm
Generated table with 950 data points for LJ12.
Tabscale = 500 points/nm

Enabling TIP4p water optimization for 1632 molecules.

Configuring nonbonded kernels...
Testing x86_64 SSE support... present.


Removing pbc first time

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Miyamoto and P. A. Kollman
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
Water Models
J. Comp. Chem. 13 (1992) pp. 952-962
-------- -------- --- Thank You --- -------- --------


Linking all bonded interactions to atoms
There are 3744 inter charge-group exclusions,
will use an extra communication step for exclusion forces for PME

The initial number of communication pulses is: X 1 Y 1
The initial domain decomposition cell size is: X 1.24 nm Y 1.04 nm

The maximum allowed distance for charge groups involved in interactions is:
                 non-bonded interactions           0.900 nm
            two-body bonded interactions  (-rdd)   0.900 nm
          multi-body bonded interactions  (-rdd)   0.900 nm

When dynamic load balancing gets turned on, these settings will change to:
The maximum number of communication pulses is: X 2 Y 2
The minimum size for domain decomposition cells is 0.707 nm
The requested allowed shrink of DD cells (option -dds) is: 0.80
The allowed shrink of domain decomposition cells is: X 0.57 Y 0.68
The maximum allowed distance for charge groups involved in interactions is:
                 non-bonded interactions           0.900 nm
            two-body bonded interactions  (-rdd)   0.900 nm
          multi-body bonded interactions  (-rdd)   0.707 nm


Making 2D domain decomposition grid 5 x 4 x 1, home cell index 0 0 0

Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
  0:  rest

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, J. P. M. Postma, A. DiNola and J. R. Haak
Molecular dynamics with coupling to an external bath
J. Chem. Phys. 81 (1984) pp. 3684-3690
-------- -------- --- Thank You --- -------- --------

title                    = heatup 400K  structII - propan -  tip4p/ice(rigid) - PME
cpp                      = /lib/cpp
integrator               = md
define                   =-DPOSRES
include                  = -I/home/fi/bjornss/mytop

;Run ctrl
dt                       = 0.001
nsteps                   = 2000000
nstxout                  = 200000
nstvout                  = 200000
nstfout                  = 200000
nstenergy                = 100
nstlog                   = 100000
nstxtcout                = 1000


;Electrostatics/Neighbour search
nstlist                  = 5
ns_type                  = grid
rlist                    = 0.9
coulombtype              = PME
ewald_geometry           = 3d
rcoulomb                 = 0.9
vdw-type                 = Cut-off
rvdw                     = 0.9
optimize_fft             = yes
fourier_nx               = 60
fourier_ny               = 40
fourier_nz               = 40
pme_order                = 6

;Boundary conditions/constraints etc,
pbc                      = xyz
DispCorr                 = Ener
constraints              = hbonds
constraint_algorithm     = lincs
lincs_iter               = 2
lincs_order              = 6
;nwall                   = 0
;walltype                = 9-3
;wall_r_linpot           = -10
;wall_atomtype           = opls_113 opls_113
;wall_density            = 4.6 4.6 
;wall_ewald_zfac         = 2.4




;Temperature and pressure generation and coupling
gen_vel                  = no
;gen_temp                = 350
;gen_seed                = -1

tcoupl                   = berendsen
tc_grps                  = System
tau_t                    = 0.5
ref_t                    = 400

pcoupl                   = no
;pcoupltype              = isotropic
;tau_p                   = 2 
;ref_p                   = 10
;compressibility         = 5e-6

unconstrained-start      = no
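
As a sanity check on the PME settings, the Ewald width printed in md.log
("Using a Gaussian width (1/beta) of 0.288146 nm") can be reproduced from
rcoulomb = 0.9 and ewald_rtol = 1e-05. The sketch below assumes that beta
is chosen so that erfc(beta * rcoulomb) = ewald_rtol, which is my
understanding of how the tolerance is defined. The function name and the
bisection details are my own and are not taken from the GROMACS source.

```python
import math

def ewald_beta(rc_nm, rtol):
    """Solve erfc(beta * rc) = rtol for beta by bisection (beta in 1/nm)."""
    lo, hi = 0.0, 1.0
    # Grow the upper bracket until the tolerance is met.
    while math.erfc(hi * rc_nm) > rtol:
        hi *= 2.0
    # erfc decreases monotonically, so plain bisection converges.
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if math.erfc(mid * rc_nm) > rtol:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

beta = ewald_beta(0.9, 1e-5)   # rcoulomb and ewald_rtol from this run
width_nm = 1.0 / beta
print(f"beta = {beta:.4f} 1/nm, Gaussian width 1/beta = {width_nm:.6f} nm")
```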