This is not OpenMPI-specific, but maybe somebody on the list can give me a hint.
I start a parallel job with:

    mpirun -np 19 -nolocal -machinefile machinefile bin/getm_prod_IFORT.0096x0096

Everything starts OK and the simulation carries on for 2+ hours of wall-clock time. Then the simulation suddenly stops, and all related processes on the nodes stop with it. There is no trace of this in the log file, which simply ends:

    19:48:46.172 n= 1800
    2003-09-01 05:06:00: reading 2D boundary data ...
    19:49:21.710 n= 1900
    19:49:50.490 n= 2000

There is also nothing in any of the system log files. If I re-run the simulation, it does not stop at the same point.

Does anybody have a clue where I should look?

I use a cluster of 4 machines (dual processor, dual core) connected via Gbit/s Ethernet.

Karsten

PS: If I use MPICH, I get the same problem.

-- 
----------------------------------------------------------------------
Karsten Bolding            Bolding & Burchard Hydrodynamics
Strandgyden 25             Phone: +45 64422058
DK-5466 Asperup            Fax:   +45 64422068
Denmark                    Email: kars...@bolding-burchard.com
http://www.findvej.dk/Strandgyden25,5466,11,3
----------------------------------------------------------------------