This is not OpenMPI specific - but maybe somebody on the list can give a
hint.

I start a parallel job with:
mpirun -np 19 -nolocal -machinefile machinefile bin/getm_prod_IFORT.0096x0096

Everything starts OK and the simulation runs for 2+ hours of
wall-clock time. Then suddenly, without a trace in the logfile:

    19:48:46.172 n=        1800
            2003-09-01 05:06:00: reading 2D boundary data ...
    19:49:21.710 n=        1900
    19:49:50.490 n=        2000

or in any system logfiles, the simulation stops and all related processes
on the nodes stop.

If I re-run, the simulation does not stop at the same point in the run.

Does anybody have a clue where I should look?

I use a cluster of 4 machines (dual processor, dual core each) connected
via Gbit/s Ethernet.
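
Since the job dies without leaving any trace, one possible way to narrow
it down is to wrap the executable in a small script that records how each
process ended. The following is only a minimal sketch under assumptions:
the script name run_logged.sh, the RUN_LOG variable and the log location
are hypothetical, and it relies on the usual shell convention that an exit
status above 128 means "killed by signal (status - 128)". A program can
also return such a value on its own, so the message is a hint, not proof.

```shell
#!/bin/sh
# Hypothetical wrapper (run_logged.sh): run a command and log how it ended.
# Assumption: the shell reports 128 + N when the child was killed by signal N.

describe_status() {
    # Translate a shell exit status into a readable description.
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "exited normally"
    elif [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "exited with code $status"
    fi
}

run_logged() {
    # Run the given command, then append time, host and outcome to a log file.
    # RUN_LOG is a hypothetical override; the default is one file per host and PID.
    log=${RUN_LOG:-/tmp/run_logged.$(hostname).$$.log}
    "$@"
    status=$?
    echo "$(date '+%Y-%m-%d %H:%M:%S') $(hostname) pid=$$: $* -> $(describe_status "$status")" >> "$log"
    return "$status"
}

# When called with arguments, treat them as the command to wrap.
if [ "$#" -gt 0 ]; then
    run_logged "$@"
fi
```

Assuming the launcher starts the wrapper like any other executable, the
job could then be started as:

    mpirun -np 19 -nolocal -machinefile machinefile ./run_logged.sh bin/getm_prod_IFORT.0096x0096

A status such as "killed by signal 9" on one node would point towards
something outside the program (for example the kernel running out of
memory) rather than the MPI library. If the wrapper itself is killed
together with the job, nothing is logged, so an empty log is not
conclusive either.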

Karsten

PS: If I use MPICH I get the same problem.


-- 
----------------------------------------------------------------------
Karsten Bolding                    Bolding & Burchard Hydrodynamics
Strandgyden 25                     Phone: +45 64422058
DK-5466 Asperup                    Fax:   +45 64422068
Denmark                            Email: kars...@bolding-burchard.com

http://www.findvej.dk/Strandgyden25,5466,11,3
----------------------------------------------------------------------
