Re: [OMPI users] Slightly OT: mpi job terminates

2007-11-06 Thread Jeff Squyres
You might want to run your app through a memory-checking debugger to see if anything obvious shows up. Also, check that your core file size limit is greater than zero (e.g., set it to "unlimited"). Then run again and see whether you get corefiles, which would tell you if your app is silently dumping core, an
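As a rough illustration of the core limit check suggested above, here is a minimal sketch using Python's standard `resource` module (Unix only). The helper names `core_limit`, `describe`, and `raise_core_limit` are made up for this example and are not part of Open MPI. The limit is per process and inherited by child processes, so this only shows the idea for the local process; it is an assumption here that the same limit would also need to be in effect on every node where the MPI ranks actually run.

```python
import resource


def core_limit():
    """Return the (soft, hard) core file size limits of this process."""
    return resource.getrlimit(resource.RLIMIT_CORE)


def describe(limit):
    """Render a limit value for printing."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit} bytes"


def raise_core_limit():
    """Raise the soft core limit to the hard limit and return the new soft limit.

    An unprivileged process may raise its soft limit up to the hard limit,
    but not beyond it.
    """
    _, hard = core_limit()
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return core_limit()[0]


if __name__ == "__main__":
    soft, hard = core_limit()
    print("core limit before: soft =", describe(soft), "hard =", describe(hard))
    if soft == 0:
        print("soft limit is 0, so no core files would be written")
    print("core limit after:  soft =", describe(raise_core_limit()))
```

If the hard limit itself is 0, the soft limit cannot be raised this way and the hard limit would have to be changed by an administrator or in the login configuration.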

[OMPI users] Slightly OT: mpi job terminates

2007-11-01 Thread Karsten Bolding
This is not OpenMPI specific, but maybe somebody on the list can give a hint. I start a parallel job with:

    mpirun -np 19 -nolocal -machinefile machinefile bin/getm_prod_IFORT.0096x0096

Everything starts OK and the simulation carries on for 2+ hours of wall clock time, then suddenly without a trace