Re: [OMPI users] allow job to survive process death

2011-01-27 Thread Reuti
On 27.01.2011, at 16:10, Joshua Hursey wrote:
>
> On Jan 27, 2011, at 9:47 AM, Reuti wrote:
>
>> On 27.01.2011, at 15:23, Joshua Hursey wrote:
>>
>>> The current version of Open MPI does not support continued operation of an MPI application after process failure within a job. If a process

Re: [OMPI users] allow job to survive process death

2011-01-27 Thread Joshua Hursey
On Jan 27, 2011, at 9:47 AM, Reuti wrote:

> On 27.01.2011, at 15:23, Joshua Hursey wrote:
>
>> The current version of Open MPI does not support continued operation of an MPI application after process failure within a job. If a process dies, so will the MPI job. Note that this is true of

Re: [OMPI users] allow job to survive process death

2011-01-27 Thread Ralph Castain
On Jan 27, 2011, at 7:47 AM, Reuti wrote:

> On 27.01.2011, at 15:23, Joshua Hursey wrote:
>
>> The current version of Open MPI does not support continued operation of an MPI application after process failure within a job. If a process dies, so will the MPI job. Note that this is true of

Re: [OMPI users] allow job to survive process death

2011-01-27 Thread Reuti
On 27.01.2011, at 15:23, Joshua Hursey wrote:

> The current version of Open MPI does not support continued operation of an MPI application after process failure within a job. If a process dies, so will the MPI job. Note that this is true of many MPI implementations out there at the moment

Re: [OMPI users] allow job to survive process death

2011-01-27 Thread Joshua Hursey
The current version of Open MPI does not support continued operation of an MPI application after process failure within a job. If a process dies, so will the MPI job. Note that this is true of many MPI implementations out there at the moment. At Oak Ridge National Laboratory, we are working on

[OMPI users] allow job to survive process death

2011-01-27 Thread Kirk Stako
Hi, I was wondering: what support does Open MPI have for allowing a job to continue running when one or more processes in the job die unexpectedly? Is there a special mpirun flag for this? Any other ways? It seems obvious that collectives will fail once a process dies, but would it be possible to create