Re: [OMPI users] Question on MPMD runs

2013-05-30 George Bosilca
Victor, you might want to take a look at the Open MPI version available from http://fault-tolerance.org/. It provides additional features to gracefully handle node failures.

George.

On May 30, 2013, at 17:55, Victor Vysotskiy wrote:
> Hi Ralph,
>
>> -mca orte_abort_non_zero_exit 0
>

Re: [OMPI users] Question on MPMD runs

2013-05-30 Ralph Castain
On May 30, 2013, at 8:55 AM, Victor Vysotskiy wrote:
> Hi Ralph,
>
>> -mca orte_abort_non_zero_exit 0
>
> Thank you for the hint. That is exactly what I need! BTW, does it help if
> one of the working nodes occasionally dies during the MPMD run?

I'm afraid not - failure of a node is a te

Re: [OMPI users] Question on MPMD runs

2013-05-30 Victor Vysotskiy
Hi Ralph,

> -mca orte_abort_non_zero_exit 0

Thank you for the hint. That is exactly what I need! BTW, does it help if one of the working nodes occasionally dies during the MPMD run?

With best regards,
Victor.

Re: [OMPI users] Question on MPMD runs

2013-05-30 Ralph Castain
There is such an option in the 1.7 series and on the trunk, but I don't see it in v1.6:

-mca orte_abort_non_zero_exit 0

On May 30, 2013, at 3:40 AM, Victor Vysotskiy wrote:
> Dear OpenMPI Developers and Users,
>
> I have a general question on signal trapping/handling within mpiexec/mpirun.
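A minimal sketch of how the parameter from Ralph's reply combines with the MPMD launch line from Victor's original question. It assumes Open MPI 1.7 or later (the option is reported missing from v1.6) and that prog1 and prog2 sit in the current directory; the `./` paths are an assumption. If mpiexec is not on the PATH, the script only prints the command it would run.

```shell
#!/bin/sh
# Sketch only: run prog1 and prog2 as one MPMD job (one rank each) and ask
# mpirun not to abort the whole job when one rank exits with a non-zero
# status. Assumes Open MPI 1.7+ (not available in v1.6, per the thread) and
# that ./prog1 and ./prog2 are the two independent programs (assumed paths).
# Per Ralph's reply, this does not help if a whole node dies during the run.
set -u

# Global options such as -mca go before the first app context; each
# ':'-separated segment after that is one app context with its own -n.
cmd="mpiexec -mca orte_abort_non_zero_exit 0 -n 1 ./prog1 : -n 1 ./prog2"

if command -v mpiexec >/dev/null 2>&1; then
    # Intentional word splitting: $cmd contains no quotes or glob characters.
    $cmd
else
    # No MPI installation found: just show what would be run.
    echo "$cmd"
fi
```

Open MPI also reads MCA parameters from environment variables prefixed with `OMPI_MCA_`, so exporting `OMPI_MCA_orte_abort_non_zero_exit=0` before calling mpiexec should have the same effect as the `-mca` flag.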

[OMPI users] Question on MPMD runs

2013-05-30 Victor Vysotskiy
Dear OpenMPI Developers and Users,

I have a general question on signal trapping/handling within mpiexec/mpirun. Let's assume that I have 2 cores and I start two different (independent) programs, prog1 and prog2, in parallel via the mpirun/mpiexec startup command:

mpiexec -n 1 prog1 : -n 1 prog2