Victor, you might want to take a look at the Open MPI version available from
http://fault-tolerance.org/. It provides additional features to gracefully
handle node failures.
George.
On May 30, 2013, at 17:55 , Victor Vysotskiy
wrote:
> Hi Ralph,
>
>> -mca orte_abort_non_zero_exit 0
>
>
On May 30, 2013, at 8:55 AM, Victor Vysotskiy
wrote:
> Hi Ralph,
>
>> -mca orte_abort_non_zero_exit 0
>
> Thank you for the hint. That is exactly what I need! BTW, does it help if
> one of the working nodes occasionally dies during the MPMD run?
I'm afraid not - failure of a node is a te
Hi Ralph,
> -mca orte_abort_non_zero_exit 0
Thank you for the hint. That is exactly what I need! BTW, does it help if
one of the working nodes occasionally dies during the MPMD run?
With best regards,
Victor.
There is such an option in the 1.7 series and on the trunk, but I don't see it
in v1.6.
-mca orte_abort_non_zero_exit 0
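Applied to the MPMD command line from the original question, the option would presumably be passed as below. This is only a sketch: it assumes an Open MPI 1.7-series (or trunk) build, uses the parameter name exactly as quoted above, and prog1 and prog2 are the placeholder program names from the question.

```shell
# Sketch only: assumes Open MPI 1.7 or later; parameter name as quoted in this thread.
# prog1 and prog2 are the placeholder programs from the original question.
mpiexec -mca orte_abort_non_zero_exit 0 -n 1 prog1 : -n 1 prog2
```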
On May 30, 2013, at 3:40 AM, Victor Vysotskiy
wrote:
> Dear OpenMPI Developers and Users,
>
> I have a general question on signal trapping/handling within mpiexec/mpirun.
Dear OpenMPI Developers and Users,
I have a general question on signal trapping/handling within mpiexec/mpirun. Let
me assume that I have 2 cores and I start two different (independent) programs,
prog1 and prog2, in parallel via the mpirun/mpiexec startup command:
mpiexec -n 1 prog1 : -n 1 prog2