program
with a known short string. If the monitor does not see this string prefixed on
a line, it can terminate MPI, check the available nodes and recast the job
accordingly.
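A minimal sketch of such a monitor in Python, under some assumptions of mine: the marker string OMPI_OK, the function names, and the two callbacks (one that lists the nodes currently alive, one that builds the launch command for them) are all hypothetical, and nothing here is an Open MPI feature. It only watches the launcher's output and kills the launcher process when a line lacks the marker.

```python
import subprocess
import sys

# Hypothetical marker: the MPI program is assumed to print it at the start of every line.
PREFIX = "OMPI_OK"


def run_monitored(cmd, prefix=PREFIX):
    """Run cmd and watch its output line by line.

    Returns (True, None) if every line started with prefix and the process
    exited with status 0. Returns (False, line) on the first unprefixed line,
    after terminating the process.
    """
    with subprocess.Popen(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True) as proc:
        for line in proc.stdout:
            if not line.startswith(prefix):
                proc.terminate()  # stop the launcher; the caller decides what to do next
                proc.wait()
                return False, line.rstrip("\n")
        return proc.wait() == 0, None


def run_with_restart(make_cmd, live_nodes, max_attempts=3):
    """Relaunch the job until it succeeds or attempts run out.

    make_cmd(nodes) and live_nodes() are caller-supplied (hypothetical) hooks:
    one builds the launch command, the other checks which nodes are usable.
    Returns the attempt number that succeeded, or None.
    """
    for attempt in range(1, max_attempts + 1):
        ok, _bad_line = run_monitored(make_cmd(live_nodes()))
        if ok:
            return attempt
    return None


if __name__ == "__main__":
    # Stand-in for a real job: a child Python process that prints prefixed lines.
    demo = [sys.executable, "-c", "print('OMPI_OK step 1'); print('OMPI_OK done')"]
    print(run_monitored(demo))
```

Note that this only reacts to lines that arrive without the marker. A job that hangs without printing anything would need a timeout as well, and the program should checkpoint its own state if the relaunched job is to resume rather than start over.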
Hope this helps, Randolph
--- On Fri, 24/9/10, Joshua Hursey wrote:
From: Joshua Hursey
Subject: Re: [OMPI users] Running on
As one of the Open MPI developers actively working on the MPI layer
stabilization/recovery feature set, I don't think we can give you a specific
timeframe for availability, especially availability in a stable release. Once
the initial functionality is finished, we will open it up for user testing
Ralph, could you tell us when this functionality will be available in the
stable version? A rough estimate will be fine.
On Fri, Sep 24, 2010 at 01:24, Ralph Castain wrote:
> In a word, no. If a node crashes, OMPI will abort the currently-running job
> if it had processes on that node. There is
In a word, no. If a node crashes, OMPI will abort the currently-running job
if it had processes on that node. There is no current ability to "ride-thru"
such an event.
That said, there is work being done to support "ride-thru". Most of that is
in the current developer's code trunk, and more is com
Dear users,
Our cluster has a number of nodes that have a high probability of crashing, so
calculations quite often stop because one node has gone down.
Do you know whether it is possible to block the crashed nodes at run time
when running with Open MPI? I am asking about principal pos