Re: [OMPI users] Running on crashing nodes

2010-09-27 Thread Randolph Pullen
program with a known short string; if the monitor does not see this string prefixed on a line, it can terminate MPI, check available nodes and recast the job accordingly. Hope this helps, Randolph --- On Fri, 24/9/10, Joshua Hursey wrote: From: Joshua Hursey Subject: Re: [OMPI users] Running on
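One possible reading of the monitor idea above, as a minimal Python sketch rather than Randolph's actual script: the MPI program is assumed to print a line starting with a short marker at regular intervals, and a wrapper kills the launcher if no such line arrives within a timeout. The marker `HB:`, the function names, and the timeout policy are all assumptions made for illustration.

```python
import queue
import subprocess
import threading
import time

HEARTBEAT = "HB:"  # hypothetical marker the MPI program prints at the start of a line


def line_is_heartbeat(line, marker=HEARTBEAT):
    """Return True if the output line starts with the heartbeat marker."""
    return line.startswith(marker)


def watch(cmd, timeout_s, marker=HEARTBEAT):
    """Run cmd and watch its stdout.

    Returns True if the command exits on its own, False if it was
    killed because no heartbeat line appeared within timeout_s seconds.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    lines = queue.Queue()

    def pump():
        # Forward stdout lines to the queue so the main loop never blocks on a read.
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)  # signals end of output

    threading.Thread(target=pump, daemon=True).start()
    last_seen = time.monotonic()
    while True:
        try:
            line = lines.get(timeout=0.1)
        except queue.Empty:
            line = ""  # nothing new; fall through to the timeout check
        if line is None:
            proc.wait()
            return True
        if line and line_is_heartbeat(line, marker):
            last_seen = time.monotonic()
        if time.monotonic() - last_seen > timeout_s:
            proc.kill()  # heartbeat went silent: terminate the launcher
            proc.wait()
            return False
```

A call such as `watch(["mpirun", "-np", "4", "./solver"], timeout_s=60)` (where `./solver` is a placeholder) would return False when the heartbeat stops, at which point the caller could check which nodes are still reachable and resubmit the job.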

Re: [OMPI users] Running on crashing nodes

2010-09-24 Thread Joshua Hursey
As one of the Open MPI developers actively working on the MPI layer stabilization/recovery feature set, I don't think we can give you a specific timeframe for availability, especially availability in a stable release. Once the initial functionality is finished, we will open it up for user testing

Re: [OMPI users] Running on crashing nodes

2010-09-24 Thread Andrei Fokau
Ralph, could you tell us when this functionality will be available in the stable version? A rough estimate will be fine. On Fri, Sep 24, 2010 at 01:24, Ralph Castain wrote: > In a word, no. If a node crashes, OMPI will abort the currently-running job > if it had processes on that node. There is

Re: [OMPI users] Running on crashing nodes

2010-09-23 Thread Ralph Castain
In a word, no. If a node crashes, OMPI will abort the currently-running job if it had processes on that node. There is no current ability to "ride-thru" such an event. That said, there is work being done to support "ride-thru". Most of that is in the current developer's code trunk, and more is com
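Since the job is aborted when a node dies, one workaround outside Open MPI itself is to relaunch on the nodes that still respond. The sketch below is only an illustration under stated assumptions: the function names, the `ping`-based liveness check (Linux `ping` flags), the hostfile path, and the retry limit are all hypothetical choices, and each relaunch starts the application from scratch unless the application does its own checkpointing. The `--hostfile` and `-np` options are standard `mpirun` options.

```python
import subprocess


def live_nodes(nodes, is_alive):
    """Keep only the nodes for which the liveness check succeeds."""
    return [n for n in nodes if is_alive(n)]


def write_hostfile(path, nodes, slots=1):
    """Write an Open MPI style hostfile: one 'host slots=N' entry per line."""
    with open(path, "w") as f:
        for n in nodes:
            f.write(f"{n} slots={slots}\n")


def ping_ok(host):
    """Assumed liveness check: one ICMP echo with a one-second wait (Linux ping)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def run_until_success(nodes, app_cmd, is_alive=ping_ok, hostfile="hosts.txt",
                      max_attempts=3, launcher=subprocess.call):
    """Relaunch the job on the currently reachable nodes until it exits with 0.

    Returns True on success, False if no nodes are alive or attempts run out.
    """
    for _ in range(max_attempts):
        alive = live_nodes(nodes, is_alive)
        if not alive:
            return False
        write_hostfile(hostfile, alive)
        cmd = ["mpirun", "--hostfile", hostfile, "-np", str(len(alive))] + list(app_cmd)
        if launcher(cmd) == 0:
            return True
    return False
```

The `is_alive` and `launcher` parameters are injectable so the logic can be exercised without a cluster; in real use the defaults would be kept and `app_cmd` would name the MPI executable.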

[OMPI users] Running on crashing nodes

2010-09-23 Thread Andrei Fokau
Dear users, our cluster has a number of nodes that are likely to crash, so calculations quite often stop because one node goes down. Do you know whether it is possible to block the crashed nodes at run time when running with OpenMPI? I am asking about principal pos