Re: [OMPI users] Detecting Node Failure

2013-06-22 Thread Ralph Castain
I can't speak to ULFM specifics. However, MPI allows you to register error handlers, and that is where the callback will come.

On Jun 21, 2013, at 11:34 PM, Andreas Schäfer wrote:
> On 18:46 Thu 20 Jun, Ralph Castain wrote:
>> We will also be supporting that in the developer's tru

Re: [OMPI users] Detecting Node Failure

2013-06-22 Thread Andreas Schäfer
On 18:46 Thu 20 Jun, Ralph Castain wrote:
> We will also be supporting that in the developer's trunk fairly soon, and
> that will appear later on in the 1.9 series.

Will the interface be the same as in ULFM? Could you ping me once the code hits the trunk? I'd like to see if we can integrate the cl

Re: [OMPI users] Detecting Node Failure

2013-06-20 Thread Ralph Castain
We will also be supporting that in the developer's trunk fairly soon, and that will appear later on in the 1.9 series.

On Thu, Jun 20, 2013 at 4:18 PM, Jeff Squyres (jsquyres) wrote:
> Not at present, no.
>
> But you might want to look at a fork of the OMPI code base that was
> exploring fault

Re: [OMPI users] Detecting Node Failure

2013-06-20 Thread Jeff Squyres (jsquyres)
Not at present, no.

But you might want to look at a fork of the OMPI code base that was exploring fault resilience issues:

http://fault-tolerance.org/

On Jun 20, 2013, at 5:57 PM, Andreas Schäfer wrote:
> On 14:59 Thu 20 Jun, Ralph Castain wrote:
>> It should detect and abort - wha

Re: [OMPI users] Detecting Node Failure

2013-06-20 Thread Andreas Schäfer
On 14:59 Thu 20 Jun, Ralph Castain wrote:
> It should detect and abort - what version are you using?

Would it be possible to call MPI_Comm_disconnect() when the communicator in question is an intercommunicator, without having OMPI abort? I'm asking because if we had a possibility to dynamica

Re: [OMPI users] Detecting Node Failure

2013-06-20 Thread Ralph Castain
hursday, June 20, 2013 1:59 PM
> Subject: Re: [OMPI users] Detecting Node Failure
>
> It should detect and abort - what version are you using?
>
> On Jun 20, 2013, at 2:02 PM, Claire Williams wrote:
>
>> Hi all,
>>
>> I

Re: [OMPI users] Detecting Node Failure

2013-06-20 Thread Claire Williams
Hi Ralph,

I'm using 1.4.3.

Thanks,
Claire

From: Ralph Castain
To: Claire Williams; Open MPI Users
Sent: Thursday, June 20, 2013 1:59 PM
Subject: Re: [OMPI users] Detecting Node Failure

It should detect and abort - what version are you using?

Re: [OMPI users] Detecting Node Failure

2013-06-20 Thread Ralph Castain
It should detect and abort - what version are you using?

On Jun 20, 2013, at 2:02 PM, Claire Williams wrote:
> Hi all,
>
> I was wondering if Open-MPI had any way to detect that a node has crashed,
> rebooted, etc. I am currently trying to integrate my MPI application wit