Hi Ralph, I'm using 1.4.3. Thanks
- Claire

________________________________
From: Ralph Castain <rhc.open...@gmail.com>
To: Claire Williams <clairewilliams1...@yahoo.com>; Open MPI Users <us...@open-mpi.org>
Sent: Thursday, June 20, 2013 1:59 PM
Subject: Re: [OMPI users] Detecting Node Failure

It should detect and abort - what version are you using?

Sent from my iPhone

On Jun 20, 2013, at 2:02 PM, Claire Williams <clairewilliams1...@yahoo.com> wrote:

> Hi all,
>
> I was wondering if Open-MPI had any way to detect that a node has crashed,
> rebooted, etc. I am currently trying to integrate my MPI application with
> Amazon EC2 spot instances, and since spot instances can be terminated at any
> time, I would like to try to make it so that my application can detect this
> node failure, maybe remove the node from the machine file, and restart the
> application automatically. Right now, when one of the worker nodes is rebooted
> or terminated, the master that is waiting on the results of that node will
> just hang, waiting for results that will never come.
>
> Thanks,
>
> Claire
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
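
For reference, the "detect, remove the node from the machine file, and restart" idea from the original question could be sketched as an external supervisor script along the following lines. This is only a rough illustration under several assumptions, not something Open MPI provides: the names `run_with_restart`, `prune_hostfile`, `host_responds`, and `launch_mpirun`, the program path `./my_app`, the file name `hosts.txt`, and the one-hour timeout are all hypothetical. It also assumes one hostname per line in the machine file, that ICMP ping is allowed between the instances, and that the application can simply be rerun from scratch on fewer nodes.

```python
import subprocess


def prune_hostfile(hosts, is_alive):
    """Return only the hosts that still respond, preserving their order."""
    return [h for h in hosts if is_alive(h)]


def host_responds(host, timeout_s=5):
    """Hypothetical liveness check: a single ping (Linux `ping` flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def launch_mpirun(hosts, program="./my_app", hostfile="hosts.txt", timeout_s=3600):
    """Write the machine file and run the job; a timeout is treated as a hang."""
    with open(hostfile, "w") as f:
        f.write("\n".join(hosts) + "\n")
    try:
        result = subprocess.run(
            ["mpirun", "--hostfile", hostfile, "-np", str(len(hosts)), program],
            timeout=timeout_s,  # the child is killed if this expires
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def run_with_restart(hosts, launch, is_alive, max_attempts=3):
    """Run the job; after a failure, drop unreachable hosts and try again."""
    for _ in range(max_attempts):
        if not hosts:
            break  # nothing left to run on
        if launch(hosts):
            return True, hosts
        hosts = prune_hostfile(hosts, is_alive)
    return False, hosts


# Example call (not executed here; host names are placeholders):
# ok, remaining = run_with_restart(["node1", "node2"], launch_mpirun, host_responds)
```

The timeout is there because of the hang described above: if the job never returns, the supervisor gives up on that attempt, checks which nodes are still reachable, and relaunches on the survivors. Any work done before the failure is lost unless the application saves its own intermediate results.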