Hi Ralph,

I'm using Open MPI 1.4.3. Thanks.

- Claire


________________________________
From: Ralph Castain <rhc.open...@gmail.com>
To: Claire Williams <clairewilliams1...@yahoo.com>; Open MPI Users <us...@open-mpi.org>
Sent: Thursday, June 20, 2013 1:59 PM
Subject: Re: [OMPI users] Detecting Node Failure
 


It should detect and abort - what version are you using?

Sent from my iPhone

On Jun 20, 2013, at 2:02 PM, Claire Williams <clairewilliams1...@yahoo.com> 
wrote:


>Hi all,
>
>
>I was wondering whether Open MPI has any way to detect that a node has
>crashed, rebooted, etc. I am trying to integrate my MPI application with
>Amazon EC2 spot instances. Since spot instances can be terminated at any
>time, I would like my application to detect the node failure, perhaps
>remove the node from the machine file, and restart automatically. Right
>now, when one of the worker nodes is rebooted or terminated, the master
>that is waiting on that node's results just hangs, waiting for results
>that will never come.
>
>
>Thanks,
>
>
>Claire  
>_______________________________________________
>users mailing list
>us...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/users
