Re: [OMPI users] Don't crash on node failures

2010-04-14 Thread Ralph Castain
Yes - followed a few microseconds later by a SIGKILL if it didn't terminate. The daemon exits shortly thereafter, and if the proc is -still- somehow alive, it kills itself once it sees the daemon is gone.

On Apr 14, 2010, at 7:29 AM, Jürgen Kaiser wrote:
> What happens exactly when a job or
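
As a rough illustration of the SIGTERM-then-SIGKILL sequence described in this reply, the sketch below shows how an application process might notice the SIGTERM before the SIGKILL arrives. It is not Open MPI code: the names `on_sigterm`, `work_step`, and `shutdown_requested` are hypothetical, and the process signals itself only to simulate what the runtime would send. Since the reply says the SIGKILL follows almost immediately, the handler does nothing beyond setting a flag.

```python
import os
import signal

# Set by the handler; checked by the work loop (hypothetical names).
shutdown_requested = False


def on_sigterm(signum, frame):
    """Record the termination request and return at once.

    SIGKILL cannot be caught or ignored, so this is the only hook
    the process gets, and it may have very little time to use it.
    """
    global shutdown_requested
    shutdown_requested = True


# Install the handler for SIGTERM.
signal.signal(signal.SIGTERM, on_sigterm)


def work_step():
    """Hypothetical unit of work; returns False once shutdown was requested."""
    if shutdown_requested:
        return False
    return True


# Assumption for the demo: simulate the runtime's SIGTERM by
# sending the signal to our own process.
os.kill(os.getpid(), signal.SIGTERM)
```

After the signal is delivered, `work_step()` returns False, so a loop built on it would stop at its next iteration. Whether that happens before the SIGKILL depends on timing the application does not control.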

Re: [OMPI users] Don't crash on node failures

2010-04-14 Thread Jürgen Kaiser
What happens exactly when a job or node crashes? Does orte send a SIGTERM to each process?

Best regards,
Jürgen

Durga Choudhury wrote:
> This would be a very welcome new feature for me as well. My two
> thumbs up when it happens.
>
> Best regards
> Durga
>
>
> On Tue, Apr 13, 2010 at 10:28 AM,

Re: [OMPI users] Don't crash on node failures

2010-04-13 Thread Durga Choudhury
This would be a very welcome new feature for me as well. My two thumbs up when it happens.

Best regards
Durga

On Tue, Apr 13, 2010 at 10:28 AM, Ralph Castain wrote:
> Not right now, but coming later this year...
>
> On Apr 13, 2010, at 7:21 AM, Jürgen Kaiser wrote:
>
>> Hi,
>>
>> Can I force

Re: [OMPI users] Don't crash on node failures

2010-04-13 Thread Ralph Castain
Not right now, but coming later this year...

On Apr 13, 2010, at 7:21 AM, Jürgen Kaiser wrote:
> Hi,
>
> Can I force MPI to not abort the whole job when a node crashes? I would
> like to let the remaining MPI-processes perform some action in that case
> and then proceed.
>
> Thanks,
> Jürgen

[OMPI users] Don't crash on node failures

2010-04-13 Thread Jürgen Kaiser
Hi,

Can I force MPI to not abort the whole job when a node crashes? I would like to let the remaining MPI-processes perform some action in that case and then proceed.

Thanks,
Jürgen