Edson,

Based on your questions I would suggest you take a look at the ULFM-enabled 
version of Open MPI. You can find it at http://fault-tolerance.org/.

George.


On Aug 11, 2013, at 15:33, Edson Tavares de Camargo <etcama...@inf.ufpr.br> 
wrote:

> Thanks a lot for your reply, Ralph!
> 
> Could you tell me in which situations the error handler would be called in
> version 1.6.5?
> 
> I had thought that a process failure would be caught by the error
> handler. Wouldn't killing or aborting the process lead to the same behaviour?
> 
> In the 1.7.4 release, will the error handler be invoked if a process is
> killed?
> 
> Thanks,
> 
> Edson
> ---------------------
> 
>> The error handler wouldn't be called in that situation - we simply abort
>> the job. We expect to provide that integration in something like the 1.7.4
>> release milestone.
>> 
>> 
>> On Aug 10, 2013, at 11:07 AM, Edson Tavares de Camargo
>> <etcama...@inf.ufpr.br> wrote:
>> 
>>> Hi All,
>>> 
>>> I was looking for posts about fault tolerance in MPI and I found the post
>>> below:
>>> 
>>> http://www.open-mpi.org/community/lists/users/2012/06/19658.php
>>> 
>>> I am trying to understand all the work on failure detection in
>>> Open MPI. So I began with a simple ring application (ring.c) to
>>> understand error handlers. But it doesn't seem to work. Why not?
>>> (The code is below.)
>>> 
>>> All of the application's processes were running on the same machine,
>>> launched with the following command line:
>>> 
>>> $ mpiexec -n 4 ring
>>> 
>>> While the ring application was running, one of the processes was killed.
>>> The entire application stopped (fine so far), but no error message was
>>> shown. Shouldn't the check if(error != MPI_SUCCESS) have triggered?
>>> 
>>> I am using the mpiexec (OpenRTE) 1.6.5.
>>> 
>>> Thanks in advance,
>>> 
>>> Edson
>>> 
>>> -----------------------------------------------
>>> #include <stdio.h>
>>> #include <mpi.h>
>>> #include <time.h>
>>> 
>>> int main( int argc, char *argv[] )
>>> {
>>>   int rank, size;
>>>   int n = 0;
>>>   int tag = 0;
>>>   int error;
>>>   int root = 0;
>>>   int next, previous;
>>>   double start = 0;
>>>   double finish = 0;
>>> 
>>>   MPI_Status status;
>>> 
>>>   MPI_Init( &argc, &argv );
>>>   MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>> 
>>>   // error handler
>>>   MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
>>> 
>>>   do {
>>>       next = (rank + 1) % (size);
>>>       n++;
>>> 
>>>       if(rank != 0){
>>>           previous = (rank - 1);
>>>       }else{
>>>           previous = size - 1;
>>>       }
>>> 
>>>       if (rank =
> 
