Thanks a lot for your reply, Ralph! Could you tell me in what situation the error handler would be called in the 1.6.5 version?
I had thought that a failure in a process would be caught by the error handler. Wouldn't killing or aborting the process trigger the same behaviour? In the 1.7.4 release, if a process is killed, will the error handler be invoked?

Thanks,
Edson

---------------------

> The error handler wouldn't be called in that situation - we simply abort
> the job. We expect to provide that integration in something like the 1.7.4
> release milestone.
>
>
> On Aug 10, 2013, at 11:07 AM, Edson Tavares de Camargo
> <etcama...@inf.ufpr.br> wrote:
>
>> Hi All,
>>
>> I was looking for posts about fault tolerance in MPI and I found the post
>> below:
>>
>> http://www.open-mpi.org/community/lists/users/2012/06/19658.php
>>
>> I am trying to understand all the work on failure detection present in
>> open-mpi. So, I began with a simple application, a ring application
>> (ring.c), to understand error handlers. But it seems to me that it didn't
>> work. Why not? (The code is below.)
>>
>> The application (the processes) was running on the same machine with the
>> following command line:
>>
>> $ mpiexec -n 4 ring
>>
>> While the ring application was running, one of the processes was killed.
>> So, the entire application stopped (ok until here), but it didn't show me
>> the error message. Shouldn't the line if(error != MPI_SUCCESS) have worked?
>>
>> I am using the mpiexec (OpenRTE) 1.6.5.
>>
>> Thanks in advance,
>>
>> Edson
>>
>> -----------------------------------------------
>> #include <stdio.h>
>> #include <mpi.h>
>> #include <time.h>
>>
>> int main( int argc, char *argv[] )
>> {
>>     int rank, size;
>>     int n = 0;
>>     int tag = 0;
>>     int error;
>>     int root = 0;
>>     int next, previous;
>>     double start = 0;
>>     double finish = 0;
>>
>>     MPI_Status status;
>>
>>     MPI_Init( &argc, &argv );
>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>
>>     // error handler
>>     MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
>>
>>     do {
>>         next = (rank + 1) % (size);
>>         n++;
>>
>>         if(rank != 0){
>>             previous = (rank - 1);
>>         }else{
>>             previous = size - 1;
>>         }
>>
>>         if (rank =