Hi all,

I was looking for posts about fault tolerance in MPI and found the post below:
http://www.open-mpi.org/community/lists/users/2012/06/19658.php

I am trying to understand the failure-detection work present in Open MPI. I began with a simple ring application (ring.c) to understand error handlers, but it does not seem to work, and I would like to know why. The code is below.

All processes were running on the same machine, started with the following command:

$ mpiexec -n 4 ring

While the ring application was running, one of the processes was killed. The entire application stopped (fine so far), but no error message was shown. Shouldn't the check if (error != MPI_SUCCESS) have triggered?

I am using mpiexec (OpenRTE) 1.6.5.

Thanks in advance,
Edson

-----------------------------------------------
#include <stdio.h>
#include <mpi.h>
#include <time.h>

int main( int argc, char *argv[] )
{
    int rank, size;
    int n = 0;
    int tag = 0;
    int error;
    int root = 0;
    int next, previous;
    double start = 0;
    double finish = 0;
    MPI_Status status;

    MPI_Init( &argc, &argv );
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // error handler: return error codes to the caller instead of aborting
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    do {
        // neighbours in the ring
        next = (rank + 1) % (size);
        n++;
        if (rank != 0) {
            previous = (rank - 1);
        } else {
            previous = size - 1;
        }

        if (rank == root) {
            // the root sends first, then waits for the value to come back around
            error = MPI_Send( &n, 1, MPI_INT, next, tag, MPI_COMM_WORLD );
            // if an error occurs, print the message
            if (error != MPI_SUCCESS) {
                printf("error");
            }
            error = MPI_Recv( &n, 1, MPI_INT, previous, tag, MPI_COMM_WORLD, &status );
            // if an error occurs, print the message
            if (error != MPI_SUCCESS) {
                printf("error");
            }
        } else {
            // every other rank receives first, then forwards the value
            error = MPI_Recv( &n, 1, MPI_INT, previous, tag, MPI_COMM_WORLD, &status );
            // if an error occurs, print the message
            if (error != MPI_SUCCESS) {
                printf("error");
            }
            error = MPI_Send( &n, 1, MPI_INT, next, tag, MPI_COMM_WORLD );
            // if an error occurs, print the message
            if (error != MPI_SUCCESS) {
                printf("error");
            }
        }

        printf( "Process %d got %d\n", rank, n );

        // wait a bit (busy-wait for about one second)
        start = MPI_Wtime();
        finish = start;
        while ( (finish - start) < 1 ) {
            finish = MPI_Wtime();
        }
    } while (n < 100);

    MPI_Finalize();
    return 0;
}
----------------------------
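
In case the ring topology is not obvious from the loop, here is a minimal sketch of the neighbour computation on its own, without any MPI calls. The names ring_next and ring_previous are hypothetical helpers introduced only for illustration; they are not part of MPI or of ring.c, and they assume 0 <= rank < size.

```c
#include <assert.h>

/* Rank that this process sends to in a ring of `size` processes. */
static int ring_next(int rank, int size)
{
    assert(size > 0 && rank >= 0 && rank < size);
    return (rank + 1) % size;
}

/* Rank that this process receives from. Adding `size` before taking
   the remainder keeps the operand non-negative when rank is 0, which
   gives the same result as the if/else in the loop. */
static int ring_previous(int rank, int size)
{
    assert(size > 0 && rank >= 0 && rank < size);
    return (rank - 1 + size) % size;
}
```

With 4 processes, as in the mpiexec command, rank 0 would receive from rank 3 and send to rank 1.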