Edson,

Based on your questions, I would suggest you take a look at the ULFM-enabled version of Open MPI. You can find it at http://fault-tolerance.org/.
George.

On Aug 11, 2013, at 15:33, Edson Tavares de Camargo <etcama...@inf.ufpr.br> wrote:

> Thanks a lot for your reply, Ralph!
>
> Could you tell me in what situation the error handler would be called in
> the 1.6.5 version?
>
> I had thought that a failure in a process would be caught by the error
> handler. Wouldn't killing or aborting the process lead to the same
> behaviour?
>
> In the 1.7.4 release, if a process is killed, will the error handler be
> called?
>
> Thanks,
>
> Edson
> ---------------------
>
>> The error handler wouldn't be called in that situation - we simply abort
>> the job. We expect to provide that integration in something like the 1.7.4
>> release milestone.
>>
>>
>> On Aug 10, 2013, at 11:07 AM, Edson Tavares de Camargo
>> <etcama...@inf.ufpr.br> wrote:
>>
>>> Hi All,
>>>
>>> I was looking for posts about fault tolerance in MPI and I found the post
>>> below:
>>>
>>> http://www.open-mpi.org/community/lists/users/2012/06/19658.php
>>>
>>> I am trying to understand all the work on failure detection present in
>>> Open MPI. So, I began with a simple application, a ring application
>>> (ring.c), to understand error handlers. But it seems to me that it didn't
>>> work. Why not? (The code is below.)
>>>
>>> The application's processes were all running on the same machine,
>>> launched with the following command line:
>>>
>>> $ mpiexec -n 4 ring
>>>
>>> While the ring application was running, one of the processes was killed.
>>> The entire application then stopped (OK so far), but it didn't show me
>>> the error message. Shouldn't the line if(error != MPI_SUCCESS) have
>>> worked?
>>>
>>> I am using mpiexec (OpenRTE) 1.6.5.
>>>
>>> Thanks in advance,
>>>
>>> Edson
>>>
>>> -----------------------------------------------
>>> #include <stdio.h>
>>> #include <mpi.h>
>>> #include <time.h>
>>>
>>> int main( int argc, char *argv[] )
>>> {
>>>     int rank, size;
>>>     int n = 0;
>>>     int tag = 0;
>>>     int error;
>>>     int root = 0;
>>>     int next, previous;
>>>     double start = 0;
>>>     double finish = 0;
>>>
>>>     MPI_Status status;
>>>
>>>     MPI_Init( &argc, &argv );
>>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>
>>>     // error handler
>>>     MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
>>>
>>>     do {
>>>         next = (rank + 1) % (size);
>>>         n++;
>>>
>>>         if(rank != 0){
>>>             previous = (rank - 1);
>>>         }else{
>>>             previous = size - 1;
>>>         }
>>>
>>>         if (rank =
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users