Hi All,

I was looking for posts about fault tolerance in MPI and found the post
below:

http://www.open-mpi.org/community/lists/users/2012/06/19658.php

I am trying to understand the failure-detection work available in
Open MPI. I started with a simple ring application (ring.c) to understand
error handlers, but it does not seem to work. Why not? (The full code is
at the end of this message.)

All processes were running on the same machine, started with the
following command:

$ mpiexec -n 4 ring

While the ring application was running, one of the processes was killed.
The entire application then stopped (fine so far), but it did not show me
the error message. Shouldn't the check if(error != MPI_SUCCESS) have fired?
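
One thing I am unsure about is whether the message itself could be getting
lost: printf("error") has no trailing newline and is never flushed, and
stdout is normally fully buffered when it is not attached to a terminal.
A minimal sketch of a more robust report follows. The helper name
report_failure is hypothetical (it is not part of MPI or of ring.c), and
the message format is just an assumption for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of MPI or of ring.c): writes one
 * newline-terminated line describing a failed call and flushes it
 * immediately, so the text is not left in a stdio buffer if the
 * process is terminated soon afterwards.
 * Returns the number of characters written, or -1 if writing or
 * flushing failed. */
int report_failure(FILE *stream, const char *call, int rank, int code)
{
    assert(stream != NULL && call != NULL);

    int written = fprintf(stream, "rank %d: %s returned error code %d\n",
                          rank, call, code);
    if (written < 0 || fflush(stream) != 0) {
        return -1;
    }
    return written;
}
```

In ring.c it would be called as report_failure(stderr, "MPI_Send", rank,
error) in place of printf("error"). This only rules out buffering; it does
not by itself answer whether the MPI call should return an error at all.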

I am using mpiexec (OpenRTE) 1.6.5.

Thanks in advance,

Edson

-----------------------------------------------
#include <stdio.h>
#include <mpi.h>
#include <time.h>

int main( int argc, char *argv[] )
{
    int rank, size;
    int n = 0;
    int tag = 0;
    int error;
    int root = 0;
    int next, previous;
    double start = 0;
    double finish = 0;

    MPI_Status status;

    MPI_Init( &argc, &argv );
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // error handler
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    do {
        next = (rank + 1) % (size);
        n++;

        if(rank != 0){
            previous = (rank - 1);
        }else{
            previous = size - 1;
        }

        if (rank == root) {

            error = MPI_Send( &n, 1, MPI_INT, next, tag, MPI_COMM_WORLD );

            // if an error occurs, print the message
            if(error != MPI_SUCCESS){
                printf("error");
            }

            error = MPI_Recv( &n, 1, MPI_INT, previous, tag, MPI_COMM_WORLD, &status );

            // if an error occurs, print the message
            if(error != MPI_SUCCESS){
                printf("error");
            }
        }
        else {

            error = MPI_Recv( &n, 1, MPI_INT, previous, tag, MPI_COMM_WORLD, &status );

            // if an error occurs, print the message
            if(error != MPI_SUCCESS){
                printf("error");
            }

            error = MPI_Send( &n, 1, MPI_INT, next, tag, MPI_COMM_WORLD );

            // if an error occurs, print the message
            if(error != MPI_SUCCESS){
                printf("error");
            }
        }
        printf( "Process %d got %d\n", rank, n );

        // wait a bit
        start = MPI_Wtime();
        finish = start;

        while ( (finish - start) < 1 ){
            finish =  MPI_Wtime();
        }

    } while (n < 100);

    MPI_Finalize();
    return 0;
}
----------------------------
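
As a side note, the neighbour computation in the listing above can be
written without the if/else branch. This is only a sketch: ring_next and
ring_prev are hypothetical helper names that do not appear in ring.c, and
the code assumes 0 <= rank < size.

```c
#include <assert.h>

/* Hypothetical helper: rank that receives from `rank` in a ring of
 * `size` processes (wraps from the last rank back to 0). */
int ring_next(int rank, int size)
{
    assert(size > 0 && rank >= 0 && rank < size);
    return (rank + 1) % size;
}

/* Hypothetical helper: rank that sends to `rank` in the ring.
 * Adding `size` before the modulo keeps the left operand
 * non-negative, so rank 0 wraps to size - 1. */
int ring_prev(int rank, int size)
{
    assert(size > 0 && rank >= 0 && rank < size);
    return (rank + size - 1) % size;
}
```

With 4 processes, rank 0 gets next = 1 and previous = 3, the same values
the if/else in the listing produces.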



