The MPI standard states that the correct way to abort or kill an MPI application is to call the MPI_Abort function. Unless you are doing some kind of fault-tolerance work, there is no reason to end one of your MPI processes via exit.

  Thanks,
    george.

On Aug 16, 2007, at 12:04 PM, Daniel Spångberg wrote:

Dear Open-MPI user list members,

I currently have a user with an application where one of the MPI
processes dies, but the Open MPI runtime does not kill the rest of the
application.

Since the mpirun man page states the following I would expect it to take
care of killing the application if a process exits without calling
MPI_Finalize:

    Process Termination / Signal Handling
        During the run of an MPI application, if any rank dies abnormally
        (either exiting before invoking MPI_FINALIZE, or dying as the
        result of a signal), mpirun will print out an error message and
        kill the rest of the MPI application.

The following test program demonstrates the behaviour (program hangs until
it is killed by the user or batch system):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <mpi.h>

#define RANK_DEATH 1

int main(int argc, char **argv)
{
   int rank;
   MPI_Init(&argc,&argv);
   MPI_Comm_rank(MPI_COMM_WORLD,&rank);

   sleep(10);
   if (rank==RANK_DEATH)
     exit(1);
   sleep(10);
   MPI_Finalize();
   return 0;
}

I have tested this on openmpi 1.2.1 as well as the latest stable 1.2.3. I
am on Linux x86_64.

Is this a bug, or are there some flags I can use to force the mpirun (or
orted, or...) to kill the whole MPI program when this happens?

If one of the application processes dies from a signal (I have tested
SEGV and FPE) rather than just exiting, the whole application is indeed
killed.

Best regards
Daniel Spångberg