Hi all,

I'm using MPI_Comm_spawn to start a new child process.
I found that if the parent process executes MPI_Finalize before the child
process does, the child process dumps core in MPI_Finalize.

I couldn't find the correct way to get a clean shutdown of all processes
(parent and child).
What I found is that a sleep(2) in the parent process, just before
calling MPI_Finalize, gives the child process enough CPU time to finish
its own MPI_Finalize, and only then is there no core dump.

Although this works around the issue, I can't accept it as a proper solution.

I guess I'm doing something wrong (implementation or design), which is
why I'm sending this email to the group (and yes, I did check the FAQ
and searched the mailing list archive).

Here is the entire code to reproduce the issue:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char* argv[]){
        int  my_rank; /* rank of process */
        int  p;       /* number of processes */
        int source;   /* rank of sender */
        int dest;     /* rank of receiver */
        int tag=0;    /* tag for messages */
        char message[100];        /* storage for message */
        MPI_Status status ;   /* return status for receive */

        /* start up MPI */

        MPI_Init(&argc, &argv);

        /* find out process rank */
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
        fprintf(stderr,"My rank is : %d\n",my_rank);
        /* find out number of processes */
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        MPI_Comm parent;
        MPI_Comm_get_parent(&parent);

        if ( parent != MPI_COMM_NULL){
                /* child: receive the greeting sent by the parent (rank 0 of
                 * the remote group) over the parent intercommunicator */
                dest = 0;

                MPI_Recv(message, 100, MPI_CHAR, 0, tag,parent, &status);
                fprintf(stderr,"Got [%s] from root\n",message);
                /* shut down MPI */
                MPI_Finalize();

        }
        else{
                printf("Hello MPI World From process 0: Num processes: %d\n",p);
                MPI_Comm everyone;
                MPI_Comm_spawn("mpitest", MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                               MPI_COMM_SELF, &everyone, MPI_ERRCODES_IGNORE);
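                /* find out number of spawned processes: on an
                 * intercommunicator MPI_Comm_size returns the local group
                 * size, so query the remote group instead */
                MPI_Comm_remote_size(everyone, &p);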
                fprintf(stderr,"New world size:%d\n",p);
                for (source = 0; source < p; source++) {
                        sprintf(message,
                                "Hello MPI World from root to process %d!",
                                source);
                        /* use strlen+1 so that '\0' gets transmitted */
                        MPI_Send(message, strlen(message)+1, MPI_CHAR, source,
                                 tag, everyone);
                }

                /**
                 * Why does this sleep resolve my core dump issue?
                 * Is there a timing dependency between the parent and the
                 * child process?
                 * Based on the documentation and the FAQ, I could not find
                 * an answer for this issue.
                 *
                 * If you comment out the sleep(2), the child process will
                 * crash in MPI_Finalize with signal 11, Segmentation fault.
                 */
                //sleep(2); // uncomment this line to enable the sleep and avoid the core dumps

                /* shut down MPI */
                MPI_Finalize();

        }
        return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
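
A possible alternative to the sleep(2), offered only as an untested sketch:
both sides could call MPI_Comm_disconnect on the intercommunicator before
MPI_Finalize. MPI_Comm_disconnect is a standard MPI-2 collective that waits
for pending communication on the communicator to complete, so it should give
a deterministic hand-off instead of a timing guess. The variable name
"children" is made up for this sketch, and whether this actually avoids the
segmentation fault on a given MPI implementation is an assumption, not
something confirmed here.

```c
#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    char message[100];
    MPI_Comm parent;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent != MPI_COMM_NULL) {
        /* child: receive from rank 0 of the parent (remote) group */
        MPI_Recv(message, 100, MPI_CHAR, 0, 0, parent, MPI_STATUS_IGNORE);
        fprintf(stderr, "Got [%s] from root\n", message);

        /* collective over the intercommunicator: returns once pending
         * communication on it has completed */
        MPI_Comm_disconnect(&parent);
    } else {
        MPI_Comm children; /* hypothetical name for the intercommunicator */
        int n, i;

        /* "mpitest" is the executable name used in the reproducer above */
        MPI_Comm_spawn("mpitest", MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);

        /* size of the remote group, i.e. the number of spawned children */
        MPI_Comm_remote_size(children, &n);
        for (i = 0; i < n; i++) {
            snprintf(message, sizeof message,
                     "Hello MPI World from root to process %d!", i);
            MPI_Send(message, (int)strlen(message) + 1, MPI_CHAR, i, 0,
                     children);
        }

        /* matches the child's disconnect; replaces the sleep(2) */
        MPI_Comm_disconnect(&children);
    }

    MPI_Finalize();
    return 0;
}
```

The design idea is that parent and child are no longer connected once both
disconnect calls return, so neither side's MPI_Finalize should depend on the
other's timing.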

Can anyone come to the rescue?


Thank you,
Roy Avidor
