Hey all!

Last week I observed strange behaviour in Open MPI when using
MPI_Comm_spawn() to create new MPI processes: the child processes are
started, but after the child's call to MPI_Init() its stdout output is no
longer redirected to the stdout of the parent/mpirun process. Before the
call to MPI_Init() the child's stdout is redirected correctly.

I tried this with several Open MPI versions on different architectures
(1.2.7 on Debian i686, 1.2.2 on SuSE 10.3 x86_64) and wrote some dummy
code to demonstrate the behaviour:


/* parent.c */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {

        MPI_Init(&argc, &argv);

        printf("[parent] now spawn\n");

        MPI_Comm everyone;
        MPI_Comm_spawn("./child", MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &everyone, MPI_ERRCODES_IGNORE);

        printf("[parent] finished spawning\n");

        /* busy-wait so the parent stays alive; see the comment in child.c */
        while (1);

        MPI_Finalize();

        return 0;
}


/* child.c */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {

        MPI_Init(&argc, &argv);

        /* stdout does not get redirected!
         * (sometimes (!) this happens even without the while (1);
         * loop in parent.c)
         */
        printf("[child] initialized MPI\n");

        MPI_Finalize();

        return 0;
}

Output is:
% mpicc -o parent parent.c && mpicc -o child child.c && mpirun ./parent
[parent] now spawn
[parent] finished spawning

Without the while (1); loop in parent.c the output sometimes (!) stays
the same as above and sometimes is:
% mpicc -o parent parent.c && mpicc -o child child.c && mpirun ./parent
[parent] now spawn
[parent] finished spawning
[child] initialized MPI

The child process definitely runs past the MPI_Init() call in every
situation described here, so I think the problem has to be the stdout
redirection.
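
One way to check this independently of stdout is to let the child write
a marker to a file right after MPI_Init(). The following is only a
minimal sketch of that idea, not part of the test case above:
log_marker() and the file name child.log are hypothetical names made up
for illustration, and the helper uses plain stdio only, no MPI calls.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of the original test case):
 * append one line to a marker file, bypassing stdout entirely.
 * Returns 0 on success and -1 if the file could not be written. */
int log_marker(const char *path, const char *msg)
{
        assert(path != NULL && msg != NULL);

        FILE *f = fopen(path, "a");
        if (f == NULL)
                return -1;

        int ok = (fprintf(f, "%s\n", msg) >= 0);

        /* fclose() flushes the buffer; treat a failed flush as an error */
        if (fclose(f) != 0)
                ok = 0;

        return ok ? 0 : -1;
}
```

In child.c this could be called as
log_marker("child.log", "[child] past MPI_Init"); directly after
MPI_Init(). If the line shows up in child.log while nothing appears on
the terminal, the child ran and only the stdout forwarding is missing.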

A similar (or the same?) bug is reported here:
https://svn.open-mpi.org/trac/ompi/ticket/1120 . As rhc states in a
comment there, it does not work on remote nodes either. I don't know
which release was supposed to fix that bug, so I can't say whether this
is a known problem or a new one. Perhaps one of the developers could
take a look at it.


Thanks!!

bye,
André Gaul
