Hi, I don't know if it is my sample code or if it is a problem with MPI_Scatter() on an inter-communicator (maybe similar to the problem we found with MPI_Allgather() on an inter-communicator a few weeks ago), but a simple program I wrote freezes during the second iteration of a loop doing an MPI_Scatter() over an inter-communicator.
For example, if I compile as follows:

  mpicc -Wall scatter_bug.c -o scatter_bug

I get no error or warning. Then if I start it with np=2 as follows:

  mpiexec -n 2 ./scatter_bug

it prints:

  beginning Scatter i_root_group=0
  ending Scatter i_root_group=0
  beginning Scatter i_root_group=1

and then hangs...

Note also that if I change the for loop to execute only the MPI_Scatter() of the second iteration (e.g. replacing "i_root_group=0;" by "i_root_group=1;"), it prints:

  beginning Scatter i_root_group=1

and then hangs... The problem therefore seems to be related to the second iteration itself.

Please note that this program runs fine with mpich2 1.0.7rc2 (ch3:sock device) for many different numbers of processes (np), whether the executable is run with or without valgrind.

The OpenMPI version I use is 1.2.6rc3 and it was configured as follows:

  ./configure --prefix=/usr/local/openmpi-1.2.6rc3 --disable-mpi-f77 --disable-mpi-f90 --disable-mpi-cxx --disable-cxx-exceptions --with-io-romio-flags=--with-file-system=ufs+nfs

Note also that all processes (when using OpenMPI or mpich2) were started on the same machine.

Also, if you look at the source code, you will notice that some arguments to MPI_Scatter() are NULL or 0. This may look strange and problematic with a normal intra-communicator. However, according to the book "MPI - The Complete Reference", vol. 2, about MPI-2, for MPI_Scatter() with an inter-communicator: "The sendbuf, sendcount and sendtype arguments are significant only at the root process. The recvbuf, recvcount, and recvtype arguments are significant only at the processes of the leaf group." (A short sketch restating this calling convention follows the program below.)

If anyone else can have a look at this program and try it, it would be helpful.

Thanks,
Martin

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int ret_code = 0;
    int comm_size, comm_rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);

    if (comm_size > 1) {
        MPI_Comm subcomm, intercomm;
        const int group_id = comm_rank % 2;
        int i_root_group;

        /* split processes into two groups: even and odd comm_ranks. */
        MPI_Comm_split(MPI_COMM_WORLD, group_id, 0, &subcomm);

        /* The remote leader comm_ranks for the even and odd groups are
           respectively 1 and 0. */
        MPI_Intercomm_create(subcomm, 0, MPI_COMM_WORLD, 1-group_id, 0, &intercomm);

        /* for i_root_group==0 the process with comm_rank==0 scatters data to all
           processes with odd comm_rank */
        /* for i_root_group==1 the process with comm_rank==1 scatters data to all
           processes with even comm_rank */
        for (i_root_group = 0; i_root_group < 2; i_root_group++) {
            if (comm_rank == 0) {
                printf("beginning Scatter i_root_group=%d\n", i_root_group);
            }
            if (group_id == i_root_group) {
                const int is_root = (comm_rank == i_root_group);
                int *send_buf = NULL;
                if (is_root) {
                    const int nbr_other = (comm_size+i_root_group)/2;
                    int ii;
                    send_buf = malloc(nbr_other*sizeof(*send_buf));
                    for (ii = 0; ii < nbr_other; ii++) {
                        send_buf[ii] = ii;
                    }
                }
                MPI_Scatter(send_buf, 1, MPI_INT, NULL, 0, MPI_INT,
                            (is_root ? MPI_ROOT : MPI_PROC_NULL), intercomm);
                if (is_root) {
                    free(send_buf);
                }
            } else {
                int an_int;
                MPI_Scatter(NULL, 0, MPI_INT, &an_int, 1, MPI_INT, 0, intercomm);
            }
            if (comm_rank == 0) {
                printf("ending Scatter i_root_group=%d\n", i_root_group);
            }
        }
        MPI_Comm_free(&intercomm);
        MPI_Comm_free(&subcomm);
    } else {
        fprintf(stderr, "%s: error, this program must be started with np > 1\n", argv[0]);
        ret_code = 1;
    }
    MPI_Finalize();
    return ret_code;
}
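P.S. For readers less familiar with rooted collectives over an inter-communicator, here is a minimal sketch restating the calling convention quoted from the book above. It is not a workaround for the hang; "intercomm" is assumed to have been created with MPI_Intercomm_create as in the program above, and the is_root / in_root_group flags are illustrative arguments supplied by the caller.

#include <stdlib.h>
#include <mpi.h>

/* Sketch only: one scatter of a single int per leaf process over an
   inter-communicator, following the rule quoted from "MPI - The
   Complete Reference" vol. 2. */
static void scatter_one_int(MPI_Comm intercomm, int is_root, int in_root_group)
{
    if (is_root) {
        /* send arguments are significant only at the root; it sends one
           int to every process of the remote (leaf) group */
        int remote_size, ii;
        int *send_buf;
        MPI_Comm_remote_size(intercomm, &remote_size);
        send_buf = malloc(remote_size*sizeof(*send_buf));
        for (ii = 0; ii < remote_size; ii++) {
            send_buf[ii] = ii;
        }
        MPI_Scatter(send_buf, 1, MPI_INT, NULL, 0, MPI_INT, MPI_ROOT, intercomm);
        free(send_buf);
    } else if (in_root_group) {
        /* non-root members of the root's (local) group pass MPI_PROC_NULL;
           none of their buffer arguments are significant */
        MPI_Scatter(NULL, 0, MPI_INT, NULL, 0, MPI_INT, MPI_PROC_NULL, intercomm);
    } else {
        /* leaf (remote) group: receive arguments are significant; the root
           argument is the root's rank within its own group (0 here, as in
           the program above) */
        int an_int;
        MPI_Scatter(NULL, 0, MPI_INT, &an_int, 1, MPI_INT, 0, intercomm);
    }
}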