I don't think anybody has answered your email so far; I'll have a look at it on Thursday...

Thanks
Edgar

Audet, Martin wrote:
Hi,

I don't know if it is my sample code or a problem with MPI_Scatter() on an 
inter-communicator (maybe similar to the problem we found with 
MPI_Allgather() on an inter-communicator a few weeks ago), but a simple 
program I wrote freezes during the second iteration of a loop doing an 
MPI_Scatter() over an inter-communicator.

For example, if I compile as follows:

  mpicc -Wall scatter_bug.c -o scatter_bug

I get no errors or warnings. Then if I start it with np=2 as follows:

    mpiexec -n 2 ./scatter_bug

it prints:

   beginning Scatter i_root_group=0
   ending Scatter i_root_group=0
   beginning Scatter i_root_group=1

and then hangs...

Note also that if I change the for loop to execute only the MPI_Scatter() of 
the second iteration (e.g. replacing "i_root_group=0;" with "i_root_group=1;" 
in the loop initialization), it prints:

    beginning Scatter i_root_group=1

and then hangs...

The problem therefore seems to be related to the second iteration itself.
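
For clarity, the modification in question is only to the initialization of 
the loop in the program below; the loop body is unchanged:

   /* run only the second iteration of the loop */
   for (i_root_group=1; i_root_group < 2; i_root_group++) {
      ...
   }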

Please note that this program runs fine with mpich2 1.0.7rc2 (ch3:sock device) 
for many different numbers of processes (np), whether the executable is run 
with or without valgrind.
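
For reference, such a valgrind run takes the usual form, e.g.:

    mpiexec -n 2 valgrind ./scatter_bug

(the exact valgrind options should not matter for this hang).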

The Open MPI version I use is 1.2.6rc3, and it was configured as follows:

   ./configure --prefix=/usr/local/openmpi-1.2.6rc3 --disable-mpi-f77 \
       --disable-mpi-f90 --disable-mpi-cxx --disable-cxx-exceptions \
       --with-io-romio-flags=--with-file-system=ufs+nfs

Note also that all processes (with both Open MPI and mpich2) were started on 
the same machine.

Also, if you look at the source code, you will notice that some arguments to 
MPI_Scatter() are NULL or 0. This may look strange and problematic for a 
normal intra-communicator. However, according to the book "MPI - The Complete 
Reference" vol. 2 on MPI-2, for MPI_Scatter() with an inter-communicator:

  "The sendbuf, sendcount and sendtype arguments are significant only at the root 
process. The recvbuf, recvcount, and recvtype arguments are significant only at the 
processes of the leaf group."
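
In other words, on an inter-communicator the root argument distinguishes 
three roles. As a minimal sketch (the buffer names here are illustrative 
only, not taken verbatim from the program below):

   /* root process in the root group: pass MPI_ROOT plus valid send
      arguments; the receive arguments are ignored */
   MPI_Scatter(send_buf, 1, MPI_INT, NULL, 0, MPI_INT, MPI_ROOT, intercomm);

   /* any other process in the root group: pass MPI_PROC_NULL; all
      buffer arguments are ignored */
   MPI_Scatter(NULL, 0, MPI_INT, NULL, 0, MPI_INT, MPI_PROC_NULL, intercomm);

   /* processes in the leaf group: pass the rank of the root within the
      remote (root) group plus valid receive arguments */
   MPI_Scatter(NULL, 0, MPI_INT, &an_int, 1, MPI_INT, 0, intercomm);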

If anyone else could have a look at this program and try it, it would be helpful.

Thanks,

Martin


#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
   int ret_code = 0;
   int comm_size, comm_rank;

   MPI_Init(&argc, &argv);

   MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
   MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);

   if (comm_size > 1) {
      MPI_Comm subcomm, intercomm;
      const int group_id = comm_rank % 2;
      int i_root_group;

      /* split processes into two groups: even and odd comm_ranks */
      MPI_Comm_split(MPI_COMM_WORLD, group_id, 0, &subcomm);

      /* The remote leader comm_ranks for the even and odd groups are 1 and 0
         respectively. */
      MPI_Intercomm_create(subcomm, 0, MPI_COMM_WORLD, 1-group_id, 0,
                           &intercomm);

      /* For i_root_group==0, the process with comm_rank==0 scatters data to
         all processes with odd comm_rank; for i_root_group==1, the process
         with comm_rank==1 scatters data to all processes with even
         comm_rank. */
      for (i_root_group=0; i_root_group < 2; i_root_group++) {
         if (comm_rank == 0) {
            printf("beginning Scatter i_root_group=%d\n",i_root_group);
         }
         if (group_id == i_root_group) {
            const int  is_root  = (comm_rank == i_root_group);
            int       *send_buf = NULL;
            if (is_root) {
               const int nbr_other = (comm_size+i_root_group)/2;
               int       ii;
               send_buf = malloc(nbr_other*sizeof(*send_buf));
               for (ii=0; ii < nbr_other; ii++) {
                   send_buf[ii] = ii;
               }
            }
            MPI_Scatter(send_buf, 1, MPI_INT,
                        NULL,     0, MPI_INT,
                        (is_root ? MPI_ROOT : MPI_PROC_NULL), intercomm);

            if (is_root) {
               free(send_buf);
            }
         }
         else {
            int an_int;
            MPI_Scatter(NULL,    0, MPI_INT,
                        &an_int, 1, MPI_INT, 0, intercomm);
         }
         if (comm_rank == 0) {
            printf("ending Scatter i_root_group=%d\n",i_root_group);
         }
      }

      MPI_Comm_free(&intercomm);
      MPI_Comm_free(&subcomm);
   }
   else {
      fprintf(stderr, "%s: error, this program must be started with np > 1\n",
              argv[0]);
      ret_code = 1;
   }

   MPI_Finalize();

   return ret_code;
}


--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
