I am attempting to split my application into multiple master+worker
groups using MPI_Comm_split. My MPI version is reported as:

mpirun --tag-output ompi_info -v ompi full --parsable
[1,0]<stdout>:package:Open MPI root@build-x86-64 Distribution
[1,0]<stdout>:ompi:version:full:1.4.3
[1,0]<stdout>:ompi:version:svn:r23834
[1,0]<stdout>:ompi:version:release_date:Oct 05, 2010
[1,0]<stdout>:orte:version:full:1.4.3
[1,0]<stdout>:orte:version:svn:r23834
[1,0]<stdout>:orte:version:release_date:Oct 05, 2010
[1,0]<stdout>:opal:version:full:1.4.3
[1,0]<stdout>:opal:version:svn:r23834
[1,0]<stdout>:opal:version:release_date:Oct 05, 2010
[1,0]<stdout>:ident:1.4.3

The basic problem I am having is that none of the processes ever
returns from the MPI_Comm_split call. I am pretty new to MPI and it is
likely I am not doing things quite correctly. I'd appreciate some guidance.

I am working with an application that has functioned nicely for a while
now. It only uses the single MPI_COMM_WORLD communicator. It is standard
stuff: a master that hands out tasks to many workers, receives their
output, and keeps track of which workers are ready for another task. The
tasks are quite compute-intensive. When running a variation of the
process that uses Monte Carlo iterations, jobs can exceed the 30-hour
limit imposed on them. The MC iterations are independent of each other -
each just adds random noise to an input - so I would like to run multiple
iterations simultaneously, so that using 4 times the cores finishes in
roughly a quarter of the time. This would entail a supervisor interacting
with multiple master+worker groups.

I had thought that I would just have to declare a communicator for each
group so that broadcasts and syncs would work within a single group.

   MPI_Comm_size( MPI_COMM_WORLD, &total_proc_count );
   MPI_Comm_rank( MPI_COMM_WORLD, &my_rank );
   ...
   cores_per_group = total_proc_count / groups_count;
   my_group = my_rank / cores_per_group;     // e.g., 0, 1, ...
   group_rank = my_rank - my_group * cores_per_group;  // rank within a group
   if ( my_rank == 0 )    continue;    // Do not create group for supervisor
   MPI_Comm oldcomm = MPI_COMM_WORLD;
   MPI_Comm my_communicator;    // Actually declared as a class variable
   int sstat = MPI_Comm_split( oldcomm, my_group, group_rank,
         &my_communicator );
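
Once the split succeeds, my intent is to use the group communicator within
each group roughly like this (just a sketch; the actual message contents
and counts in my application are different):

   int params[16];                    // hypothetical task parameters
   if ( group_rank == 0 ) {
      // ... the group master fills in the parameters ...
   }
   // Broadcast and synchronize within this group only, not across groups
   MPI_Bcast( params, 16, MPI_INT, 0, my_communicator );
   MPI_Barrier( my_communicator );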

There is never a return from the MPI_Comm_split() call, however. Do I
need to do something else to set this up? I would have expected perhaps
a non-zero status return, but not that the call would never return at
all. I would appreciate any comments or guidance.

- Gary
