I have a partial answer to your question. However, I only know the MPI layer of the dynamic process management implementation, not the runtime environment.

LAM/MPI had an info key with a name very similar to "spawn_sched_round_robin". However, this is not one of the predefined info keys of MPI-2 for dynamic process management; it is a LAM/MPI extension (which is fine, since the MPI specification explicitly allows MPI libraries to define their own info keys). Open MPI currently has no corresponding key. (In fact, the only info key that Open MPI currently recognizes in this section is "wdir".) However, if you spawned, say, four processes in a single call instead of spawning every process separately, these four processes would be scheduled according to the hostfile, so you would not end up with all processes on the first host.

Hope this helps.

Edgar

Zhao, Yongsheng wrote:


Hello,

I am trying to use the dynamic process management features of Open MPI, but I have run into some problems.

In my program, the master spawns a number of worker programs; the number of workers depends on the universe_size. The problem is that the workers are only ever spawned on one node, the same node the master is running on. I specified the nodes using a hostfile. Here is the content of my hostfile:
n18 slots=1
n17 slots=4

The master is running on n18, and I want it to spawn 4 workers on n17. The command I used to start the program is:
mpirun --hostfile hostfile -np 1 master ...

However, all 4 workers are spawned on n18 too; none of them is running on n17. Here is my code to spawn the workers:

// Spawns workers.
void master::Master::spawnWorkers(const char* command, const char* arguments[]) {

  char schema[80];
  int mpi_spawn_error;
  Task * task = tasks.front();
  mpi_spawn_info=MPI::Info::Create();
  for(int iworker=1; iworker<=number_of_workers; ++iworker) {
    sprintf(schema, "c%d", iworker);
    mpi_spawn_info.Set("spawn_sched_round_robin", schema);
    intercomm_workers[iworker]=MPI::COMM_SELF.Spawn(command, arguments, 1, mpi_spawn_info, mpi_comm_rank, &mpi_spawn_error);
    if (mpi_spawn_error!=MPI::SUCCESS) {
      std::cerr << "(Master) Error in spawning worker (rank=" << mpi_comm_rank << ").\n";
      MPI::COMM_WORLD.Abort((1 << 16)+1);
    } else {
      std::cerr << "Master spawned worker (rank=" << iworker << ").\n";
      intracomm_workers[iworker]=intercomm_workers[iworker].Merge(true);
      std::cerr << "Master merging inter - and intra - communicators for worker (rank=" << iworker << ").\n";
    }
  }
  mpi_spawn_info.Free();

}

In the code, command is just the file name of the worker executable.

I guess I didn't set mpi_spawn_info, which is of type MPI::Info, correctly, but I have no idea how to set it.

Any advice?

Thanks.

Regards
Yongsheng Zhao


------------------------------------------------------------------------

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
