Re: [OMPI users] Spawn and distribution of slaves

Jean Latour Fri, 3 Mar 2006 03:32:56 -0500

Thanks for your answer. Your example addresses one possible situation, where a parallel application is spawned by a driver with MPI_Comm_spawn, or multiple parallel applications are spawned at the same time with MPI_Comm_spawn_multiple, over a set of processors described in the machinefile. It is OK if the next spawn occurs after some processes at the beginning of the machinefile have stopped.

However, I have another case in hand where the spawned processes are really dynamic over time. Any child process can stop (not necessarily the first one in the machinefile), thus freeing some processors on which the newly spawned processes must run. With LAM/MPI this situation has a satisfactory solution through the info parameter of MPI_Comm_spawn: it allows specifying a "local" machinefile for these spawned processes, instead of always taking the same machinefile from the beginning as in your example.
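For illustration only, here is a minimal Python sketch of the bookkeeping such a driver needs: it remembers which machinefile slots are occupied and hands out the first free one when a new child is to be started. `HostPool`, `acquire` and `release` are hypothetical names, not part of MPI or Open MPI. In a real driver the chosen host name would still have to reach MPI_Comm_spawn through its info argument (for example the "host" key that MPI-2 reserves for spawn), and whether Open MPI 1.0.1 honours such a key is exactly the question raised in this thread.

```python
class HostPool:
    """Hypothetical driver-side record of which machinefile slots are busy.

    Not part of MPI or Open MPI; slots are tracked by index so a host that is
    listed several times in the machinefile gets several slots.
    """

    def __init__(self, hosts):
        self.hosts = list(hosts)             # machinefile order
        self.busy = [False] * len(self.hosts)

    def acquire(self):
        """Mark the first free slot busy and return its index, or None if full."""
        for i, in_use in enumerate(self.busy):
            if not in_use:
                self.busy[i] = True
                return i
        return None

    def release(self, i):
        """Call when the child running in slot i has stopped."""
        self.busy[i] = False


pool = HostPool(["linux12", "linux13", "linux14", "linux15", "linux16"])
placed = [pool.acquire() for _ in range(5)]  # one child per host, all busy
pool.release(3)                              # the child on linux15 stops
next_slot = pool.acquire()                   # reuses slot 3
print(pool.hosts[next_slot])                 # linux15, not linux12
```

The point of the design is that a freed slot in the middle of the machinefile is reused, rather than restarting placement at the first entry.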

Do you know if this specific feature will be implemented in Open MPI (I hope it will be), and possibly when?
Dynamic applications really need this.

Best Regards,
Jean Latour

Edgar Gabriel wrote:

So in my tests, Open MPI did follow the machinefile (see the output
further below); however, for each spawn operation it starts again from the
very beginning of the machinefile...

The following example spawns 5 child processes (with a single
MPI_Comm_spawn), and each child prints its rank and the hostname.

gabriel@linux12 ~/dyncomm $ mpirun -hostfile machinefile -np 3 ./dyncomm_spawn_father
 Checking for MPI_Comm_spawn.....................working
Hello world from child 0 on host linux12
Hello world from child 1 on host linux13
Hello world from child 3 on host linux15
Hello world from child 4 on host linux16
     Testing Send/Recv on the intercomm..........working
Hello world from child 2 on host linux14


with the machinefile being:
gabriel@linux12 ~/dyncomm $ cat machinefile
linux12
linux13
linux14
linux15
linux16
In your code, you always spawn 1 process at a time, and that's why they are all located on the same node.
Hope this helps...
Edgar
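A small Python model of the placement described above may make the difference concrete. It assumes a simple round-robin mapping of each spawn call's children onto the machinefile, restarting at the first entry on every call; `place_restart` and `ContinuingPlacer` are hypothetical names, and neither is Open MPI code. The second policy, which continues where the previous spawn stopped, is only a sketch of what an alternative could look like.

```python
# Simplified model of spawn placement; an illustration, not Open MPI code.
MACHINEFILE = ["linux12", "linux13", "linux14", "linux15", "linux16"]


def place_restart(nchildren, hosts=MACHINEFILE):
    """Each spawn call maps its children round-robin, restarting at hosts[0]."""
    return [hosts[i % len(hosts)] for i in range(nchildren)]


class ContinuingPlacer:
    """Hypothetical alternative: each spawn continues where the last one stopped."""

    def __init__(self, hosts=MACHINEFILE):
        self.hosts = hosts
        self.next = 0  # index of the host the next child will use

    def place(self, nchildren):
        out = []
        for _ in range(nchildren):
            out.append(self.hosts[self.next])
            self.next = (self.next + 1) % len(self.hosts)
        return out


one_call = place_restart(5)                               # one spawn of 5 children
one_at_a_time = [place_restart(1)[0] for _ in range(4)]   # four spawns of 1 child
cont = ContinuingPlacer()
continued = [cont.place(1)[0] for _ in range(4)]          # four spawns of 1 child

print(one_call)       # spread over linux12 .. linux16
print(one_at_a_time)  # every child on linux12
print(continued)      # linux12, linux13, linux14, linux15
```

With a single spawn of 5 children the model spreads them over linux12 to linux16, as in the output above, while four separate spawns of 1 child all land on linux12, which is the situation pointed out in Jean's code.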


Edgar Gabriel wrote:
As far as I know, Open MPI should follow the machinefile for spawn operations, but every spawn starts again at the beginning of the machinefile. An info object such as 'lam_sched_round_robin' is currently not available/implemented. Let me look into this...
Jean Latour wrote:
Hello,
Testing the MPI_Comm_spawn function of Open MPI version 1.0.1, I have an example that works OK, except that the spawned processes do not follow the "machinefile" setting of processors. In this example, a master process first spawns 2 processes, then disconnects from them and spawns 2 more processes. Running on a quad Opteron node, all processes run on the same node, although the machinefile specifies that the slaves should run on different nodes.
With the current version of Open MPI, is it possible to direct the spawned processes to a specific node? (The node distribution could be given in the "machinefile" file, as with LAM/MPI.)
The code (Fortran 90) of this example and the makefile are attached as a tar file.

Thank you very much

Jean Latour


_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users