beginning of the machinefile have stopped. However, I have at hand another case where the spawned processes are really dynamic over time. Any child process can stop (not necessarily the first in the machinefile), freeing processors on which the newly spawned processes must then run. With LAM/MPI this situation has a satisfactory solution through the INFO parameter of MPI_Comm_spawn: it allows one to specify a "local" machinefile for the spawned processes, instead of always taking the same machinefile from the beginning as in your example.
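
To make the dynamic case concrete, here is a rough sketch of the bookkeeping the application would need on its side. It makes no MPI calls at all: it only tracks which hosts are free, so that a "local" host list could be built for the next spawn. The names HostPool, reserve and release are hypothetical, and the sketch assumes one child per host.

```python
# Hypothetical bookkeeping sketch: no MPI calls are made here.
# Assumption: at most one spawned child runs on each host.
class HostPool:
    """Tracks which machinefile hosts currently run a spawned child."""

    def __init__(self, machinefile_hosts):
        self.hosts = list(machinefile_hosts)  # same order as the machinefile
        self.busy = set()                     # hosts with a live child

    def free_hosts(self):
        """Hosts with no live child, in machinefile order."""
        return [h for h in self.hosts if h not in self.busy]

    def reserve(self, count):
        """Pick `count` free hosts for the next spawn and mark them busy."""
        chosen = self.free_hosts()[:count]
        if len(chosen) < count:
            raise RuntimeError("not enough free hosts for this spawn")
        self.busy.update(chosen)
        return chosen

    def release(self, host):
        """Call when the child running on `host` has stopped."""
        self.busy.discard(host)


pool = HostPool(["linux12", "linux13", "linux14", "linux15", "linux16"])
first = pool.reserve(4)    # initial children
pool.release("linux14")    # a child in the middle of the machinefile stops
second = pool.reserve(2)   # the next spawn reuses the freed host
print(first)
print(second)
```

The list returned by reserve is what would be handed to the spawn call as the local machinefile, provided the MPI implementation offers a way to pass it.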
Do you know if this specific feature will be implemented in Open MPI (I hope it will be), and possibly when? Dynamic applications really need this.

Best Regards,
Jean Latour

Edgar Gabriel wrote:
So in my tests, Open MPI did follow the machinefile (see the output further below); however, for each spawn operation it starts from the very beginning of the machinefile...

The following example spawns 5 child processes (with a single MPI_Comm_spawn), and each child prints its rank and the hostname.

gabriel@linux12 ~/dyncomm $ mpirun -hostfile machinefile -np 3 ./dyncomm_spawn_father
Checking for MPI_Comm_spawn.....................working
Hello world from child 0 on host linux12
Hello world from child 1 on host linux13
Hello world from child 3 on host linux15
Hello world from child 4 on host linux16
Testing Send/Recv on the intercomm..........working
Hello world from child 2 on host linux14

with the machinefile being:

gabriel@linux12 ~/dyncomm $ cat machinefile
linux12
linux13
linux14
linux15
linux16

In your code, you always spawn 1 process at a time, and that's why they are all located on the same node.

Hope this helps...
Edgar

Edgar Gabriel wrote:

As far as I know, Open MPI should follow the machinefile for spawn operations, starting however for every spawn at the beginning of the machinefile again. An info object such as 'lam_sched_round_robin' is currently not available/implemented.

Let me look into this...

Jean Latour wrote:

Hello,

Testing the MPI_Comm_spawn function of Open MPI version 1.0.1, I have an example that works OK, except that it shows that the spawned processes do not follow the "machinefile" setting of processors.

In this example a master process first spawns 2 processes, then disconnects from them and spawns 2 more processes. Running on a Quad Opteron node, all processes run on the same node, although the machinefile specifies that the slaves should run on different nodes.

With the current version of Open MPI, is it possible to direct the spawned processes to a specific node? (The node distribution could be given in the "machinefile" file, as with LAM/MPI.)

The code (Fortran 90) of this example and the makefile are attached as a tar file.

Thank you very much
Jean Latour

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
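
The placement rule Edgar describes (every spawn starts again at the top of the machinefile) can be modelled in a few lines. This is only a toy model of the behaviour reported in the thread, not Open MPI's actual mapping code. The function name place is hypothetical, and wrapping around when more processes than hosts are requested is an assumption.

```python
def place(machinefile_hosts, nprocs):
    """Hosts for ONE spawn call, always starting at the first entry.

    Assumption: wraps around if nprocs exceeds the number of hosts.
    """
    n = len(machinefile_hosts)
    return [machinefile_hosts[i % n] for i in range(nprocs)]


hosts = ["linux12", "linux13", "linux14", "linux15", "linux16"]

# One spawn call for 5 children: one child per host, as in Edgar's output.
one_call = place(hosts, 5)

# Four spawn calls of 1 child each: every call restarts at the first host.
four_calls = [place(hosts, 1)[0] for _ in range(4)]

print(one_call)
print(four_calls)
```

Under this model, spawning one process per call puts every child on linux12, which matches the explanation given above for why all processes ended up on the same node.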