The current release of Open MPI does not support running a single
universe across multiple machines like you describe. We are currently
working on that capability on a side branch of the OpenRTE effort and
hope to begin testing it soon. Once we fully validate that
functionality, we will bring it over to Open MPI.

Ralph

Robert Latham wrote:
> On Tue, Mar 14, 2006 at 12:37:52PM -0600, Edgar Gabriel wrote:
> > I think I know what goes wrong. Since they are in different
> > 'universes', they will have exactly the same 'Open MPI name', and
> > therefore the algorithm in intercomm_merge cannot determine which
> > process should be first and which second.
> >
> > Practically, all jobs that are connected at some point in their
> > lifetime have to be in the same MPI universe, so that all jobs get
> > different jobids and therefore different names. To use the same
> > universe, you have to start the orted daemon in persistent mode, so
> > the sequence should be:
> >
> >   orted --seed --persistent --scope public
> >   mpirun -np x ./app1
> >   mpirun -np y ./app2
> >
> > In this case everything should work as expected: you can do the
> > comm_join between app1 and app2, and the intercomm_merge should work
> > as well. Hope this helps.
>
> This was fine on a single machine. What do you recommend for multiple
> machines (e.g. app1 on node1 and app2 on node2)? How do I tell
> multiple orted instances that they are part of the same universe?
>
> thanks
> ==rob
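For readers following along: the recipe above (a persistent orted, then two separate mpirun invocations in the same universe) pairs with application code roughly like the sketch below. This is a minimal illustration, not code from the thread: the TCP port (12345), the server/client role selection via a command-line argument, running each app as a single process, and the omitted error handling are all assumptions made for brevity.

/* Minimal sketch: two separately launched jobs exchange a TCP socket,
 * join it into an intercommunicator with MPI_Comm_join, and merge the
 * result into one intracommunicator with MPI_Intercomm_merge.
 * Assumes each app is started with "mpirun -np 1"; run app1 as
 * "./app server" and app2 as "./app client <app1-host>". */
#include <mpi.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(int argc, char **argv)
{
    int is_server = (argc > 1 && strcmp(argv[1], "server") == 0);
    int fd, rank;
    MPI_Comm inter, intra;

    MPI_Init(&argc, &argv);

    if (is_server) {
        /* app1: listen on an illustrative port and accept app2's connection */
        int listener = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family      = AF_INET;
        addr.sin_port        = htons(12345);
        addr.sin_addr.s_addr = INADDR_ANY;
        bind(listener, (struct sockaddr *)&addr, sizeof(addr));
        listen(listener, 1);
        fd = accept(listener, NULL, NULL);
    } else {
        /* app2: connect to app1's listening socket (host from argv, assumption) */
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port   = htons(12345);
        inet_pton(AF_INET, argc > 2 ? argv[2] : "127.0.0.1", &addr.sin_addr);
        fd = socket(AF_INET, SOCK_STREAM, 0);
        connect(fd, (struct sockaddr *)&addr, sizeof(addr));
    }

    /* Join the two jobs over the socket, then merge into one
     * intracommunicator.  This is the step that needs both jobs in the
     * same universe, so their process names (jobids) differ. */
    MPI_Comm_join(fd, &inter);
    MPI_Intercomm_merge(inter, is_server ? 0 : 1, &intra);

    MPI_Comm_rank(intra, &rank);
    printf("rank %d in merged communicator\n", rank);

    MPI_Comm_free(&intra);
    MPI_Comm_free(&inter);
    MPI_Finalize();
    close(fd);
    return 0;
}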