On Tue, Mar 14, 2006 at 12:37:52PM -0600, Edgar Gabriel wrote:
> I think I know what goes wrong. Since they are in different 'universes',
> they will have exactly the same 'Open MPI name', and therefore the
> algorithm in intercomm_merge can not determine which process should be
> first and which is second.
>
> Practically, all jobs which are connected at a certain point in their
> lifetime have to be in the same MPI universe, such that all jobs will
> have different jobid's and therefore different names. To use the same
> universe, you have to start the orted daemon in the persistent mode, so
> the sequence should be:
>
> orted --seed --persistent --scope public
> mpirun -np x ./app1
> mpirun -np y ./app2
>
> In this case everything should work as expected, you could do the
> comm_join between app1 and app2 and the intercomm_merge should work as well.
>
> Hope this helps

This was fine on a single machine. What do you recommend for multiple
machines (e.g. app1 on node1 and app2 on node2)? How do I tell multiple
orted instances that they are part of the same universe?

thanks
==rob

--
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Labs, IL USA                B29D F333 664A 4280 315B