On Tue, Mar 14, 2006 at 12:37:52PM -0600, Edgar Gabriel wrote:
> I think I know what goes wrong. Since they are in different 'universes', 
> they will have exactly the same 'Open MPI name', and therefore the 
> algorithm in intercomm_merge cannot determine which process should be 
> first and which is second.
> 
> In practice, all jobs that are connected at some point in their 
> lifetime have to be in the same MPI universe, so that all jobs will 
> have different jobids and therefore different names. To use the same 
> universe, you have to start the orted daemon in persistent mode, so 
> the sequence should be:
> 
> orted --seed --persistent --scope public
> mpirun -np x ./app1
> mpirun -np y ./app2
> 
> In this case everything should work as expected: you can do the 
> comm_join between app1 and app2, and the intercomm_merge should work as well.
> 
> Hope this helps
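
A rough sketch of the ordering problem as I understand it, written in Python rather than Open MPI's actual C code. ProcName, local_group_first, the (jobid, vpid) layout, and the name-based tie-break are hypothetical illustrations, not Open MPI's real data structures or algorithm. The only MPI-standard behavior assumed is that in MPI_Intercomm_merge the group passing high=true is ordered after the other group.

```python
from typing import NamedTuple, Optional


class ProcName(NamedTuple):
    """Hypothetical stand-in for an Open MPI process name.

    Only the (jobid, vpid) idea is assumed; the real structure differs.
    NamedTuple gives lexicographic ordering: jobid first, then vpid.
    """
    jobid: int
    vpid: int


def local_group_first(local_high: bool, remote_high: bool,
                      local_leader: ProcName,
                      remote_leader: ProcName) -> Optional[bool]:
    """Decide whether the local group comes first in the merged communicator.

    Returns True (local first), False (remote first), or None when the
    two groups cannot be ordered.
    """
    if local_high != remote_high:
        # MPI_Intercomm_merge semantics: the group passing high=true
        # is ordered after the other group.
        return not local_high
    # Both groups passed the same 'high' flag, so fall back to
    # comparing the leaders' names (assumed tie-break, hypothetical).
    if local_leader == remote_leader:
        return None  # identical names: nothing left to break the tie
    return local_leader < remote_leader


# Same universe: distinct (hypothetical) jobids, so the tie can be broken.
print(local_group_first(False, False, ProcName(1, 0), ProcName(2, 0)))  # True
# Separate universes: both jobs got the same jobid, so the names collide.
print(local_group_first(False, False, ProcName(1, 0), ProcName(1, 0)))  # None
```

Each side runs the same comparison from its own point of view, so with distinct names the two answers agree (one side gets True, the other False); with identical names neither side can decide.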

This worked fine on a single machine.  What do you recommend for multiple
machines (e.g. app1 on node1 and app2 on node2)? How do I tell
multiple orted instances that they are part of the same universe?

thanks
==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Labs, IL USA                B29D F333 664A 4280 315B
