The current release of Open MPI does not support running a single universe across multiple machines like you describe. We are currently working on that capability on a side branch of the OpenRTE effort and hope to begin testing it soon. Once we fully validate that functionality, we will bring it over to Open MPI.

Ralph


Robert Latham wrote:
On Tue, Mar 14, 2006 at 12:37:52PM -0600, Edgar Gabriel wrote:
  
I think I know what goes wrong. Since they are in different 'universes', 
they will have exactly the same 'Open MPI name', and therefore the 
algorithm in intercomm_merge cannot determine which process should be 
first and which should be second.

In practice, all jobs that are connected at some point in their 
lifetime have to be in the same MPI universe, so that all jobs will 
have different jobids and therefore different names. To use the same 
universe, you have to start the orted daemon in persistent mode, so 
the sequence should be:

orted --seed --persistent --scope public
mpirun -np x ./app1
mpirun -np y ./app2

In this case everything should work as expected: you can do the 
comm_join between app1 and app2, and the intercomm_merge should work as well.

Hope this helps
    
This works fine on a single machine.  What do you recommend for multiple
machines (e.g. app1 on node1 and app2 on node2)? How do I tell
multiple orted instances that they are part of the same universe?

thanks
==rob

  
