Just to be clear: you do not require a daemon on every node. You just need one daemon, sitting somewhere, that can act as the data server for MPI_Publish_name/MPI_Lookup_name. You then tell each app where to find it.

Normally, mpirun fills that function. But if you don't have it, you can kick off a persistent orted (perhaps just have the parent application fork/exec it) for that purpose.
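For what it's worth, the publish/lookup part itself is just the standard MPI-2 calls. A minimal sketch (not tested against 1.2.x; the service name "fkd_service" is a made-up placeholder, and it assumes both processes can already reach the same data server) might look like:

#include <mpi.h>
#include <string.h>

int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm inter;

    MPI_Init(&argc, &argv);

    if (argc > 1 && strcmp(argv[1], "server") == 0) {
        MPI_Open_port(MPI_INFO_NULL, port);                    /* get a port name */
        MPI_Publish_name("fkd_service", MPI_INFO_NULL, port);  /* register it with the data server */
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
        /* ... exchange data over the intercommunicator 'inter' ... */
        MPI_Unpublish_name("fkd_service", MPI_INFO_NULL, port);
        MPI_Close_port(port);
    } else {
        MPI_Lookup_name("fkd_service", MPI_INFO_NULL, port);   /* find the published port */
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
        /* ... exchange data over 'inter' ... */
    }

    MPI_Comm_disconnect(&inter);
    MPI_Finalize();
    return 0;
}

The daemon never appears in the code; it only has to be running somewhere that both sides can reach so the lookup resolves.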


On Jul 30, 2008, at 9:50 AM, Robert Kubrick wrote:


On Jul 30, 2008, at 11:12 AM, Mark Borgerding wrote:

I appreciate the suggestion about running a daemon on each of the remote nodes, but wouldn't I kind of be reinventing the wheel there? Process management is one of the things I'd like to be able to count on ORTE for. Would the following work to give the parent process an intercomm with each child?

The parent (i.e. my non-mpirun-started process) calls MPI_Init, then MPI_Open_port.
The parent spawns an mpirun command via system/exec to create the remote children. The port name from MPI_Open_port is placed in the environment.
The parent calls MPI_Comm_accept (once for each child?)

I think you have to create a separate thread to run the accept in order to handle multiple client connections. Open MPI should support this, since handling multiple client connections was part of the original design of the API. There is a multi-threaded MPI_Comm_accept example in the book Using MPI-2.

All children call MPI_Comm_connect with that name.

I think this would give the parent one intercommunicator for each remote process (not ideal, but I can worry about broadcasting data later). The remote processes can communicate with each other through MPI_COMM_WORLD.

You should be able to merge each child communicator from each accept thread into a global comm anyway.
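A rough sketch of the accept side of the scheme described above (assumptions: NCHILDREN is known in advance, and the port name has already been exported to the children via the environment or the mpirun command line; the serial loop assumes the children connect one at a time, otherwise you may need the per-accept threads Robert mentions):

#include <mpi.h>

#define NCHILDREN 4   /* assumed number of spawned children */

int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm child[NCHILDREN];

    MPI_Init(&argc, &argv);
    MPI_Open_port(MPI_INFO_NULL, port);

    /* ... here: export 'port' and system()/exec the mpirun command
     *     that launches the children ... */

    for (int i = 0; i < NCHILDREN; i++) {
        /* blocks until the i-th child calls MPI_Comm_connect(port, ...) */
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &child[i]);
    }

    /* ... marshal data over the child[] intercommunicators ... */

    for (int i = 0; i < NCHILDREN; i++)
        MPI_Comm_disconnect(&child[i]);
    MPI_Close_port(port);
    MPI_Finalize();
    return 0;
}

Each accept returns a separate intercommunicator, which matches the "one intercomm per child" outcome described above.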



Actually, when I think through the details, much of this is pretty similar to the daemon MPI_Publish_name/MPI_Lookup_name approach. The main difference is which processes come first.

You can run a daemon through system/exec the same way you run mpiexec. Just use ssh or rsh in the system/exec call.
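Something along these lines, for illustration only. The hostname is a placeholder and the daemon command line is an assumption (the flags for a persistent orted differ between Open MPI versions), so check it against the documentation for your install:

#include <stdio.h>
#include <stdlib.h>

/* made-up helper name; launches a persistent daemon on a remote host */
int start_remote_daemon(const char *host)
{
    char cmd[512];

    /* trailing '&' so system() returns instead of blocking on the daemon;
     * the orted options here are an assumption to be verified */
    snprintf(cmd, sizeof(cmd),
             "ssh %s orted --persistent --seed --scope public &", host);
    return system(cmd);
}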





Mark Borgerding wrote:
I'm afraid I can't dictate to the customer that they must upgrade.
The target platform is RHEL 5.2 (uses Open MPI 1.2.6).

I will try to find some sort of workaround. Any suggestions on how to "fake" the functionality of MPI_Comm_spawn are welcome.

To reiterate my needs:
I am writing a shared object that plugs into an existing framework.
I do not control how the framework launches its processes (no mpirun).
I want to start remote processes to crunch the data.
The shared object marshals the I/O between the framework and the remote processes.

-- Mark


Ralph Castain wrote:
Singleton comm_spawn works fine on the 1.3 release branch - if singleton comm_spawn is critical to your plans, I suggest moving to that version. You can get a pre-release version off of the www.open-mpi.org web site.


On Jul 30, 2008, at 6:58 AM, Ralph Castain wrote:

As your own tests have shown, it works fine if you just "mpirun -n 1 ./spawner". It is only singleton comm_spawn that appears to be having a problem in the latest 1.2 release. So I don't think comm_spawn is "useless". ;-)

I'm checking this morning to ensure that singletons properly spawn on other nodes in the 1.3 release. I sincerely doubt we will backport a fix to 1.2.


On Jul 30, 2008, at 6:49 AM, Mark Borgerding wrote:

I keep checking my email in hopes that someone will come up with something that Matt or I might've missed. I'm just having a hard time accepting that something so fundamental would be so broken. The MPI_Comm_spawn command is essentially useless without the ability to spawn processes on other nodes.

If this is true, then my personal scorecard reads:
# Days spent using openmpi: 4 (off and on)
# identified bugs in openmpi: 2
# useful programs built: 0

Please prove me wrong. I'm eager to be shown my ignorance -- to find out where I've been stupid and what documentation I should've read.


Matt Hughes wrote:
I've found that I always have to use mpirun to start my spawner process, due to the exact problem you are having: the need to give OMPI a hosts file! It seems the singleton functionality is lacking somehow... it won't allow you to spawn on arbitrary hosts. I have not tested if this is fixed in the 1.3 series.

Try
mpiexec -np 1 -H op2-1,op2-2 spawner op2-2

mpiexec should start the first process on op2-1, and the spawn call should start the second on op2-2. If you don't use the Info object to set the hostname specifically, then on 1.2.x it will automatically start on op2-2. With 1.3, the spawn call will start processes beginning with the first item in the host list.
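For reference, setting the host explicitly in the spawn call goes through the standard "host" info key. A minimal sketch (the worker path and host name are placeholders borrowed from the example above):

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm children;
    MPI_Info info;

    MPI_Init(&argc, &argv);

    MPI_Info_create(&info);
    MPI_Info_set(info, "host", "op2-2");   /* where the child should run */

    /* spawn one copy of ./worker, pinned to op2-2 */
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, info,
                   0, MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);

    MPI_Info_free(&info);
    /* ... communicate with the child over 'children' ... */
    MPI_Comm_disconnect(&children);
    MPI_Finalize();
    return 0;
}

As discussed above, on 1.2.x the spawner itself still has to be started under mpirun/mpiexec for the spawn to reach another node.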

mch

[snip]