Hello,

Sometimes we have users who like to launch several runs from within a
single job (think scheduling within a job scheduler allocation):

    mpiexec -n X myprog
    mpiexec -n Y myprog2

Does mpiexec within Open MPI keep track of the node list it is using
when it binds to a particular scheduler? For example, with 4 slots
across 2 nodes (2 ppn, SMP):

    mpiexec -n 2 myprog
    mpiexec -n 2 myprog2
    mpiexec -n 1 myprog3

Assuming by-slot allocation, we would have the following:

    node1 - processor1 - myprog
          - processor2 - myprog
    node2 - processor1 - myprog2
          - processor2 - myprog2

And for a by-node allocation:

    node1 - processor1 - myprog
          - processor2 - myprog2
    node2 - processor1 - myprog
          - processor2 - myprog2
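To make the scenario concrete, here is a rough sketch of the kind of
batch script I have in mind (the #PBS directives and the backgrounding
with "&" are my assumptions about how the three runs end up active at
the same time):

    #!/bin/sh
    #PBS -l nodes=2:ppn=2        # 2 nodes, 2 ppn = 4 slots total
    cd $PBS_O_WORKDIR

    mpiexec -n 2 myprog  &       # takes 2 slots
    mpiexec -n 2 myprog2 &       # takes the remaining 2 slots
    mpiexec -n 1 myprog3 &       # should block until a slot frees up
    wait                         # hold the allocation until all finish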
I think this is possible using ssh, since there it shouldn't really
matter how many times mpiexec spawns, but with something like Torque it
would be restricted to a maximum process launch of 4. We would want the
third mpiexec to block and eventually run on the first node allocation
that frees up from myprog or myprog2.

For Torque, for example, we had to add the following to OSC mpiexec:

---
Finally, since only one mpiexec can be the master at a time, if your
code setup requires that mpiexec exit to get a result, you can start a
"dummy" mpiexec first in your batch job:

    mpiexec -server

It runs no tasks itself but handles the connections of other transient
mpiexec clients. It will shut down cleanly when the batch job exits, or
you may kill the server explicitly. If the server is killed with
SIGTERM (or HUP or INT), it will exit with a status of zero if there
were no clients connected at the time. If there were still clients
using the server, the server will kill all their tasks, disconnect from
the clients, and exit with status 1.
---

So a user ran:

    mpiexec -server
    mpiexec -n 2 myprog
    mpiexec -n 2 myprog2

and the server kept track of the allocation. I would think that the
orted could do this?

Sorry if this sounds confusing ... but I'm sure it will clear up with
any further responses I make. :-)

-cdm