Hello,

Sometimes we have users who like to launch several runs from within a
single job (think scheduling within a job scheduler allocation):

    mpiexec -n X myprog
    mpiexec -n Y myprog2

Does mpiexec within Open MPI keep track of the node list it is using
when it binds to a particular scheduler? For example, with 4 slots
across 2 nodes (2 ppn, SMP):

    mpiexec -n 2 myprog
    mpiexec -n 2 myprog2
    mpiexec -n 1 myprog3

Assuming by-slot allocation, we would have the following:

    node1 - processor1 - myprog
          - processor2 - myprog
    node2 - processor1 - myprog2
          - processor2 - myprog2

And for a by-node allocation:

    node1 - processor1 - myprog
          - processor2 - myprog2
    node2 - processor1 - myprog
          - processor2 - myprog2
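To make the scenario concrete, here is a rough sketch of the kind of
batch script I have in mind (the #PBS directives and the backgrounding
with "&" are my assumptions about how the three runs end up active at
the same time):

    #!/bin/sh
    #PBS -l nodes=2:ppn=2        # 2 nodes, 2 ppn = 4 slots total
    cd $PBS_O_WORKDIR

    mpiexec -n 2 myprog  &       # takes 2 slots
    mpiexec -n 2 myprog2 &       # takes the remaining 2 slots
    mpiexec -n 1 myprog3 &       # should block until a slot frees up
    wait                         # hold the allocation until all finish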
I think this is possible using ssh, since there it shouldn't really
matter how many times mpiexec spawns, but with something like Torque it
would be restricted to a maximum process launch of 4. We would want the
third mpiexec to block and eventually run on the first node allocation
that frees up from myprog or myprog2.

For Torque, for example, we had to add the following to OSC mpiexec:

---
Finally, since only one mpiexec can be the master at a time, if your
code setup requires that mpiexec exit to get a result, you can start a
"dummy" mpiexec first in your batch job:

    mpiexec -server

It runs no tasks itself but handles the connections of other transient
mpiexec clients. It will shut down cleanly when the batch job exits, or
you may kill the server explicitly. If the server is killed with
SIGTERM (or HUP or INT), it will exit with a status of zero if there
were no clients connected at the time. If there were still clients
using the server, the server will kill all their tasks, disconnect from
the clients, and exit with status 1.
---

So a user ran:

    mpiexec -server
    mpiexec -n 2 myprog
    mpiexec -n 2 myprog2

and the server kept track of the allocation. I would think that the
orted could do this?

Sorry if this sounds confusing ... but I'm sure it will clear up with
any further responses I make. :-)

-cdm