Hi Chris

Some of this is doable with today's code... and one of these behaviors is
not. :-(

Open MPI/OpenRTE can be run in "persistent" mode - this allows multiple jobs
to share the same allocation. This works much as you describe (the syntax is
slightly different, of course!) - the first mpirun will map using whatever
mapping mode was requested, and the next mpirun will map starting from where
the first one left off.
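
For example, in a 2-node, 2-slots-per-node allocation with by-slot mapping
(illustrative only - the exact placement depends on the mapping mode):

    mpirun -np 2 myprog     # maps onto the first two available slots (node1)
    mpirun -np 2 myprog2    # maps starting where the first one stopped (node2)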

I *believe* you can run each mpirun in the background. However, I don't know
that this has been tested enough to support such a claim. All testing that I
know of to date has executed mpirun in the foreground - thus, your example
would execute sequentially instead of in parallel.
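
In other words, the parallel version of your example would look something
like this (again, untested - treat it as a sketch):

    mpirun -np 2 myprog  &
    mpirun -np 2 myprog2 &
    wait                     # returns once both background mpiruns finish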

I know people have tested multiple mpiruns operating in parallel within a
single allocation (i.e., persistent mode) where the mpiruns are executed in
separate windows/prompts. So I suspect you could do something like you
describe - I just haven't personally verified it.

Where we definitely differ is that Open MPI/RTE will *not* block until
resources are freed up from the prior mpiruns. Instead, we will attempt to
execute each mpirun immediately - and will error out the one(s) that try to
execute without sufficient resources. I imagine we could provide the kind of
"flow control" you describe, but I'm not sure when that might happen.

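One crude workaround in the meantime is to poll from the shell. This assumes
mpirun returns a non-zero exit status when it cannot get sufficient
resources - I haven't verified the exact code it returns:

    until mpirun -np 1 myprog3; do
        sleep 30    # resources busy - retry after an earlier job frees slots
    done

Note that this polls rather than truly blocking, so it is not a substitute
for real flow control.
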
I am (in my copious free time...haha) working on an "orteboot" program that
will start up a virtual machine to make the persistent mode of operation a
little easier. For now, though, you can do it by:

1. Start the "server" using the following command:
orted --seed --persistent --scope public [--universe foo]

2. Run your mpirun commands. They will automagically find the "server" and
connect to it. If you specified a universe name when starting the server,
then you must specify the same universe name on your mpirun commands (see
the sketch below).
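
Putting the two steps together, a batch script might look roughly like this
("foo" is just an example universe name):

    orted --seed --persistent --scope public --universe foo
    mpirun --universe foo -np 2 myprog
    mpirun --universe foo -np 2 myprog2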

When you are done, you will (unfortunately) have to manually "kill" the
server and remove its session directory. I have a program called "ortehalt"
in the trunk that will do this cleanly for you, but it isn't yet in the
release distributions. You are welcome to use it if you are working with the
trunk - I can't promise it is bulletproof yet, but it seems to be working.
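
In the meantime, the manual cleanup amounts to something like this (the
session directory name and location vary by user, host, and tmpdir, so check
your system first - this is only a sketch):

    killall orted                         # kill the persistent server
    rm -rf /tmp/openmpi-sessions-$USER*   # remove its session directory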

Ralph



On 12/11/06 8:07 PM, "Maestas, Christopher Daniel" <cdma...@sandia.gov>
wrote:

> Hello,
> 
> Sometimes we have users who like to run the following from within a
> single job (think: scheduling within a job scheduler allocation):
> "mpiexec -n X myprog"
> "mpiexec -n Y myprog2"
> Does mpiexec within Open MPI keep track of the node list it is using if
> it binds to a particular scheduler?
> For example, with 2 nodes (2ppn SMP, 4 slots total):
> "mpiexec -n 2 myprog"
> "mpiexec -n 2 myprog2"
> "mpiexec -n 1 myprog3"
> And assuming by-slot allocation, we would have the following layout:
> node1 - processor1 - myprog
>         processor2 - myprog
> node2 - processor1 - myprog2
>         processor2 - myprog2
> And for a by-node allocation:
> node1 - processor1 - myprog
>         processor2 - myprog2
> node2 - processor1 - myprog
>         processor2 - myprog2
> 
> I think this is possible using ssh because it shouldn't really matter how
> many times it spawns, but with something like Torque it would be
> restricted to a maximum process launch of 4.  We would want the third
> mpiexec to block, and eventually run on the first node allocation that
> frees up from myprog or myprog2 ...
> 
> For example, for Torque we had to add the following to OSC mpiexec:
> ---
>        Finally, since only one mpiexec can be the master at a time, if
>        your code setup requires that mpiexec exit to get a result, you
>        can start a "dummy" mpiexec first in your batch job:
> 
>              mpiexec -server
> 
>        It runs no tasks itself but handles the connections of other
>        transient mpiexec clients.  It will shut down cleanly when the
>        batch job exits or you may kill the server explicitly.  If the
>        server is killed with SIGTERM (or HUP or INT), it will exit with
>        a status of zero if there were no clients connected at the time.
>        If there were still clients using the server, the server will
>        kill all their tasks, disconnect from the clients, and exit with
>        status 1.
> ---
> 
> So a user ran:
> mpiexec -server
> mpiexec -n 2 myprog
> mpiexec -n 2 myprog2
> And the server kept track of the allocation ... I would think that the
> orted could do this?
> 
> Sorry if this sounds confusing ... But I'm sure it will clear up with
> any further responses I make. :-)
> -cdm
> 
> 

