Hi Ola, Ralph
I may be wrong, but I'd guess launching the two solvers
in MPMD/MIMD mode would work smoothly with the Torque PBS_NODEFILE,
in a *single* Torque job.
If I understood Ola right, that is what he wants.
Say, something like this (for one 32-core node):
#PBS -l nodes=1:ppn=32
...
mpiexec -np 8 ./solver1 : -np 24 ./solver2
I am assuming the two executables never talk to each other, right?
They solve the same problem with different methods, in a completely
independent and "embarrassingly parallel" fashion, and could run
concurrently.
Is that right?
Or did I misunderstand Ola's description, and they work in a staggered
sequence to each other?
[first s1, then s2, then s1 again, then s2 once more...]
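If that is the case, maybe a plain sequence of mpiexec calls inside
the same Torque script would do the trick, something like this
(just a sketch; the loop count and time-stepping logic are made up,
and solver1/solver2 stand in for your executables):

#PBS -l nodes=1:ppn=32
cd $PBS_O_WORKDIR
for step in $(seq 1 100); do
    mpiexec -np 32 ./solver1   # solver1 uses the whole allocation
    mpiexec -np 32 ./solver2   # then solver2 reuses the same cores
done

Each mpiexec would still pick up the node list from Torque,
since both run inside the same job.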
I am a bit confused by Ola's use of the words "loosely coupled" in his
description, which might indicate cooperation to solve the same problem,
rather than independent work on two instances of the same problem.
Ralph: Does the MPI model assume that MPMD/MIMD executables
necessarily have to communicate with each other,
or perhaps share a common MPI_COMM_WORLD?
[I guess not.]
Anyway, just a guess,
Gus Correa
On 11/27/2013 10:23 AM, Ralph Castain wrote:
Are you wanting to run the solvers on different nodes within the
allocation? Or on different cores across all nodes?
For different nodes, you can just use -host to specify which nodes you
want that specific mpirun to use, or a hostfile should also be fine. The
FAQ's comment was aimed at people who were giving us the PBS_NODEFILE as
the hostfile - which could confuse older versions of OMPI into using the
rsh launcher instead of Torque. Remember that we have the relative node
syntax so you don't actually have to name the nodes - helps if you want
to execute batch scripts and won't know the node names in advance.
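For example (just a sketch; the node names, process counts, and exact
relative-node spelling are from memory, so double-check them):

# by name, if you know which nodes you got:
mpirun -np 16 -host node01,node02 ./solver1
mpirun -np 16 -host node03,node04 ./solver2

# or with the relative node syntax, so the script works for any allocation:
mpirun -np 16 -host +n0,+n1 ./solver1
mpirun -np 16 -host +n2,+n3 ./solver2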
For different cores across all nodes, you would need to use some binding
trickery that may not be in the 1.4 series, so you might need to update
to the 1.6 series. You have two options: (a) have Torque bind your
mpirun to specific cores (I believe it can do that), or (b) use
--slot-list to specify which cores that particular mpirun is to use. You
can then separate the two solvers but still run on all the nodes, if
that is of concern.
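As a rough sketch of option (b), assuming (hypothetically) four 32-core
nodes - and I'm guessing at the slot-list syntax and the core split, so
check the mpirun man page on your version for the exact format:

# solver1 on the lower half of each node's cores, solver2 on the upper half
mpirun -np 64 --slot-list 0-15 ./solver1
mpirun -np 64 --slot-list 16-31 ./solver2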
HTH
Ralph
On Wed, Nov 27, 2013 at 6:10 AM, <ola.widl...@se.abb.com> wrote:
Hi,
We have an in-house application where we run two solvers in a
loosely coupled manner: the first solver runs a timestep, then the
second solver works on the same timestep, and so on. As the two
solvers never execute at the same time, we would like to run both
in the same allocation (launching mpirun once for each of them).
RAM is not an issue, so there should be no risk of excessive
swapping degrading performance.
We use openmpi-1.4.5 compiled with Torque integration. Thanks to
that integration we do not give a hostfile to mpirun; it queries
Torque itself for the allocation info.
Question:
Can we force one of the solvers to run in a *subset* of the full
allocation? How do we do that? I read in the FAQ that providing a
hostfile to mpirun in this case (when it's not needed thanks to the
Torque integration) would cause a lot of problems...
Thanks in advance,
Ola
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users