I'm afraid the two solvers would end up in the same MPI_COMM_WORLD if launched that way.
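
If sharing a single MPI_COMM_WORLD turned out to be acceptable, each solver
could still split off its own communicator using the MPI_APPNUM attribute.
A minimal, untested sketch (the printf is only a placeholder, not anyone's
actual solver code):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int world_rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

        /* MPI_APPNUM tells each process which application (0, 1, ...)
           on the MPMD command line it was started as. */
        int *appnum, flag;
        MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_APPNUM, &appnum, &flag);
        int color = flag ? *appnum : 0;

        /* Split the shared MPI_COMM_WORLD so each solver gets its own
           communicator and can ignore the other application. */
        MPI_Comm solver_comm;
        MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &solver_comm);

        int rank, size;
        MPI_Comm_rank(solver_comm, &rank);
        MPI_Comm_size(solver_comm, &size);
        printf("app %d: rank %d of %d\n", color, rank, size);

        MPI_Comm_free(&solver_comm);
        MPI_Finalize();
        return 0;
    }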

Sent from my iPhone

> On Nov 27, 2013, at 11:58 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:
> 
> Hi Ola, Ralph
> 
> I may be wrong, but I'd guess launching the two solvers
> in MPMD/MIMD mode would work smoothly with the Torque PBS_NODEFILE,
> in a *single* Torque job.
> If I understood Ola right, that is what he wants.
> 
> Say, something like this (for one 32-core node):
> 
> #PBS -l nodes=1:ppn=32
> ...
> mpiexec -np 8 ./solver1 : -np 24 ./solver2
> 
> I am assuming the two executables never talk to each other, right?
> They solve the same problem with different methods, in a completely
> independent and "embarrassingly parallel" fashion, and could run
> concurrently.
> 
> Is that right?
> Or did I misunderstand Ola's description, and do the two solvers instead
> alternate in a staggered sequence?
> [first s1, then s2, then s1 again, then s2 once more...]
> I am a bit confused by Ola's use of the words "loosely coupled" in his 
> description, which might indicate cooperation to solve the same problem,
> rather than independent work on two instances of the same problem.
> 
> Ralph: Does the MPI model assume that MPMD/MIMD executables
> necessarily have to communicate with each other,
> or perhaps share a common MPI_COMM_WORLD?
> [I guess not.]
> 
> Anyway, just a guess,
> Gus Correa
> 
>> On 11/27/2013 10:23 AM, Ralph Castain wrote:
>> Are you wanting to run the solvers on different nodes within the
>> allocation? Or on different cores across all nodes?
>> 
>> For different nodes, you can just use -host to specify which nodes you
>> want that specific mpirun to use, or a hostfile should also be fine. The
>> FAQ's comment was aimed at people who were giving us the PBS_NODEFILE as
>> the hostfile - which could confuse older versions of OMPI into using the
>> rsh launcher instead of Torque. Remember that we have the relative node
>> syntax, so you don't actually have to name the nodes - that helps when
>> you are running batch scripts and won't know the node names in advance.
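>> 
>> Something along these lines might work (an untested sketch; the node
>> count and process counts are just examples, and +n0/+n1 is the relative
>> host indexing mentioned above):
>> 
>>     #PBS -l nodes=2:ppn=32
>>     ...
>>     # solver1 on the first node of the allocation, solver2 on the second
>>     mpirun -np 32 -host +n0 ./solver1 &
>>     mpirun -np 32 -host +n1 ./solver2
>>     wait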
>> 
>> For different cores across all nodes, you would need to use some binding
>> trickery that may not be in the 1.4 series, so you might need to update
>> to the 1.6 series. You have two options: (a) have Torque bind your
>> mpirun to specific cores (I believe it can do that), or (b) use
>> --slot-list to specify which cores that particular mpirun is to use. You
>> can then separate the two solvers but still run on all the nodes, if
>> that is of concern.
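>> 
>> For (b), something like this might do it on one 32-core node (again an
>> untested sketch - check the --slot-list syntax in the mpirun man page
>> for the version you end up with, as it may differ):
>> 
>>     #PBS -l nodes=1:ppn=32
>>     ...
>>     # first 8 cores to solver1, the remaining 24 to solver2
>>     mpirun -np 8  --slot-list 0-7  ./solver1 &
>>     mpirun -np 24 --slot-list 8-31 ./solver2
>>     wait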
>> 
>> HTH
>> Ralph
>> 
>> 
>> 
>> On Wed, Nov 27, 2013 at 6:10 AM, <ola.widl...@se.abb.com> wrote:
>> 
>>    Hi,
>> 
>>    We have an in-house application where we run two solvers in a
>>    loosely coupled manner: the first solver runs a timestep, then the
>>    second solver works on the same timestep, and so on. As the two
>>    solvers never execute at the same time, we would like to run them
>>    in the same allocation (launching mpirun once for each of them).
>>    RAM is not an issue, so there should be no risk of excessive
>>    swapping degrading performance.
>> 
>>    We use openmpi-1.4.5 compiled with Torque integration. The Torque
>>    integration means we do not give a hostfile to mpirun; it queries
>>    Torque itself for the allocation info.
>> 
>>    Question:
>> 
>>    Can we force one of the solvers to run in a *subset* of the full
>>    allocation? How do we do that? I read in the FAQ that providing a
>>    hostfile to mpirun in this case (when it's not needed, due to the
>>    Torque integration) would cause a lot of problems...
>> 
>>    Thanks in advance,
>> 
>>    Ola
>> 