Re: [OMPI users] openmpi+torque: How run job in a subset of the allocation?

2013-11-28 Thread George Markomanolis

Hi,

Here is what I do to execute 20 mpirun calls within a single job under 
LSF; I assume it is similar for your case.


I use $LSB_HOSTS to extract the hosts assigned to the job. I know how 
many cores I want per mpirun call, so I create machine files 
accordingly. For our application, each execution gets its own nodes, 
except that the last MPI processes of two executions share a node. For 
example, two mpirun calls need 40 cores (20 cores each), and I use 
three nodes (16 cores per node): the first mpirun call gets the first 
node plus cores 0-3 on the second node; the second mpirun call gets the 
third node plus cores 4-7 on the second node. I do this to avoid 
wasting resources, since I need to execute ~20 mpirun calls, not just 
two, and also because the last 4 MPI processes perform a different task 
from the first 16.


So I create machine files (Open MPI rankfiles) like this:
rank 0=s15r1b45 slot=0
rank 1=s15r1b45 slot=1
rank 2=s15r1b45 slot=2
rank 3=s15r1b45 slot=3
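
For illustration, here is an untested sketch (not my actual script) of 
how such rankfiles could be generated from $LSB_HOSTS, which lists one 
hostname per allocated core; the file names and the 20-rank job size 
are just examples:

#!/bin/bash
# Untested sketch: split $LSB_HOSTS into one rankfile per mpirun call,
# 20 ranks each, assigning consecutive core slots on each host.
RANKS_PER_JOB=20
declare -A next_slot            # next free core index on each host
rm -f rankfile.*                # start from clean files
job=0; rank=0
for host in $LSB_HOSTS; do
    slot=${next_slot[$host]:-0}
    next_slot[$host]=$((slot + 1))
    echo "rank $rank=$host slot=$slot" >> rankfile.$job
    rank=$((rank + 1))
    if [ "$rank" -eq "$RANKS_PER_JOB" ]; then
        job=$((job + 1)); rank=0
    fi
done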



Now, from the root node, launch each mpirun call in the background, like:

mpirun ... &

and after all of them, run the wait command.

Because the mpirun calls run in the background, the wait ensures the 
job script does not exit (and the job does not get killed) before all 
the executions have finished.
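
In script form, the pattern looks roughly like this (the rankfile names 
and the executable are illustrative):

# Launch one mpirun per rankfile in the background...
for rf in rankfile.*; do
    mpirun -rf "$rf" -np 20 ./my_app &
done
# ...then block until all of them have finished, so the
# batch job is not killed before the executions complete.
wait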


Just be careful that the machine files do not overlap on any resource 
(cores, in my case).


Best regards,
George Markomanolis

On 11/27/2013 10:02 PM, Ralph Castain wrote:

I'm afraid the two solvers would be in the same comm_world if launched that way.

Sent from my iPhone


On Nov 27, 2013, at 11:58 AM, Gus Correa  wrote:

Hi Ola, Ralph

I may be wrong, but I'd guess launching the two solvers
in MPMD/MIMD mode would work smoothly with the torque PBS_NODEFILE,
in a *single* Torque job.
If I understood Ola right, that is what he wants.

Say, something like this (for one 32-core node):

#PBS -l nodes=1:ppn=32
...
mpiexec -np 8 ./solver1 : -np 24 ./solver2
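
For completeness, an untested sketch of the whole Torque script 
(assuming bash, and that the binaries sit in the submission directory):

#!/bin/bash
#PBS -l nodes=1:ppn=32
#PBS -N two_solvers              # job name is made up
cd $PBS_O_WORKDIR                # run from the submission directory
# MPMD launch: ranks 0-7 run solver1, ranks 8-31 run solver2,
# all sharing a single MPI_COMM_WORLD.
mpiexec -np 8 ./solver1 : -np 24 ./solver2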

I am assuming the two executables never talk to each other, right?
They solve the same problem with different methods, in a completely
independent and "embarrassingly parallel" fashion, and could run
concurrently.

Is that right?
Or did I misunderstand Ola's description, and do they work in a staggered 
sequence with respect to each other?
[first s1, then s2, then s1 again, then s2 once more...]
I am a bit confused by Ola's use of the words "loosely coupled" in his 
description, which might indicate cooperation to solve the same problem,
rather than independent work on two instances of the same problem.

Ralph: Does the MPI model assume that MPMD/MIMD executables
have to necessarily communicate with each other,
or perhaps share a common MPI_COMM_WORLD?
[I guess not.]

Anyway, just a guess,
Gus Correa


On 11/27/2013 10:23 AM, Ralph Castain wrote:
Are you wanting to run the solvers on different nodes within the
allocation? Or on different cores across all nodes?

For different nodes, you can just use -host to specify which nodes you
want that specific mpirun to use, or a hostfile should also be fine. The
FAQ's comment was aimed at people who were giving us the PBS_NODEFILE as
the hostfile - which could confuse older versions of OMPI into using the
rsh launcher instead of Torque. Remember that we have the relative node
syntax so you don't actually have to name the nodes - helps if you want
to execute batch scripts and won't know the node names in advance.
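
For example, something along these lines (untested; the node indices 
and process counts are made up) runs each solver on its own nodes 
without naming them:

# +nX is the relative node syntax: the X-th node of the allocation.
mpirun -np 8 -host +n0 ./solver1 &
mpirun -np 24 -host +n1,+n2 ./solver2 &
wait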

For different cores across all nodes, you would need to use some binding
trickery that may not be in the 1.4 series, so you might need to update
to the 1.6 series. You have two options: (a) have Torque bind your
mpirun to specific cores (I believe it can do that), or (b) use
--slot-list to specify which cores that particular mpirun is to use. You
can then separate the two solvers but still run on all the nodes, if
that is of concern.
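
A rough sketch of option (b), with made-up core ranges (assumes a 
1.6-series mpirun):

# Give each solver a disjoint set of cores on every node in the allocation.
mpirun -np 16 --slot-list 0-7 ./solver1 &
mpirun -np 16 --slot-list 8-15 ./solver2 &
wait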

HTH
Ralph



On Wed, Nov 27, 2013 at 6:10 AM, <ola.widl...@se.abb.com> wrote:

Hi,

We have an in-house application where we run two solvers in a
loosely coupled manner: The first solver runs a timestep, then the
second solver does work on the same timestep, etc. As the two
solvers never execute at the same time, we would like to run the two
solvers in the same allocation (launching mpirun once for each of
them). RAM is not an issue, so there should not be any risk of
excessive swapping degrading performance.

We use openmpi-1.4.5 compiled with torque integration. The torque
integration means we do not give a hostfile to mpirun, it will
itself query torque for the allocation info.

Question:

Can we force one of the solvers to run in a *subset* of the full
allocation? How do we do that? I read in the FAQ that providing a
hostfile to mpirun in this case (when it's not needed due to torque
integration) would cause a lot of problems...

Thanks in advance,

Ola






___

[OMPI users] MPI_THREAD_MULTIPLE causes deadlock in simple MPI_Barrier case (ompi 1.6.5 and 1.7.3)

2013-11-28 Thread Jean-Francois St-Pierre
Hi,
I've compiled ompi 1.6.5 with multi-thread support (using Intel
compilers 12.0.4.191, but I get the same result with gcc):

./configure --with-tm=/opt/torque --with-openib \
    --enable-mpi-thread-multiple CC=icc CXX=icpc F77=ifort FC=ifort

And I've built a simple sample code that only does the init and one
MPI_Barrier:

#include <stdio.h>
#include <unistd.h>   /* getpid() */
#include <mpi.h>

int main(int argc, char **argv)
{
  int provided, flag, claimed, gRank, gNTasks;

  setbuf(stdout, NULL);
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
  fprintf(stdout, "%6d: Provided thread support %d\n", getpid(), provided);

  MPI_Is_thread_main(&flag);
  MPI_Query_thread(&claimed);
  fprintf(stdout, "%6d: Before Comm_rank, flag %d, claimed %d\n",
          getpid(), flag, claimed);
  MPI_Comm_rank(MPI_COMM_WORLD, &gRank);

  fprintf(stdout, "%6d: Comm_size rank %d\n", getpid(), gRank);
  MPI_Comm_size(MPI_COMM_WORLD, &gNTasks);

  fprintf(stdout, "%6d: Before Barrier\n", getpid());
  MPI_Barrier(MPI_COMM_WORLD);   /* hangs here across nodes with MPI_THREAD_MULTIPLE */

  fprintf(stdout, "%6d: After Barrier\n", getpid());
  MPI_Finalize();
  return 0;
}

When I launch it on 2 nodes (one core per node), I get this sample output:

/***  Output
 mpirun -pernode -np 2 sample_code
 7356: Provided thread support 3 MPI_THREAD_MULTIPLE
 7356: Before Comm_rank, flag 1, claimed 3
 7356: Comm_size rank 0
 7356: Before Barrier
 26277: Provided thread support 3 MPI_THREAD_MULTIPLE
 26277: Before Comm_rank, flag 1, claimed 3
 26277: Comm_size rank 1
 26277: Before Barrier
 ^Cmpirun: killing job...
 */

The job never gets past the MPI_Barrier when I use
MPI_THREAD_MULTIPLE, but it runs fine using MPI_THREAD_SERIALIZED. I
get the same behavior with ompi 1.7.3. There is no deadlock when the
two MPI processes are hosted on the same node.

In the attachment, you'll find my config.out, config.log, the environment
variables on the execution node, both make.out files, the sample code and
its output, etc.

Thanks,

Jeff


[Attachment: ompi-output.tar.bz2 (BZip2 compressed data)]