Hi Reuti,

Okay, so I tried what you suggested.  You essentially get the requested number 
of bound cores on each execution node, so if I use

$ qsub -pe openmpi 8 -binding linear:2 <myscript.com>

then I get 2 bound cores per node, irrespective of the number of slots (and 
hence parallel processes) allocated by GE on that node.  This holds regardless 
of which setting I use for allocation_rule.

My aim with this was to deal with badly behaved multithreaded algorithms which 
end up spreading across more cores on an execution node than the number of 
GE-allocated slots (thereby interfering with other GE-scheduled tasks running 
on the same exec node).  By binding a process to one or more cores, one can 
"box it in" so that any sub-processes and threads it spawns stay on the 
allocated cores.  Unfortunately, the above solution applies one identical core 
binding on every execution node, regardless of the slot allocation there.

From exploring the software (both OpenMPI and GE) further, I have two comments:

1) The core binding feature in GE appears to apply the requested core-binding 
topology to every execution node involved in a parallel job, rather than 
treating the requested topology as *per parallel process*.  So, if I request 
'qsub -pe mpi 8 -binding linear:1 <myscript.com>' with the intention of 
getting each of the 8 parallel processes bound to its own core, I actually get 
all of the job's processes on a given exec node bound to the same single core.  
Oops!

2) OpenMPI has its own core-binding feature (-mca mpi_paffinity_alone 1) which 
works well for binding each parallel process to its own core (see the job-script 
snippet below).  Unfortunately, its binding framework (hwloc) is different from 
the one GE uses (PLPA), and neither layer knows what the other has bound, 
resulting in binding overlaps between GE-bound tasks (e.g. serial and smp jobs) 
and OpenMPI-bound processes (i.e. my mpi jobs).  Again, oops ;-)
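
For what it's worth, the relevant part of my job script is currently just 
something along these lines (the binary name is only a placeholder):

#!/bin/sh
#$ -cwd
# Tight integration: mpirun picks up the GE slot allocation by itself;
# mpi_paffinity_alone then binds each rank to its own core.
mpirun -np $NSLOTS -mca mpi_paffinity_alone 1 ./my_mpi_program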


If, indeed, it is not currently possible to implement this type of core binding 
in tightly integrated OpenMPI/GE, then a solution might lie in a custom script 
run from the parallel environment's 'start_proc_args'.  This script would have 
to find out which slots are allocated where on the cluster and write an OpenMPI 
rankfile accordingly.
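
As a rough sketch of what I mean (a hypothetical helper, naively assuming that 
cores 0..(nslots-1) on each host are the ones to use, which of course 
side-steps the real problem of discovering which cores GE has actually left 
free), something like:

#!/bin/sh
# makerankfile.sh: build an OpenMPI rankfile from the GE allocation.
# Usage: makerankfile.sh <pe_hostfile>   (pass $pe_hostfile in start_proc_args)
# Hostfile lines look like:  hostname  nslots  queue  processor-range
HOSTFILE=${1:-$PE_HOSTFILE}
RANKFILE=$TMPDIR/rankfile

rank=0
while read host nslots rest; do
    core=0
    while [ $core -lt $nslots ]; do
        echo "rank $rank=$host slot=$core" >> $RANKFILE
        rank=$((rank + 1))
        core=$((core + 1))
    done
done < $HOSTFILE

mpirun would then be pointed at the file with '-rf $TMPDIR/rankfile' rather 
than relying on mpi_paffinity_alone.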

Any thoughts on that?

Cheers,

Chris


--
Dr Chris Jewell
Department of Statistics
University of Warwick
Coventry
CV4 7AL
UK
Tel: +44 (0)24 7615 0778