Hi,

On 14.10.2010 at 13:23, Dave Love wrote:

> Reuti <re...@staff.uni-marburg.de> writes:
> 
>> With binding_instance set to "set" (the default) the
>> shepherd should bind the processes to cores already. With other types
>> of binding_instance the selected cores must be forwarded to the
>> application via an environment variable or in the hostfile.
> 
> My question was specifically about SGE/OMPI tight integration; are you
> actually doing binding successfully with that?  I think I read here that
> the integration doesn't (yet?) deal with SGE core binding, and when we
> turned on the SGE feature we got the OMPI tasks piled onto a single
> core.  We quickly turned it off for MPI jobs when we realized what was
> happening, and I didn't try to investigate further.

What exactly did you request with `qsub -binding`? When you request `qsub 
-pe openmpi 2 -binding linear:1 ...`, the core assignment is applied per 
`qrsh`. That means that when you stay on one machine only (because the 
"allocation_rule" is "$pe_slots"), you would indeed oversubscribe the core, as 
Open MPI will then use threads (hence "-binding linear:2" should do in this 
case). But if the "allocation_rule" is set to the integer value "1", so that 
each slot is guaranteed to be on a different machine, then "linear:1" would be 
fine. Similarly, `qsub -pe openmpi 4 -binding linear:2 ...` fits when you have 
an "allocation_rule" of "2".
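
The rule of thumb in the paragraph above (bind as many cores as there are slots per host) can be written down as a small shell sketch. The function name `suggest_qsub` is made up for illustration, and it only covers the two cases mentioned here: an integer "allocation_rule" and "$pe_slots". Other rules such as "$round_robin" or "$fill_up" are not handled.

```shell
# Hypothetical helper (not part of SGE): print a qsub command line whose
# -binding amount equals the number of slots SGE places on each host.
#   $1 = total number of slots requested from the PE
#   $2 = allocation_rule of the PE: an integer, or the literal string $pe_slots
suggest_qsub() {
    total=$1
    rule=$2
    if [ "$rule" = '$pe_slots' ]; then
        per_host=$total    # all slots end up on one host
    else
        per_host=$rule     # fixed number of slots per host
    fi
    echo "qsub -pe openmpi $total -binding linear:$per_host"
}
```

For example, `suggest_qsub 2 '$pe_slots'` prints the "linear:2" variant, while `suggest_qsub 2 1` prints the "linear:1" variant.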

If, in a similar scenario, you get 4 cores on one and the same machine and SGE 
creates a cpuset of 4 cores, the Linux kernel scheduler can nevertheless 
schedule these 4 threads on any of the granted cores. It would be necessary to 
use another binding_instance, "env" or "pe", to get the information about the 
granted cores into the jobscript/hostfile, and then decide on your own how to 
forward it to Open MPI, so that each thread is bound to a unique core and does 
not drift around the cores in the cpuset.
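
One possible way to do this forwarding with binding_instance "pe" is sketched below. It is untested against a real cluster and rests on assumptions: that the fourth column of the pe_hostfile holds the granted cores as colon-separated `<socket>,<core>` pairs (e.g. "0,0:0,1"), and that your Open MPI version accepts a rankfile with `slot=<socket>:<core>` entries. The function name `pe_hostfile_to_rankfile` is made up for illustration.

```shell
# Hypothetical helper: turn a pe_hostfile written with "-binding pe" into an
# Open MPI rankfile. Assumes column 1 is the host name and column 4 holds
# colon-separated <socket>,<core> pairs such as "0,0:0,1".
pe_hostfile_to_rankfile() {
    awk '
        {
            # One "<socket>,<core>" pair per granted core on this host.
            n = split($4, pairs, ":")
            for (i = 1; i <= n; i++) {
                # Skip entries that are not a pair (e.g. no binding granted).
                if (split(pairs[i], sc, ",") != 2) continue
                printf "rank %d=%s slot=%s:%s\n", rank++, $1, sc[1], sc[2]
            }
        }
    ' "$1"
}

# Possible use inside the jobscript (./your_app is a placeholder):
#   pe_hostfile_to_rankfile "$PE_HOSTFILE" > "$TMPDIR/rankfile"
#   mpirun -np "$NSLOTS" -rf "$TMPDIR/rankfile" ./your_app
```

Ranks are numbered in the order the hosts appear in the pe_hostfile, one rank per granted core. Hosts for which no binding was written get no entry, so check that the number of lines matches `$NSLOTS` before using the result.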

-- Reuti


>> As this is only a hint to SGE and not a hard request, the user must
>> plan a little bit the allocation beforehand. Especially if you
>> oversubscribe a machine it won't work. 
> 
> [It is documented that the binding isn't applied if the selected cores
> are occupied.]
> 