Hi,

On 14.10.2010 at 13:23, Dave Love wrote:
> Reuti <re...@staff.uni-marburg.de> writes:
>
>> With binding_instance set to "set" (the default), the shepherd
>> should already bind the processes to cores. With other types of
>> binding_instance, the selected cores must be forwarded to the
>> application via an environment variable or in the hostfile.
>
> My question was specifically about SGE/OMPI tight integration; are you
> actually doing binding successfully with that? I think I read here that
> the integration doesn't (yet?) deal with SGE core binding, and when we
> turned on the SGE feature we got the OMPI tasks piled onto a single
> core. We quickly turned it off for MPI jobs when we realized what was
> happening, and I didn't try to investigate further.

What exactly did you request with `qsub -binding`? When you request `qsub -pe openmpi 2 -binding linear:1 ...`, the core assignment is applied per `qrsh`. This means that when you stay on one machine only (because "allocation_rule" is "$pe_slots"), you would indeed oversubscribe that core, as all the Open MPI processes started on that machine will share it (hence "-binding linear:2" should do in this case). But if "allocation_rule" is set to the integer value "1", so that each slot is guaranteed to be on a different machine, then "linear:1" would be fine. Similarly, use `qsub -pe openmpi 4 -binding linear:2 ...` when "allocation_rule" is "2".

If, in a similar scenario, you get 4 cores on one and the same machine and SGE creates a cpuset of 4 cores, the Linux kernel scheduler can nevertheless move these 4 processes to any of the granted cores. It would be necessary to use another binding_instance, "env" or "pe", to get the information about the granted cores into the job script or hostfile, and then decide on your own how to forward it to Open MPI, so that each process is also bound to a unique core and doesn't drift around the cores in the cpuset.

-- Reuti

>> As this is only a hint to SGE and not a hard request, the user must
>> plan the allocation a little beforehand.
>> Especially if you oversubscribe a machine, it won't work.
>
> [It is documented that the binding isn't applied if the selected cores
> are occupied.]
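The rule of thumb above (the binding is applied once per host, so `linear:N` has to cover all slots that land on a single host) can be sketched as a small helper. The function name `linear_binding_amount` is hypothetical, and the handling of `$fill_up`/`$round_robin` (no single value fits) is an assumption of this sketch, not something stated in the thread.

```python
from typing import Optional


def linear_binding_amount(slots: int, allocation_rule: str) -> Optional[int]:
    """Hypothetical helper: suggest N for `qsub -binding linear:N`.

    Assumes the binding is applied once per host, so N must cover every
    slot that the allocation_rule can place on a single host.
    """
    if slots < 1:
        raise ValueError("slots must be positive")
    if allocation_rule == "$pe_slots":
        # All slots end up on one host, which then needs one core per slot.
        return slots
    if allocation_rule.isdigit():
        # A fixed integer rule places this many slots on each host; this
        # sketch simply rejects totals that are not a multiple of it.
        per_host = int(allocation_rule)
        if per_host < 1 or slots % per_host != 0:
            raise ValueError("slots must be a positive multiple of allocation_rule")
        return per_host
    # $fill_up and $round_robin give a varying number of slots per host,
    # so no single linear:N value is right for every host.
    return None


print(linear_binding_amount(4, "2"))  # 2, i.e. -pe openmpi 4 -binding linear:2
```

For the cases discussed: `-pe openmpi 2` with `$pe_slots` yields 2, and with an allocation_rule of "1" yields 1.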
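One way to forward the granted cores to Open MPI, as suggested above for binding_instance "pe", is to turn `$PE_HOSTFILE` into an Open MPI rankfile. This is a minimal sketch under assumptions: it assumes the fourth column of the hostfile holds `socket,core` pairs separated by `:` when `-binding pe` is used (check the qsub(1) man page of your SGE version), and the function name `rankfile_from_pe_hostfile` is hypothetical.

```python
from typing import List


def rankfile_from_pe_hostfile(pe_hostfile: str) -> List[str]:
    """Hypothetical helper: build Open MPI rankfile lines from $PE_HOSTFILE.

    Assumed line format with `-binding pe` (verify locally):
        <host> <slots> <queue> <socket>,<core>:<socket>,<core>...
    """
    lines: List[str] = []
    rank = 0
    for raw in pe_hostfile.splitlines():
        fields = raw.split()
        if len(fields) < 4:
            continue  # skip blank or incomplete lines
        host, slots, binding = fields[0], int(fields[1]), fields[3]
        if binding == "UNDEFINED":
            raise ValueError(f"no core binding information for {host}")
        cores = binding.split(":")
        if len(cores) < slots:
            # Binding is only a hint; refuse to silently oversubscribe.
            raise ValueError(f"{host}: {slots} slots but only {len(cores)} cores")
        # One rank per granted slot, each pinned to its own socket:core pair.
        for pair in cores[:slots]:
            socket, core = pair.split(",")
            lines.append(f"rank {rank}={host} slot={socket}:{core}")
            rank += 1
    return lines
```

The resulting lines could be written to a file in the job script and passed to `mpirun -rf <file>`; the host names must match the ones SGE granted to the job.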