I am not a grid engine expert by any means, but I do know a bit about OMPI's 
internals for binding processes.

Here is what we do:

1. mpirun gets its list of hosts from the environment, or from your machine 
file. It then maps the processes across the machines.

2. mpirun launches a daemon on each node that will host MPI processes. Under 
SGE, this launch is done via 'qrsh -inherit'.

3. each daemon "senses" the local binding constraint by querying the OS to get 
a list of processors available to it on this node.

4. each daemon spawns its local MPI processes, directly telling the OS to bind 
each process to one of the available processors. Processors are selected on a 
round-robin basis determined by each process's relative MPI rank, so you should 
never get two processes bound to the same processor if enough processors are 
available. If you do, that is an OMPI bug.

So SGE is responsible for setting up the global binding (i.e., telling each SGE 
node how many processors we are allowed to use on that node), and then OMPI 
uses that info to set the binding on the individual procs via the local OS.

The key thing to understand here is that SGE has zero visibility or knowledge 
of the individual MPI procs. All SGE ever sees is mpirun and its daemons.

HTH
Ralph

On Nov 13, 2010, at 7:39 AM, Chris Jewell wrote:

> Hi Dave, Reuti,
> 
> Sorry for kicking off this thread, and then disappearing.  I've been away for 
> a bit.  Anyway, Dave, I'm glad you experienced the same issue as I had with 
> my installation of SGE 6.2u5 and OpenMPI with core binding -- namely that 
> with 'qsub -pe openmpi 8 -binding set linear:1 <myscript.com>', if two or 
> more of the parallel processes get scheduled to the same execution node, then 
> the processes end up being bound to the same core.  Not good!
> 
> I've been playing around quite a bit trying to understand this issue, and 
> ended up on the GE dev list:
> 
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=39&dsMessageId=285878
> 
> It seems that most people expect that calls to 'qrsh -inherit' (which I 
> assume OpenMPI uses to bind parallel processes to reserved GE slots) 
> activate a separate binding.  This does not appear to be the case.  I *was* 
> hoping that using '-binding pe linear:1' might enable me to write a script 
> that reads the pe_hostfile and creates a machine file for OpenMPI, but this 
> fails, as GE does not appear to say which cores are unbound, only how many 
> are required.
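The pe_hostfile-to-machine-file conversion described above might look roughly like this. It is a hypothetical sketch that assumes the usual four-column pe_hostfile layout (host, slot count, queue, binding); note that, as described, it only ever sees slot counts, never which cores are actually free.

```python
import sys

def pe_hostfile_to_machinefile(lines):
    # Each pe_hostfile line: hostname num_slots queue_name binding.
    # The binding column reports what GE reserved for this job, not
    # which cores on the node are unbound, so we can only carry over
    # the host name and slot count into the machine file.
    out = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        host, slots = fields[0], int(fields[1])
        out.append(f"{host} slots={slots}")
    return out

if __name__ == "__main__":
    for entry in pe_hostfile_to_machinefile(sys.stdin):
        print(entry)
```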
> 
> So, for now, my solution has been to use a JSV to remove core binding for the 
> MPI jobs (but retain it for serial and SMP jobs).  Any more ideas??
> 
> Cheers,
> 
> Chris
> 
> (PS. Dave: how is my alma mater these days??)
> --
> Dr Chris Jewell
> Department of Statistics
> University of Warwick
> Coventry
> CV4 7AL
> UK
> Tel: +44 (0)24 7615 0778
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

