Hi Rayson
You're probably aware: starting with 1.3.4, OMPI will detect and abide
by external bindings. So if grid engine sets a binding, we'll follow it.
Ralph
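[Editor's sketch] To illustrate what "detect and abide by external bindings" can mean in practice, here is a minimal Python sketch, not OMPI's actual implementation. It assumes Linux, where `os.sched_getaffinity` is available, and it assumes CPU IDs are numbered contiguously from 0. The helper names `is_externally_bound` and `current_binding` are hypothetical.

```python
import os


def is_externally_bound(allowed_cpus, online_cpus):
    """True if the allowed set is a non-empty strict subset of the online set."""
    allowed = set(allowed_cpus)
    online = set(online_cpus)
    return bool(allowed) and allowed < online


def current_binding():
    """Return (bound, allowed_cpus) for the calling process.

    Assumption: CPU IDs run contiguously from 0 to os.cpu_count() - 1.
    On platforms without sched_getaffinity we report "not bound".
    """
    if not hasattr(os, "sched_getaffinity"):
        return False, []
    allowed = os.sched_getaffinity(0)      # CPUs this process may run on
    online = range(os.cpu_count() or 1)    # assumed contiguous CPU IDs
    return is_externally_bound(allowed, online), sorted(allowed)
```

If a launcher such as a batch system has already restricted the affinity mask, `current_binding()` would report it, and a runtime could then choose to stay within that set rather than apply its own binding.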
On Oct 22, 2009, at 9:03 AM, Rayson Ho wrote:
The code for the Job to Core Binding (aka thread binding, or CPU
binding) feature was checked into the Grid Engine project CVS. It uses
Open MPI's Portable Linux Processor Affinity (PLPA) library, and is
topology and NUMA aware.
The presentation from HPC Software Workshop '09:
http://wikis.sun.com/download/attachments/170755116/job2core.pdf
The design doc:
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=213897
Initial support is planned for 6.2 update 5 (current release is update
4, so update 5 is likely to be released in the next 2 or 3 months).
Rayson
On Tue, Sep 30, 2008 at 2:23 PM, Ralph Castain <r...@lanl.gov> wrote:
Note that we would also have to modify OMPI to:
1. recognize these environment variables, and
2. use them to actually set the binding, instead of using OMPI-internal
directives
Not a big deal to do, but not something currently in the system. Since
we launch through our own daemons (something that isn't likely to change
in your time frame), these changes would be required.
Otherwise, we could come up with some method by which you could provide
mapper information we use. While I agree with Jeff that having you tell
us which cores to use for each rank would generally be better, it does
raise issues when users want specific mapping algorithms that you might
not support. For example, we are working on mappers that will take input
from the user regarding comm topology plus system info on network wiring
topology and generate a near-optimal mapping of ranks. As part of that,
users may request some number of cores be reserved for that rank for
threading or other purposes.
So perhaps both options would be best - give us the list of cores
available to us so we can map and do affinity, and pass in your own
mapping. Maybe with some logic so we can decide which to use based on
whether OMPI or GE did the mapping??
Not sure here - just thinking out loud.
Ralph
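[Editor's sketch] The "both options" idea above could look roughly like the following Python sketch. Everything here is hypothetical: the function name `choose_binding`, the shape of the mapping (rank to list of core IDs), and the round-robin fallback, which merely stands in for whatever mapper OMPI would really use. Nothing in the thread defines this logic.

```python
def choose_binding(available_cores, num_ranks, external_mapping=None):
    """Pick a rank -> cores mapping.

    available_cores: cores the batch system says the job may use.
    external_mapping: optional {rank: [core, ...]} supplied by the batch
        system. If present it wins, provided it stays within
        available_cores.
    Returns (source, mapping) where source is "external" or "internal".
    """
    allowed = set(available_cores)
    if not allowed:
        raise ValueError("no cores available for binding")

    if external_mapping is not None:
        # The batch system did the mapping: validate it and use it as is.
        for rank, cores in external_mapping.items():
            if not set(cores) <= allowed:
                raise ValueError(f"rank {rank} mapped outside available cores")
        return "external", dict(external_mapping)

    # Otherwise do the mapping ourselves. Simple round-robin placeholder.
    ordered = sorted(allowed)
    mapping = {rank: [ordered[rank % len(ordered)]] for rank in range(num_ranks)}
    return "internal", mapping
```

For example, `choose_binding([2, 3, 6, 7], 3)` falls back to the internal placeholder mapper, while passing `external_mapping={0: [3], 1: [2]}` keeps the batch system's choice untouched.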
On Sep 30, 2008, at 12:58 PM, Jeff Squyres wrote:
On Sep 30, 2008, at 2:51 PM, Rayson Ho wrote:
Restarting this discussion. A new update version of Grid Engine 6.2
will come out early next year [1], and I really hope that we can get
at least the interface defined.
Great!
At the minimum, is it enough for the batch system to tell OpenMPI via
an env variable which core (or virtual core, in the SMT case) to start
binding the first MPI task?? I guess an added bonus would be
information about the number of processors to skip (the stride)
between the sibling tasks?? Stride of one is usually the case, but
something larger than one would allow the batch system to control the
level of cache and memory bandwidth sharing between the MPI tasks...
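[Editor's sketch] The start-core-plus-stride proposal above amounts to simple arithmetic: local task i is bound to core start + i * stride. A minimal Python sketch follows. The environment variable names `BATCH_BIND_START_CORE` and `BATCH_BIND_STRIDE` are invented for illustration; no such interface is defined in this thread.

```python
import os


def read_binding_hint(environ=None):
    """Read (start_core, stride) from hypothetical environment variables."""
    env = os.environ if environ is None else environ
    start = int(env.get("BATCH_BIND_START_CORE", "0"))   # hypothetical name
    stride = int(env.get("BATCH_BIND_STRIDE", "1"))      # hypothetical name
    return start, stride


def cores_for_tasks(start, stride, num_tasks):
    """Core for each local task: task i is bound to start + i * stride."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    return [start + i * stride for i in range(num_tasks)]
```

With a start core of 2 and a stride of 2, four local tasks would land on cores 2, 4, 6 and 8. A stride of 1 packs tasks onto adjacent cores.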
Wouldn't it be better to give us a specific list of cores to bind to?
As core counts go up in servers, I think we may see a re-emergence of
having multiple MPI jobs on a single server. And as core counts go even
*higher*, then fragmentation of available cores over time is
possible/likely.
Would you be giving us a list of *relative* cores to bind to (i.e.,
"bind to the Nth online core on the machine" -- which may be different
than the OS's ID for that processor) or will you be giving us the actual
OS virtual processor ID(s) to bind to?
--
Jeff Squyres
Cisco Systems
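[Editor's sketch] The distinction Jeff draws between *relative* cores and OS processor IDs can be shown with a short Python sketch. It assumes a 0-based "Nth online core" convention and a known set of online OS processor IDs. The helper name `relative_to_os_ids` is hypothetical.

```python
def relative_to_os_ids(relative_indices, online_os_ids):
    """Translate "Nth online core" indices (0-based) into OS processor IDs.

    online_os_ids need not be contiguous, e.g. when some processors are
    offline, which is exactly when the two numberings differ.
    """
    ordered = sorted(set(online_os_ids))
    result = []
    for n in relative_indices:
        if not 0 <= n < len(ordered):
            raise ValueError(f"relative core {n} is not online")
        result.append(ordered[n])
    return result
```

If only OS processors 0, 1, 4 and 5 are online, relative cores 0 and 2 correspond to OS IDs 0 and 4. Whichever numbering the batch system passes, both sides need to agree on it.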
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users