SGE might want to be aware that PLPA has now been deprecated -- we're
doing all future work on "hwloc" (hardware locality). That is, hwloc
represents the merger of PLPA and libtopology from INRIA. The
majority of the initial code base came from libtopology; more PLPA-
like features will come in over time (e.g., embedding capabilities).
hwloc provides all kinds of topology information about the machine.
The first release of hwloc -- v0.9.1 -- will be "soon" (we're in rc
status right now), but it will not include PLPA-like embedding
capabilities. Embedding is slated for v1.0.
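To give a flavor of what hwloc reports, here's roughly what a trivial
program looks like against the current API (we're pre-1.0, so the exact
calls may still shift a bit; treat this as a sketch):

    #include <stdio.h>
    #include <hwloc.h>

    int main(void)
    {
        hwloc_topology_t topo;

        /* Discover the topology of the local machine */
        hwloc_topology_init(&topo);
        hwloc_topology_load(topo);

        /* Count sockets and cores */
        printf("%d socket(s), %d core(s)\n",
               hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_SOCKET),
               hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE));

        /* Bind the current process to the first core, if allowed */
        hwloc_obj_t core = hwloc_get_obj_by_type(topo, HWLOC_OBJ_CORE, 0);
        if (core != NULL)
            hwloc_set_cpubind(topo, core->cpuset, 0);

        hwloc_topology_destroy(topo);
        return 0;
    }
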
Come join our mailing lists if you're interested:
http://www.open-mpi.org/projects/hwloc/
On Oct 22, 2009, at 11:26 AM, Rayson Ho wrote:
Yes, on page 14 of the presentation: "Support for OpenMPI and OpenMP
Through -binding [pe|env] linear|striding" -- with these options SGE
performs no binding itself, but instead hands the binding decision to
OpenMPI.
Support for OpenMPI's binding is part of the "Job to Core Binding"
project.
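For the "env" case, think of something along these lines (the variable
name and format here are just placeholders for illustration, not the
final interface) -- a launcher or MPI library reads the decision from
the environment and applies it:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        /* Placeholder name/format: a comma-separated list of OS
         * processor IDs chosen by the batch system, e.g. "2,3,6,7". */
        char *binding = getenv("SGE_BINDING");
        if (binding == NULL) {
            printf("no binding decision from the batch system\n");
            return 0;
        }

        char *copy = strdup(binding);
        for (char *id = strtok(copy, ","); id != NULL;
             id = strtok(NULL, ",")) {
            /* An MPI library would bind its local ranks to these IDs,
             * e.g. via PLPA/hwloc or sched_setaffinity(). */
            printf("allowed processor: %d\n", atoi(id));
        }
        free(copy);
        return 0;
    }
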
Rayson
On Thu, Oct 22, 2009 at 10:16 AM, Ralph Castain <r...@open-mpi.org> wrote:
> Hi Rayson
>
> You're probably aware: starting with 1.3.4, OMPI will detect and abide by
> external bindings. So if grid engine sets a binding, we'll follow it.
>
> Ralph
>
> On Oct 22, 2009, at 9:03 AM, Rayson Ho wrote:
>
>> The code for the Job to Core Binding (aka thread binding, or CPU
>> binding) feature was checked into the Grid Engine project CVS. It uses
>> OpenMPI's Portable Linux Processor Affinity (PLPA) library, and is
>> topology and NUMA aware.
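>>
>> Under the hood the binding itself comes down to a PLPA affinity call;
>> just to show the shape of it (not the actual execd code, and with
>> error checking omitted):
>>
>>   #include <plpa.h>
>>
>>   /* Bind the calling process to one OS processor ID that the
>>      scheduler picked. */
>>   static int bind_to_processor(int processor_id)
>>   {
>>       plpa_cpu_set_t mask;
>>       PLPA_CPU_ZERO(&mask);
>>       PLPA_CPU_SET(processor_id, &mask);
>>       return plpa_sched_setaffinity(0, sizeof(mask), &mask);
>>   }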
>>
>> The presentation from HPC Software Workshop '09:
>> http://wikis.sun.com/download/attachments/170755116/job2core.pdf
>>
>> The design doc:
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=213897
>>
>> Initial support is planned for 6.2 update 5 (current release is update
>> 4, so update 5 is likely to be released in the next 2 or 3 months).
>>
>> Rayson
>>
>>
>>
>> On Tue, Sep 30, 2008 at 2:23 PM, Ralph Castain <r...@lanl.gov> wrote:
>>>
>>> Note that we would also have to modify OMPI to:
>>>
>>> 1. recognize these environment variables, and
>>>
>>> 2. use them to actually set the binding, instead of using OMPI-internal
>>> directives
>>>
>>> Not a big deal to do, but not something currently in the system. Since we
>>> launch through our own daemons (something that isn't likely to change in
>>> your time frame), these changes would be required.
>>>
>>> Otherwise, we could come up with some method by which you could provide
>>> mapper information we use. While I agree with Jeff that having you tell us
>>> which cores to use for each rank would generally be better, it does raise
>>> issues when users want specific mapping algorithms that you might not
>>> support. For example, we are working on mappers that will take input from
>>> the user regarding comm topology plus system info on network wiring topology
>>> and generate a near-optimal mapping of ranks. As part of that, users may
>>> request some number of cores be reserved for that rank for threading or
>>> other purposes.
>>>
>>> So perhaps both options would be best - give us the list of cores available
>>> to us so we can map and do affinity, and pass in your own mapping. Maybe
>>> with some logic so we can decide which to use based on whether OMPI or GE
>>> did the mapping?
>>>
>>> Not sure here - just thinking out loud.
>>> Ralph
>>>
>>> On Sep 30, 2008, at 12:58 PM, Jeff Squyres wrote:
>>>
>>>> On Sep 30, 2008, at 2:51 PM, Rayson Ho wrote:
>>>>
>>>>> Restarting this discussion. A new update version of Grid Engine 6.2
>>>>> will come out early next year [1], and I really hope that we can get
>>>>> at least the interface defined.
>>>>
>>>> Great!
>>>>
>>>>> At the minimum, is it enough for the batch system to tell OpenMPI via
>>>>> an env variable which core (or virtual core, in the SMT case) to start
>>>>> binding the first MPI task? I guess an added bonus would be
>>>>> information about the number of processors to skip (the stride)
>>>>> between the sibling tasks? A stride of one is usually the case, but
>>>>> something larger than one would allow the batch system to control the
>>>>> level of cache and memory bandwidth sharing between the MPI tasks...
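>>>>>
>>>>> As a rough illustration only (the variable names are made up, not a
>>>>> proposed interface), each rank would then compute its own core as:
>>>>>
>>>>>   #include <stdlib.h>
>>>>>
>>>>>   /* Hypothetical start-core and stride handed down by the batch
>>>>>      system; fall back to 0 and 1 if they are not set. */
>>>>>   static int core_for_rank(int rank)
>>>>>   {
>>>>>       const char *f = getenv("BATCH_FIRST_CORE");
>>>>>       const char *s = getenv("BATCH_CORE_STRIDE");
>>>>>       int first  = f ? atoi(f) : 0;
>>>>>       int stride = s ? atoi(s) : 1;
>>>>>       return first + rank * stride;   /* start 4, stride 2 ->
>>>>>                                          ranks 0,1,2 on cores 4,6,8 */
>>>>>   }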
>>>>
>>>> Wouldn't it be better to give us a specific list of cores to bind to? As
>>>> core counts go up in servers, I think we may see a re-emergence of having
>>>> multiple MPI jobs on a single server. And as core counts go even *higher*,
>>>> then fragmentation of available cores over time is possible/likely.
>>>>
>>>> Would you be giving us a list of *relative* cores to bind to (i.e., "bind
>>>> to the Nth online core on the machine" -- which may be different than the
>>>> OS's ID for that processor) or will you be giving us the actual OS virtual
>>>> processor ID(s) to bind to?
>>>>
>>>> --
>>>> Jeff Squyres
>>>> Cisco Systems
>>>>
>>>
>
--
Jeff Squyres
jsquy...@cisco.com