Awesome, thanks to both you guys! It's very exciting to see this progress!

Arun
Sent from my iPhone

On Nov 23, 2011, at 5:14 PM, Ralph Castain <r...@open-mpi.org> wrote:

> FWIW: I can commit the OMPI part of your patch for you. The CCLA is intended
> to ensure that people realize the need to protect OMPI from "infection" due
> to code based on other licenses such as GPL. For people only offering a
> single patch, it often is too big a burden to get corporate approval of the
> legal document.
>
> So as long as someone (e.g., me) who already is operating under the CCLA is
> willing to review and commit the patch, and the patch isn't too huge, we can
> absorb it that way. I expect your patch is just a new ess component, and I'm
> happy to do the review and commit it on your behalf, if that is acceptable
> to you.
>
>
> On Nov 21, 2011, at 5:04 PM, <milind.bhandar...@emc.com> wrote:
>
>> Ralph,
>>
>> Yes, I have completed the first step, although I would really like that
>> code to be part of the MPI Application Master (Chris Douglas suggested a
>> way to do this at ApacheCon).
>>
>> Regarding the remaining steps, I have been following discussions on the
>> Open MPI mailing lists, and reading the hwloc code.
>>
>> If you are making a trip to Cisco HQ sometime soon, I would like to have a
>> face-to-face about hwloc. I have so far avoided using a native task
>> controller for spawning MPI jobs, but given the lack of support for
>> binding in Java, it looks like I will have to bite the bullet.
>>
>> - Milind
>>
>> ---
>> Milind Bhandarkar
>> Greenplum Labs, EMC
>> (Disclaimer: Opinions expressed in this email are those of the author, and
>> do not necessarily represent the views of any organization, past or
>> present, that the author might be affiliated with.)
>>
>>
>> On 11/21/11 3:54 PM, "Ralph Castain" <r...@open-mpi.org> wrote:
>>
>>> Hi Milind,
>>>
>>> Glad to hear of the progress - I recall our earlier conversation. I
>>> gather you have completed step 1 (wireup) - have you given any thought
>>> to the other two steps? Anything I can do to help?
>>>
>>> Ralph
>>>
>>>
>>> On Nov 21, 2011, at 4:47 PM, <milind.bhandar...@emc.com> wrote:
>>>
>>>> Hi Ralph,
>>>>
>>>> I spoke with Jeff Squyres at SC11, and updated him on the status of my
>>>> OpenMPI port to Hadoop Yarn.
>>>>
>>>> To update everyone, I have OpenMPI examples running on Yarn. It still
>>>> requires some code cleanup and refactoring, but that can be done as a
>>>> later step.
>>>>
>>>> Currently, the MPI processes come up, get the submitting client's IP
>>>> and port via environment variables, connect to it, and do a barrier.
>>>> The result of this barrier is that every process in MPI_COMM_WORLD
>>>> learns every other process's endpoint.
>>>>
>>>> I am aiming to submit the patch to Hadoop by the end of this month.
>>>>
>>>> I will publish the OpenMPI patch to GitHub.
>>>>
>>>> (As I mentioned to Jeff, OpenMPI requires a CCLA for accepting
>>>> submissions. That will take some time.)
>>>>
>>>> - Milind
>>>>
>>>> ---
>>>> Milind Bhandarkar
>>>> Greenplum Labs, EMC
>>>> (Disclaimer: Opinions expressed in this email are those of the author,
>>>> and do not necessarily represent the views of any organization, past
>>>> or present, that the author might be affiliated with.)
>>>>
>>>>
>>>>> I'm willing to do the integration work, but wanted to check first to
>>>>> see if (a) someone in the Hadoop community is already doing so, and
>>>>> (b) whether you would be interested in seeing such a capability and
>>>>> willing to accept the code contribution?
>>>>>
>>>>> Establishing MPI support requires the following steps:
>>>>>
>>>>> 1. Wireup support. MPI processes need to exchange endpoint info
>>>>> (e.g., for TCP connections, the IP address and port) so that each
>>>>> process knows how to connect to any other process in the application.
>>>>> This is typically done in a collective "modex" operation. There are
>>>>> several ways of doing it - if we proceed, I will outline those in a
>>>>> separate email to solicit your input on the most desirable approach.
>>>>>
>>>>> 2. Binding support. One can achieve significant performance
>>>>> improvements by binding processes to specific cores, sockets, and/or
>>>>> NUMA regions (regardless of whether MPI is used, but certainly
>>>>> important for MPI applications). This requires not only the binding
>>>>> code, but also some logic to ensure that one doesn't "overload"
>>>>> specific resources.
>>>>>
>>>>> 3. Process mapping. I haven't verified it yet, but I suspect that
>>>>> Hadoop provides each executing instance with an identifier that is
>>>>> unique within that job - e.g., we typically assign an integer "rank"
>>>>> ranging from 0 to one less than the number of instances being
>>>>> executed. This identifier is critical for MPI applications, and the
>>>>> relative placement of processes within a job often dictates overall
>>>>> performance. Thus, we would provide a mapping capability that allows
>>>>> users to specify patterns of process placement for their job - e.g.,
>>>>> "place one process on each socket on every node".
>>>>>
>>>>> I have written the code to implement the above support on a number of
>>>>> systems, and don't foresee major problems doing it for Hadoop (though
>>>>> I would welcome a chance to get a brief walk-through of the code from
>>>>> someone). Please let me know if this would be of interest to the
>>>>> Hadoop community.
>>>>>
>>>>> Thanks,
>>>>> Ralph Castain
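To make step 1 (wireup) concrete, here is a minimal C sketch of the handshake Milind describes earlier in the thread: each process reads the submitting client's address from its environment, connects over TCP, and then takes part in the barrier/modex that distributes everyone's endpoints. The environment variable names (MPI_CLIENT_IP / MPI_CLIENT_PORT) and the function name are illustrative assumptions, not the actual patch.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Connect back to the submitting client named in the environment.
 * MPI_CLIENT_IP and MPI_CLIENT_PORT are assumed variable names. */
int connect_to_client(void)
{
    const char *ip   = getenv("MPI_CLIENT_IP");
    const char *port = getenv("MPI_CLIENT_PORT");
    if (ip == NULL || port == NULL) {
        fprintf(stderr, "wireup: client endpoint not in environment\n");
        return -1;
    }

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port   = htons((uint16_t) atoi(port));

    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
        connect(fd, (struct sockaddr *) &addr, sizeof(addr)) != 0) {
        close(fd);
        return -1;
    }

    /* Here the process would send its own endpoint ("IP:port") and block
     * until the client replies with the endpoints of all peers -- the
     * collective "modex"/barrier step described in the thread. */
    return fd;
}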
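For step 2 (binding), hwloc - which Milind mentions reading - is the portable way to do this. Below is a minimal sketch that binds the calling process to one core, chosen round-robin by a node-local rank; the local_rank parameter is an assumption about how the launcher numbers co-located processes.

#include <stdio.h>
#include <hwloc.h>

/* Bind the calling process to one core, chosen round-robin by its
 * node-local rank. */
int bind_to_core(int local_rank)
{
    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    int ncores = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE);
    if (ncores <= 0) {                  /* topology unknown: skip binding */
        hwloc_topology_destroy(topo);
        return -1;
    }

    /* Round-robin over cores so co-located processes don't "overload"
     * a single resource, per step 2 above. */
    hwloc_obj_t core = hwloc_get_obj_by_type(topo, HWLOC_OBJ_CORE,
                                             local_rank % ncores);

    hwloc_cpuset_t cpuset = hwloc_bitmap_dup(core->cpuset);
    hwloc_bitmap_singlify(cpuset);      /* keep one PU to avoid migration */

    int rc = hwloc_set_cpubind(topo, cpuset, HWLOC_CPUBIND_PROCESS);

    hwloc_bitmap_free(cpuset);
    hwloc_topology_destroy(topo);
    return rc;
}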
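For step 3 (mapping), the "one process on each socket on every node" pattern Ralph gives reduces to simple arithmetic once homogeneous nodes are assumed. A sketch of such a rank-to-placement function (the struct and function names are illustrative):

#include <stdio.h>

/* Where a given MPI rank lands under a "by socket" mapping. */
typedef struct {
    int node;    /* node hosting this rank                */
    int socket;  /* socket on that node the rank binds to */
} placement_t;

/* Consecutive ranks fill a node's sockets, then move to the next node. */
placement_t map_by_socket(int rank, int sockets_per_node)
{
    placement_t p;
    p.node   = rank / sockets_per_node;
    p.socket = rank % sockets_per_node;
    return p;
}

int main(void)
{
    /* Eight ranks across 2-socket nodes: ranks 0-1 land on node 0,
     * ranks 2-3 on node 1, and so on. */
    for (int rank = 0; rank < 8; rank++) {
        placement_t p = map_by_socket(rank, 2);
        printf("rank %d -> node %d, socket %d\n", rank, p.node, p.socket);
    }
    return 0;
}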