Awesome, thanks to both you guys! It's very exciting to see this progress!

Arun
Sent from my iPhone

On Nov 23, 2011, at 5:14 PM, Ralph Castain <r...@open-mpi.org> wrote:

> FWIW: I can commit the OMPI part of your patch for you. The CCLA is intended
> to ensure that people realize the need to protect OMPI from "infection" due
> to code based on other licenses such as GPL. For people only offering a
> single patch, it often is too big a burden to get corporate approval of the
> legal document.
>
> So as long as someone (e.g., me) who already is operating under the CCLA is
> willing to review and commit the patch, and the patch isn't too huge, we can
> absorb it that way. I expect your patch is just a new ess component, and I'm
> happy to do the review and commit it on your behalf, if that is acceptable
> to you.
>
>
> On Nov 21, 2011, at 5:04 PM, <milind.bhandar...@emc.com> wrote:
>
>> Ralph,
>>
>> Yes, I have completed the first step, although I would really like that
>> code to be part of the MPI Application Master (Chris Douglas suggested a
>> way to do this at ApacheCon).
>>
>> Regarding the remaining steps, I have been following discussions on the
>> Open MPI mailing lists, and reading the hwloc code.
>>
>> If you are making a trip to Cisco HQ sometime soon, I would like to have a
>> face-to-face about hwloc. I have so far avoided using a native task
>> controller for spawning MPI jobs, but given the lack of support for
>> binding in Java, it looks like I will have to bite the bullet.
>>
>> - Milind
>>
>> ---
>> Milind Bhandarkar
>> Greenplum Labs, EMC
>> (Disclaimer: Opinions expressed in this email are those of the author, and
>> do not necessarily represent the views of any organization, past or
>> present, that the author might be affiliated with.)
>>
>>
>> On 11/21/11 3:54 PM, "Ralph Castain" <r...@open-mpi.org> wrote:
>>
>>> Hi Milind,
>>>
>>> Glad to hear of the progress - I recall our earlier conversation. I
>>> gather you have completed step 1 (wireup) - have you given any thought
>>> to the other two steps? Anything I can do to help?
>>>
>>> Ralph
>>>
>>>
>>> On Nov 21, 2011, at 4:47 PM, <milind.bhandar...@emc.com> wrote:
>>>
>>>> Hi Ralph,
>>>>
>>>> I spoke with Jeff Squyres at SC11, and updated him on the status of my
>>>> OpenMPI port to Hadoop Yarn.
>>>>
>>>> To update everyone, I have OpenMPI examples running on Yarn. It still
>>>> requires some code cleanup and refactoring, but that can be done as a
>>>> later step.
>>>>
>>>> Currently, the MPI processes come up, get the submitting client's IP
>>>> and port via environment variables, connect to it, and do a barrier.
>>>> The result of this barrier is that every process in MPI_COMM_WORLD
>>>> learns every other process's endpoint.
>>>>
>>>> I am aiming to submit the patch to Hadoop by the end of this month.
>>>>
>>>> I will publish the OpenMPI patch to GitHub.
>>>>
>>>> (As I mentioned to Jeff, OpenMPI requires a CCLA for accepting
>>>> submissions. That will take some time.)
>>>>
>>>> - Milind
>>>>
>>>> ---
>>>> Milind Bhandarkar
>>>> Greenplum Labs, EMC
>>>> (Disclaimer: Opinions expressed in this email are those of the author,
>>>> and do not necessarily represent the views of any organization, past
>>>> or present, that the author might be affiliated with.)
>>>>
>>>>
>>>>> I'm willing to do the integration work, but wanted to check first to
>>>>> see if (a) someone in the Hadoop community is already doing so, and
>>>>> (b) whether you would be interested in seeing such a capability and
>>>>> willing to accept the code contribution?
>>>>>
>>>>> Establishing MPI support requires the following steps:
>>>>>
>>>>> 1. Wireup support. MPI processes need to exchange endpoint info
>>>>> (e.g., for TCP connections, the IP address and port) so that each
>>>>> process knows how to connect to any other process in the application.
>>>>> This is typically done in a collective "modex" operation. There are
>>>>> several ways of doing it - if we proceed, I will outline those in a
>>>>> separate email to solicit your input on the most desirable approach.
>>>>>
>>>>> 2. Binding support. One can achieve significant performance
>>>>> improvements by binding processes to specific cores, sockets, and/or
>>>>> NUMA regions (regardless of whether MPI is used, but certainly
>>>>> important for MPI applications). This requires not only the binding
>>>>> code, but also some logic to ensure that one doesn't "overload"
>>>>> specific resources.
>>>>>
>>>>> 3. Process mapping. I haven't verified it yet, but I suspect that
>>>>> Hadoop provides each executing instance with an identifier that is
>>>>> unique within that job - e.g., we typically assign an integer "rank"
>>>>> ranging from 0 to one less than the number of instances being
>>>>> executed. This identifier is critical for MPI applications, and the
>>>>> relative placement of processes within a job often dictates overall
>>>>> performance. Thus, we would provide a mapping capability that allows
>>>>> users to specify patterns of process placement for their job - e.g.,
>>>>> "place one process on each socket on every node".
>>>>>
>>>>> I have written the code to implement the above support on a number of
>>>>> systems, and don't foresee major problems doing it for Hadoop (though
>>>>> I would welcome a chance to get a brief walk-through of the code from
>>>>> someone). Please let me know if this would be of interest to the
>>>>> Hadoop community.
>>>>>
>>>>> Thanks,
>>>>> Ralph Castain
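To make step 1 (wireup) concrete, here is a minimal C sketch of the handshake Milind describes earlier in the thread: each process reads the submitting client's address from its environment, connects over TCP, and then takes part in the barrier/modex that distributes everyone's endpoints. The environment variable names (MPI_CLIENT_IP / MPI_CLIENT_PORT) and the function name are illustrative assumptions, not the actual patch.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Connect back to the submitting client named in the environment.
 * MPI_CLIENT_IP and MPI_CLIENT_PORT are assumed variable names. */
int connect_to_client(void)
{
    const char *ip   = getenv("MPI_CLIENT_IP");
    const char *port = getenv("MPI_CLIENT_PORT");
    if (ip == NULL || port == NULL) {
        fprintf(stderr, "wireup: client endpoint not in environment\n");
        return -1;
    }

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port   = htons((uint16_t) atoi(port));

    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
        connect(fd, (struct sockaddr *) &addr, sizeof(addr)) != 0) {
        close(fd);
        return -1;
    }

    /* Here the process would send its own endpoint ("IP:port") and block
     * until the client replies with the endpoints of all peers -- the
     * collective "modex"/barrier step described in the thread. */
    return fd;
}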
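For step 2 (binding), hwloc - which Milind mentions reading - is the portable way to do this. Below is a minimal sketch that binds the calling process to one core, chosen round-robin by a node-local rank; the local_rank parameter is an assumption about how the launcher numbers co-located processes.

#include <stdio.h>
#include <hwloc.h>

/* Bind the calling process to one core, chosen round-robin by its
 * node-local rank. */
int bind_to_core(int local_rank)
{
    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    int ncores = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE);
    if (ncores <= 0) {                  /* topology unknown: skip binding */
        hwloc_topology_destroy(topo);
        return -1;
    }

    /* Round-robin over cores so co-located processes don't "overload"
     * a single resource, per step 2 above. */
    hwloc_obj_t core = hwloc_get_obj_by_type(topo, HWLOC_OBJ_CORE,
                                             local_rank % ncores);

    hwloc_cpuset_t cpuset = hwloc_bitmap_dup(core->cpuset);
    hwloc_bitmap_singlify(cpuset);      /* keep one PU to avoid migration */

    int rc = hwloc_set_cpubind(topo, cpuset, HWLOC_CPUBIND_PROCESS);

    hwloc_bitmap_free(cpuset);
    hwloc_topology_destroy(topo);
    return rc;
}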
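For step 3 (mapping), the "one process on each socket on every node" pattern Ralph gives reduces to simple arithmetic once homogeneous nodes are assumed. A sketch of such a rank-to-placement function (the struct and function names are illustrative):

#include <stdio.h>

/* Where a given MPI rank lands under a "by socket" mapping. */
typedef struct {
    int node;    /* node hosting this rank                */
    int socket;  /* socket on that node the rank binds to */
} placement_t;

/* Consecutive ranks fill a node's sockets, then move to the next node. */
placement_t map_by_socket(int rank, int sockets_per_node)
{
    placement_t p;
    p.node   = rank / sockets_per_node;
    p.socket = rank % sockets_per_node;
    return p;
}

int main(void)
{
    /* Eight ranks across 2-socket nodes: ranks 0-1 land on node 0,
     * ranks 2-3 on node 1, and so on. */
    for (int rank = 0; rank < 8; rank++) {
        placement_t p = map_by_socket(rank, 2);
        printf("rank %d -> node %d, socket %d\n", rank, p.node, p.socket);
    }
    return 0;
}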