Re: Hadoop + MPI

Milind.Bhandarkar Mon, 21 Nov 2011 15:48:28 -0800

Hi Ralph,

I spoke with Jeff Squyres  at SC11, and updated him on the status of my
OpenMPI port on Hadoop Yarn.


To update everyone, I have OpenMPI examples running on #Yarn, although it
requires some code cleanup and refactoring, however that can be done as a
later step.

Currently, the MPI processes come up, get submitting client's IP and port
via environment variables, connect to it, and do a barrier. The result of
this barrier is that everyone in MPI_COMM_WORLD gets each other's
endpoints.

I am aiming to submit the patch to hadoop by the end of this month.

I will publish the openmpi patch to github.

(As I mentioned to Jeff, OpenMPI requires a CCLA for accepting
submissions. That will take some time.)

- Milind

---
Milind Bhandarkar
Greenplum Labs, EMC
(Disclaimer: Opinions expressed in this email are those of the author, and
do not necessarily represent the views of any organization, past or
present, the author might be affiliated with.)



>
>I'm willing to do the integration work, but wanted to check first to see
>if (a) someone in the Hadoop community is already doing so, and (b) if
>you would be interested in seeing such a capability and willing to accept
>the code contribution?
>
>Establishing MPI support requires the following steps:
>
>1. wireup support. MPI processes need to exchange endpoint info (e.g.,
>for TCP connections, IP address and port) so that each process knows how
>to connect to any other process in the application. This is typically
>done in a collective "modex" operation. There are several ways of doing
>it - if we proceed, I will outline those in a separate email to solicit
>your input on the most desirable approach to use.
>
>2. binding support. One can achieve significant performance improvements
>by binding processes to specific cores, sockets, and/or NUMA regions
>(regardless of using MPI or not, but certainly important for MPI
>applications). This requires not only the binding code, but some logic to
>ensure that one doesn't "overload" specific resources.
>
>3. process mapping. I haven't verified it yet, but I suspect that Hadoop
>provides each executing instance with an identifier that is unique within
>that job - e.g., we typically assign an integer "rank" that ranges from 0
>to the number of instances being executed. This identifier is critical
>for MPI applications, and the relative placement of processes within a
>job often dictates overall performance. Thus, we would provide a mapping
>capability that allows users to specify patterns of process placement for
>their job - e.g., "place one process on each socket on every node".
>
>I have written the code to implement the above support on a number of
>systems, and don't foresee major problems doing it for Hadoop (though I
>would welcome a chance to get a brief walk-thru the code from someone).
>Please let me know if this would be of interest to the Hadoop community.
>
>Thanks
>Ralph Castain
>
>
>

Re: Hadoop + MPI

Reply via email to