Hi Ralph,

This is regarding the MapReduce support with OpenMPI, which you previously gave a good amount of info about. I have several MR applications that I'd like to test for performance on an HPC cluster with OpenMPI. I found this presentation of yours, http://www.open-mpi.org/video/mrplus/Greenplum_RalphCastain-1up.pdf, but wonder if there are detailed steps somewhere for getting a simple MR program running with OpenMPI.
Thank you,
Saliya

On Mon, Feb 24, 2014 at 1:22 PM, Saliya Ekanayake <esal...@gmail.com> wrote:

> Thank you Ralph. I'll get back to you if I run into issues.
>
> On Mon, Feb 24, 2014 at 12:23 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> On Feb 24, 2014, at 7:55 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
>>
>> This is very interesting. I've been working on getting one of our clustering programs (http://grids.ucs.indiana.edu/ptliupages/publications/DAVS_IEEE.pdf) to work with the OpenMPI Java binding, and we obtained very good speedup and scalability when run on HPC clusters with Infiniband. We are working on a report with performance results and will make it available here soon.
>>
>> Great! Will look forward to seeing it.
>>
>> This is again interesting, as we have a series of MapReduce applications that we have developed for analyzing gene sequences (http://grids.ucs.indiana.edu/ptliupages/publications/DACIDR_camera_ready_v0.3.pdf), which could benefit from having MPI support. Also, as you have mentioned, we run all these MapReduce jobs on HPC clusters.
>>
>> The folks at TACC are doing the Intel beta on a mouse genome, and will also be publishing their results comparing Hadoop performance under YARN/HDFS vs Slurm/Lustre.
>>
>> I am very eager to try item 4 and wonder if you could kindly provide some pointers on how to get it working.
>>
>> The current release contains the initial "staged" execution support, but not the dynamic extension I described. To use staged execution, all you have to do is:
>>
>> (a) express your mapper and reducer stages as separate app_contexts on the command line; and
>>
>> (b) add --staged to the cmd line to request staged execution.
>>
>> So it looks something like this:
>>
>> mpirun --staged -n 10 ./mapper; -n 4 ./reducer
>>
>> Depending on the allocation, mpirun will stage execution of the mappers and reducers, connecting the stdout of the first to the stdin of the second. There is also support for localized file systems (see the orte/mca/dfs framework) that allows you to transparently access/move data across the network, and of course mpirun supports pre-positioning of files via the --preload-files option.
>>
>> HTH - feel free to ask questions and we'll be happy to help. Also, if you want to collaborate on the dynamic extension, we'd welcome the assist. Both Jeff and I have been somewhat swamped with other priorities and so progress on that last step is lagging.
>>
>> Ralph
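A minimal, hypothetical sketch of what ./mapper and ./reducer could look like for the staged invocation shown above; this is not code from the thread. It assumes staged mode really does pipe each mapper's stdout into the reducers' stdin as described, that the input text reaches each mapper on its stdin (for example via --preload-files or the orte/mca/dfs machinery, or by swapping in a per-rank input file), and that ./mapper and ./reducer are thin wrapper scripts that exec "java Mapper" and "java Reducer". The two programs only need to agree on a line format, here word<TAB>count.

    // Mapper.java - hypothetical streaming-style mapper: reads text lines from
    // stdin and emits one "word<TAB>1" pair per token on stdout. In staged mode
    // that stdout is what gets connected to a reducer's stdin.
    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class Mapper {
        public static void main(String[] args) throws Exception {
            BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
            String line;
            while ((line = in.readLine()) != null) {
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        System.out.println(word + "\t1");
                    }
                }
            }
        }
    }

    // Reducer.java - hypothetical reducer: aggregates the "word<TAB>count" pairs
    // arriving on stdin (from however many mappers fed it) and prints the totals.
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.util.HashMap;
    import java.util.Map;

    public class Reducer {
        public static void main(String[] args) throws Exception {
            Map<String, Long> counts = new HashMap<String, Long>();
            BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
            String line;
            while ((line = in.readLine()) != null) {
                String[] kv = line.split("\t");
                if (kv.length != 2) {
                    continue; // skip malformed lines
                }
                Long current = counts.get(kv[0]);
                long add = Long.parseLong(kv[1]);
                counts.put(kv[0], current == null ? add : current + add);
            }
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                System.out.println(e.getKey() + "\t" + e.getValue());
            }
        }
    }

After compiling with javac, the same pair can be sanity-checked without mpirun at all, e.g. "cat input.txt | java Mapper | java Reducer", which is a quick way to verify the stdout-to-stdin plumbing before trying --staged.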
>>
>> Thank you,
>> Saliya
>>
>> On Mon, Feb 24, 2014 at 10:30 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> On Feb 23, 2014, at 10:42 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> This is to get some info on the subject and is not directly a question about OpenMPI.
>>>
>>> I've read Jeff's blog post on integrating OpenMPI with Hadoop (http://blogs.cisco.com/performance/resurrecting-mpi-and-java/) and wanted to check whether this is related to the Jira at https://issues.apache.org/jira/browse/MAPREDUCE-2911
>>>
>>> Somewhat. A little history might help. I was asked a couple of years ago to work on integrating MPI support with Hadoop. At that time, the thought of those asking for my help was that we would enable YARN to support MPI, which was captured in 2911. However, after working on it for a few months, it became apparent to me that this was a mistake. YARN's architecture makes support of MPI very difficult (but achievable - I did it with OMPI, and someone else has now done it with MPICH), and the result exhibits horrible scaling and relatively poor performance by HPC standards. So if you want to run a very small MPI job under YARN, you can do it with a custom application manager and JNI wrappers around every MPI call - just don't expect great performance.
>>>
>>> What I did instead was to pivot and focus on porting Hadoop to the HPC environment. The thought here was that, if we could get the Hadoop classes working in a regular HPC environment, then all of the HPC world's tools and programming models become available. This is what we have done, and it comes in four parts:
>>>
>>> 1. Java MPI bindings that are very close to C-level performance. These are being released in the 1.7 series of OMPI and are unique to OMPI at this time. Jose Roman and Oscar Vega continue to close the performance gap. (A minimal usage sketch appears at the end of this mail.)
>>>
>>> 2. Integration with HPC resource managers such as Slurm and Moab. Intel has taken the lead there and announced that support at SC13 - it is in beta test now.
>>>
>>> 3. Integration with HPC file systems such as Lustre. Intel again took the lead here and has a Lustre adaptor in beta test.
>>>
>>> 4. The equivalent of an application manager to stage map-reduce executions. I updated OMPI's "mpirun" to handle that - available in the current 1.7 release series. It fully understands "staged" execution and also notifies the associated processes when MPI is feasible (i.e., all the procs in comm_world are running).
>>>
>>> We continue to improve the Hadoop support - Cisco and I are collaborating on a new "dynamic MPI" capability that will allow the procs to interact without imposing the barrier at MPI_Init, for example. So I expect that this summer will demonstrate a pretty robust capability in that area.
>>>
>>> After all, there is no reason you shouldn't be able to run Hadoop on an HPC cluster :-)
>>>
>>> HTH
>>> Ralph
>>>
>>> Also, is there a place I can get more info on this effort?
>>>
>>> Thank you,
>>> Saliya

--
Saliya Ekanayake
esal...@gmail.com
Cell 812-391-4914  Home 812-961-6383
http://saliya.org
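For reference, here is a minimal sketch of using the OMPI Java bindings mentioned in item 1 of Ralph's list above. This is a generic hello-world, not code from the thread; it assumes an Open MPI 1.7+ build configured with Java support (--enable-mpi-java), and it uses the newer getRank()/getSize() method names - some earlier 1.7 snapshots used the mpiJava-style Rank()/Size() instead.

    // Hello.java - minimal check that the OMPI Java bindings work: initialize
    // MPI, report this process's rank and the size of MPI.COMM_WORLD, then
    // shut down cleanly.
    import mpi.MPI;
    import mpi.MPIException;

    public class Hello {
        public static void main(String[] args) throws MPIException {
            MPI.Init(args);
            int rank = MPI.COMM_WORLD.getRank();
            int size = MPI.COMM_WORLD.getSize();
            System.out.println("Hello from rank " + rank + " of " + size);
            MPI.Finalize();
        }
    }

Compile with the wrapper compiler and launch through mpirun, something like "mpijavac Hello.java" followed by "mpirun -n 4 java Hello" (adding mpi.jar to the runtime classpath if your build does not do that automatically).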