Thank you, Ralph. I'll get back to you if I run into issues.
On Mon, Feb 24, 2014 at 12:23 PM, Ralph Castain <r...@open-mpi.org> wrote:

> On Feb 24, 2014, at 7:55 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
>
> This is very interesting. I've been working on getting one of our clustering programs (http://grids.ucs.indiana.edu/ptliupages/publications/DAVS_IEEE.pdf) to work with the Open MPI Java bindings, and we obtained very good speedup and scalability when run on HPC clusters with InfiniBand. We are working on a report with performance results and will make it available here soon.
>
> Great! Will look forward to seeing it.
>
> This is also interesting, as we have a series of MapReduce applications that we have developed for analyzing gene sequences (http://grids.ucs.indiana.edu/ptliupages/publications/DACIDR_camera_ready_v0.3.pdf), which could benefit from having MPI support. Also, as you have mentioned, we run all these MapReduce jobs on HPC clusters.
>
> The folks at TACC are doing the Intel beta on a mouse genome, and will also be publishing their results comparing Hadoop performance under YARN/HDFS vs. Slurm/Lustre.
>
> I am very eager to try 4.) and wonder if you could kindly provide some pointers on how to get it working.
>
> The current release contains the initial "staged" execution support, but not the dynamic extension I described. To use staged execution, all you have to do is:
>
> (a) express your mapper and reducer stages as separate app_contexts on the command line; and
>
> (b) add --staged to the command line to request staged execution.
>
> So it looks something like this:
>
> mpirun --staged -n 10 ./mapper; -n 4 ./reducer
>
> Depending on the allocation, mpirun will stage execution of the mappers and reducers, connecting the stdout of the first to the stdin of the second. There is also support for localized file systems (see the orte/mca/dfs framework) that allows you to transparently access/move data across the network, and of course mpirun supports pre-positioning of files via the --preload-files option.
>
> HTH - feel free to ask questions and we'll be happy to help. Also, if you want to collaborate on the dynamic extension, we'd welcome the assist. Both Jeff and I have been somewhat swamped with other priorities, so progress on that last step is lagging.
>
> Ralph
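To make the staged mapper/reducer launch above a bit more concrete, here is a minimal sketch of what one of the staged executables could look like. It is not taken from the Open MPI sources; the class name and the tab-separated output format are only illustrative, and the ./mapper in Ralph's command line could just as well be a native binary or a shell wrapper around a JVM invocation. Since mpirun connects the mappers' stdout to the reducers' stdin, a mapper can be an ordinary program that streams key/value pairs as text lines:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    // Hypothetical "mapper" stage for the staged execution described above.
    // It reads text from stdin and emits <word, 1> pairs on stdout, which
    // mpirun --staged would pipe into the reducer stage's stdin.
    public class WordCountMapper {
        public static void main(String[] args) throws Exception {
            BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
            String line;
            while ((line = in.readLine()) != null) {
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        System.out.println(word + "\t1"); // one key/value pair per line
                    }
                }
            }
        }
    }

A matching reducer would read those lines from its stdin, aggregate the counts per key, and print the totals. The input files themselves could be pushed out ahead of time with --preload-files or accessed through the orte/mca/dfs support Ralph mentions.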
> Thank you,
> Saliya
>
> On Mon, Feb 24, 2014 at 10:30 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> On Feb 23, 2014, at 10:42 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
>>
>> Hi,
>>
>> This is to get some info on the subject and is not directly a question about Open MPI.
>>
>> I've read Jeff's blog post on integrating Open MPI with Hadoop (http://blogs.cisco.com/performance/resurrecting-mpi-and-java/) and wanted to check whether it is related to the Jira at https://issues.apache.org/jira/browse/MAPREDUCE-2911
>>
>> Somewhat. A little history might help. I was asked a couple of years ago to work on integrating MPI support with Hadoop. At that time, the thought of those asking for my help was that we would enable YARN to support MPI, which was captured in 2911. However, after working on it for a few months, it became apparent to me that this was a mistake. YARN's architecture makes support of MPI very difficult (but achievable - I did it with OMPI, and someone else has now done it with MPICH), and the result exhibits horrible scaling and relatively poor performance by HPC standards. So if you want to run a very small MPI job under YARN, you can do it with a custom application manager and JNI wrappers around every MPI call - just don't expect great performance.
>>
>> What I did instead was to pivot direction and focus on porting Hadoop to the HPC environment. The thought here was that, if we could get the Hadoop classes working with a regular HPC environment, then all the HPC world's tools and programming models become available. This is what we have done, and it comes in four parts:
>>
>> 1. Java MPI bindings that are very close to C-level performance. These are being released in the 1.7 series of OMPI and are unique to OMPI at this time. Jose Roman and Oscar Vega continue to close the performance gap.
>>
>> 2. Integration with HPC resource managers such as Slurm and Moab. Intel has taken the lead there and announced that support at SC13 - it is in beta test now.
>>
>> 3. Integration with HPC file systems such as Lustre. Intel again took the lead here and has a Lustre adaptor in beta test.
>>
>> 4. The equivalent of an application manager to stage map-reduce executions. I updated OMPI's "mpirun" to handle that - available in the current 1.7 release series. It fully understands "staged" execution and also notifies the associated processes when MPI is feasible (i.e., all the procs in comm_world are running).
>>
>> We continue to improve the Hadoop support - Cisco and I are collaborating on a new "dynamic MPI" capability that will allow the procs to interact without imposing the barrier at MPI_Init, for example. So I expect that this summer will demonstrate a pretty robust capability in that area.
>>
>> After all, there is no reason you shouldn't be able to run Hadoop on an HPC cluster :-)
>>
>> HTH
>> Ralph
>>
>> Also, is there a place I can get more info on this effort?
>>
>> Thank you,
>> Saliya

--
Saliya Ekanayake esal...@gmail.com
Cell 812-391-4914 Home 812-961-6383
http://saliya.org
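For anyone who wants to experiment with the Java bindings mentioned in item 1 of Ralph's list, a minimal program looks roughly like the following. This is only a sketch against the 1.7-series bindings discussed in this thread; method names and launch details may differ in other releases.

    import mpi.MPI;
    import mpi.MPIException;

    // Minimal sketch using the Open MPI 1.7-series Java bindings.
    // Compile with the mpijavac wrapper shipped with Open MPI and launch
    // under mpirun (for example: mpirun -n 4 java HelloMpi).
    public class HelloMpi {
        public static void main(String[] args) throws MPIException {
            MPI.Init(args);                              // start the MPI runtime
            int rank = MPI.COMM_WORLD.getRank();         // this process's id in comm_world
            int size = MPI.COMM_WORLD.getSize();         // number of processes in comm_world
            System.out.println("Hello from rank " + rank + " of " + size);
            MPI.Finalize();
        }
    }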