This would seem like a perfect use case for YARN. Is that what you are thinking? You could implement this as a new framework rather than trying to incrementally change MapReduce.
This would let you move faster and demonstrate side-by-side performance improvements.

---
E14 - typing on glass

On May 8, 2012, at 10:32 AM, Sriram Rao <srirams...@gmail.com> wrote:

> Hi,
>
> I'd like to announce the release of a new open source project, Sailfish.
>
> http://code.google.com/p/sailfish/
>
> Sailfish tries to improve Hadoop-performance, particularly for large-jobs
> which process TB's of data and run for hours. In building Sailfish, we
> modify how map-output is handled and transported from map->reduce.
>
> The project pages provide more information about the project.
>
> We are looking for collaborators who can help get some of the ideas into
> Apache Hadoop. A possible step forward could be to make "shuffle" phase of
> Hadoop pluggable.
>
> If you are interested in working with us, please get in touch with me.
>
> Sriram