Re: A pluggable external sort for Hadoop MR

2011-04-26 Thread Asokan, M
Thanks Jeff. I filed HADOOP-7242 (https://issues.apache.org/jira/browse/HADOOP-7242) for this enhancement. -- Asokan On 04/26/2011 02:00 PM, Jeff Hammerbacher wrote: Hey Asokan, Could you please file a JIRA with your proposed enhancement so that the discussion can be archived there? See http:

Re: A pluggable external sort for Hadoop MR

2011-04-26 Thread Jeff Hammerbacher
Hey Asokan, Could you please file a JIRA with your proposed enhancement so that the discussion can be archived there? See http://wiki.apache.org/hadoop/HowToContribute for more details on how to contribute to Hadoop. Thanks, Jeff On Tue, Apr 26, 2011 at 9:46 AM, Asokan, M wrote: > Hi Chris, >

Re: A pluggable external sort for Hadoop MR

2011-04-26 Thread Asokan, M
Hi Chris, The overall elapsed time to run a sort depends on many factors other than the sort algorithm. If you follow the data flow in MR from the point where sorting starts in the Map phase to the point where key/value pairs are available for reduction in the Reduce phase, there are CPU- and IO-intensive acti

Re: A pluggable external sort for Hadoop MR

2011-04-26 Thread Christopher Smith
Aren't you worried that the overhead of shoving all that data through an external sort facility would outweigh any benefits from the algo? --Chris On Apr 26, 2011, at 8:34 AM, "Asokan, M" wrote: > Hi All, > > I am submitting this notice of intent to contribute to the Hadoop community > on be

A pluggable external sort for Hadoop MR

2011-04-26 Thread Asokan, M
Hi All, I am submitting this notice of intent, on behalf of Syncsort, Inc. (www.syncsort.com), to contribute an interface for an external sorter to the Hadoop community. Although Hadoop MR (Map/Reduce) provides users with pluggable InputFormat, Mapper, Partitioner, Combi