Re: Need help understanding the source

2009-07-07 Thread jason hadoop
When you have 0 reduces, the map outputs themselves are moved to the output directory for you. It is also straightforward to open your own file and write to it directly instead of using the output collector. On Tue, Jul 7, 2009 at 10:14 AM, Todd Lipcon wrote: > On Tue, Jul 7, 2009 at 1:13 AM, ...
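
Jason's two suggestions look roughly like the following sketch against the old org.apache.hadoop.mapred API of that era. The class names, the side-file path, and the use of "mapred.task.id" for a unique filename are illustrative assumptions, not something stated in the thread: the job runs with zero reduces, so whatever the mapper emits through the OutputCollector lands directly in the job's output directory, and the mapper also opens its own HDFS file and writes to it directly (the sketch ignores speculative execution).

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class MapOnlyExample {

  public static class SideFileMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, LongWritable, Text> {

    private FSDataOutputStream sideFile;

    public void configure(JobConf job) {
      try {
        FileSystem fs = FileSystem.get(job);
        // One side file per task attempt, named after the task attempt id to avoid clashes.
        Path p = new Path("/tmp/side-output/" + job.get("mapred.task.id"));
        sideFile = fs.create(p);
      } catch (IOException e) {
        throw new RuntimeException(e);
      }
    }

    public void map(LongWritable key, Text value,
        OutputCollector<LongWritable, Text> output, Reporter reporter)
        throws IOException {
      // With zero reduces, this goes straight to the job's output directory.
      output.collect(key, value);
      // Written directly to our own HDFS file, outside the output collector.
      sideFile.write(value.getBytes(), 0, value.getLength());
      sideFile.write('\n');
    }

    public void close() throws IOException {
      sideFile.close();
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MapOnlyExample.class);
    conf.setJobName("map-only-example");
    conf.setMapperClass(SideFileMapper.class);
    conf.setNumReduceTasks(0);                 // zero reduces: map output is the job output
    conf.setOutputKeyClass(LongWritable.class);
    conf.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}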

Re: Need help understanding the source

2009-07-07 Thread Todd Lipcon
On Tue, Jul 7, 2009 at 1:13 AM, jason hadoop wrote: > The other alternative you may try is simply to write your map outputs to HDFS [i.e., setNumReduces(0)] and have a consumer pick up the map outputs as they appear. If the life of the files is short and you can withstand data loss, you m...
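istant
A hedged sketch of the "consumer" side of Jason's suggestion: poll the map-only job's output directory on HDFS and handle part files as they show up (in Hadoop, completed part files are moved into the output directory when a task commits). The directory argument, the poll interval, and the process() hook are assumptions for illustration only.

import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MapOutputConsumer {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path outputDir = new Path(args[0]);        // the map-only job's output directory
    Set<String> seen = new HashSet<String>();

    while (true) {                             // runs until killed; a sketch, not production code
      FileStatus[] files = fs.globStatus(new Path(outputDir, "part-*"));
      if (files != null) {
        for (FileStatus status : files) {
          if (seen.add(status.getPath().getName())) {
            process(fs, status.getPath());     // hypothetical per-file handler
          }
        }
      }
      Thread.sleep(5000);                      // poll interval; tune as needed
    }
  }

  private static void process(FileSystem fs, Path p) {
    System.out.println("new map output: " + p);
  }
}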

Re: Need help understanding the source

2009-07-07 Thread jason hadoop
If your constraints are loose enough, you could consider using the chain mapping (ChainMapper) that became available in 0.19 and have multiple mappers for your jobs. The extra mappers only receive the output of the prior map in the chain, and, if I remember correctly, the combiner is run at the end of the chain of ...
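
A minimal sketch of the chain setup Jason mentions, assuming he means the old org.apache.hadoop.mapred.lib.ChainMapper that appeared around 0.19. The two mapper classes here are trivial illustrations invented for the example; the point is only that the second mapper in the chain consumes the first mapper's output within the same map task.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.ChainMapper;

public class ChainExample {

  public static class UpperCaseMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, LongWritable, Text> {
    public void map(LongWritable key, Text value,
        OutputCollector<LongWritable, Text> out, Reporter reporter) throws IOException {
      out.collect(key, new Text(value.toString().toUpperCase()));
    }
  }

  public static class TrimMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, LongWritable, Text> {
    public void map(LongWritable key, Text value,
        OutputCollector<LongWritable, Text> out, Reporter reporter) throws IOException {
      out.collect(key, new Text(value.toString().trim()));
    }
  }

  public static void configureChain(JobConf job) {
    // First map in the chain reads the job's input records.
    ChainMapper.addMapper(job, UpperCaseMapper.class,
        LongWritable.class, Text.class, LongWritable.class, Text.class,
        true, new JobConf(false));

    // Second map in the chain receives only the previous mapper's output.
    ChainMapper.addMapper(job, TrimMapper.class,
        LongWritable.class, Text.class, LongWritable.class, Text.class,
        true, new JobConf(false));
  }
}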

Re: Need help understanding the source

2009-07-07 Thread Amr Awadallah
To add to Todd/Ted's wise words: the Hadoop (and MapReduce) architects didn't impose this limitation just for fun; it is core to enabling Hadoop to be as reliable as it is. If the reducer started processing mapper output immediately and a specific mapper then failed, the reducer would have to ...

Re: Need help understanding the source

2009-07-06 Thread Ted Dunning
I would consider this a very delicate optimization with little utility in the real world. It is very, very rare to reliably know how many records the reducer will see; getting this wrong would be a disaster, and getting it right would be very difficult in almost all cases. Moreover, this assu...

Re: Need help understanding the source

2009-07-06 Thread Naresh Rapolu
... Naresh Rapolu.

Re: Need help understanding the source

2009-07-06 Thread Naresh Rapolu
... research, and if everything works, I can come up with a contrib file for the same. Thanks, Naresh Rapolu.

Re: Need help understanding the source

2009-07-06 Thread Todd Lipcon
> ... length, but can anyone let me know how I should subtract them to get the aggregate size of map-output-records? > Thanks, > Naresh Rapolu.

Need help understanding the source

2009-07-05 Thread Naresh Rapolu
... them to get the aggregate size of map-output-records. Thanks, Naresh Rapolu.
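
The question above is cut off in this archive, but if the goal is simply the job-wide total of map output records and bytes, one possible approach is to read the framework counters after the job completes, since they are already aggregated across all map tasks and no per-task subtraction is needed. This is a sketch against the old mapred API; the counter group string is an assumption tied to the 0.19/0.20 era, and it may not be what Naresh had in mind.

import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class MapOutputCounters {
  public static void printMapOutputTotals(JobConf conf) throws Exception {
    RunningJob job = JobClient.runJob(conf);   // runs the job and waits for completion
    Counters counters = job.getCounters();

    // Framework counters are summed over all map tasks, so these are job-wide totals.
    long records = counters.findCounter(
        "org.apache.hadoop.mapred.Task$Counter", "MAP_OUTPUT_RECORDS").getCounter();
    long bytes = counters.findCounter(
        "org.apache.hadoop.mapred.Task$Counter", "MAP_OUTPUT_BYTES").getCounter();

    System.out.println("map output records: " + records);
    System.out.println("map output bytes:   " + bytes);
  }
}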