When you have 0 reduces, the map outputs themselves are moved to the output
directory for you.
It is also straightforward to open your own file and write to it directly
instead of using the output collector.
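A minimal sketch of that side-file approach, in the old mapred API: the mapper opens its own HDFS file in configure() and bypasses the OutputCollector entirely. The output path is an illustrative assumption; keying the file name on the task attempt id keeps speculative/retried attempts from clobbering each other.

```java
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class DirectWriteMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, NullWritable, NullWritable> {

  private FSDataOutputStream out;

  @Override
  public void configure(JobConf conf) {
    try {
      FileSystem fs = FileSystem.get(conf);
      // One file per task attempt so retries don't overwrite each other.
      // "/tmp/side-output" is an assumed path, not anything Hadoop mandates.
      Path p = new Path("/tmp/side-output/" + conf.get("mapred.task.id"));
      out = fs.create(p);
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<NullWritable, NullWritable> collector,
                  Reporter reporter) throws IOException {
    // Write records straight to our own file; the collector is unused.
    out.writeBytes(value.toString() + "\n");
  }

  @Override
  public void close() throws IOException {
    out.close();  // flush the side file when the task finishes
  }
}
```

Note that unlike OutputCollector output, nothing cleans these files up if the task fails, so the consumer has to tolerate partial files.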
On Tue, Jul 7, 2009 at 10:14 AM, Todd Lipcon wrote:
> On Tue, Jul 7, 2009 at 1:13 AM, jason hadoop wrote:
>
> The other alternative you may try is simply to write your map outputs to
> HDFS [i.e., setNumReduceTasks(0)] and have a consumer pick up the map outputs
> as they appear. If the life of the files is short and you can withstand data
> loss, you m
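Wiring up that map-only alternative is a one-liner on the job configuration; with zero reduce tasks the map outputs are committed straight to the job's output directory. A sketch using the old mapred API (job, mapper, and path names are illustrative assumptions):

```java
// Map-only job: outputs land directly in HDFS for an external consumer.
JobConf conf = new JobConf(MyJob.class);
conf.setJobName("map-only-feed");
conf.setNumReduceTasks(0);          // no reduce phase: map output goes straight to HDFS
conf.setMapperClass(MyMapper.class);
FileInputFormat.setInputPaths(conf, new Path("/data/in"));
FileOutputFormat.setOutputPath(conf, new Path("/data/feed"));
JobClient.runJob(conf);
// A separate consumer process can then poll /data/feed for part-* files
// as the map tasks commit them.
```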
If your constraints are loose enough, you could consider using the chain
mapping (ChainMapper) that became available in 0.19, and
have multiple mappers for your job.
Each extra mapper only receives the output of the prior map in the chain, and
if I remember correctly, the combiner is run at the end of the chain of
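A sketch of what that chaining looks like with ChainMapper from the old mapred API (AMap and BMap are hypothetical mapper classes; BMap consumes AMap's output, and the key/value classes shown are just an example pipeline):

```java
JobConf job = new JobConf(MyJob.class);
job.setJobName("chained-maps");

// First mapper in the chain: reads the job input.
ChainMapper.addMapper(job, AMap.class,
    LongWritable.class, Text.class,   // input key/value types
    Text.class, Text.class,           // output key/value types
    true, new JobConf(false));

// Second mapper: fed directly with AMap's output, never the raw input.
ChainMapper.addMapper(job, BMap.class,
    Text.class, Text.class,
    Text.class, IntWritable.class,
    true, new JobConf(false));

job.setNumReduceTasks(0);  // or attach a reducer via ChainReducer
JobClient.runJob(job);
```

The boolean argument controls whether key/value pairs are passed between the chained mappers by value or by reference.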
To add to Todd/Ted's wise words: the Hadoop (and MapReduce) architects
didn't impose this limitation just for fun; it is core to making
Hadoop as reliable as it is. If the reducer started processing
mapper output immediately and a specific mapper then failed, the reducer
would have to
I would consider this to be a very delicate optimization with little utility
in the real world. It is very, very rare to reliably know how many records
the reducer will see. Getting this wrong would be a disaster. Getting it
right would be very difficult in almost all cases.
Moreover, this assu
--
View this message in context:
http://www.nabble.com/Need-help-understanding-the-source-tp24345474p24360429.html
Sent from the Hadoop core-dev mailing list archive at Nabble.com.
research, and if
everything works, I can come up with a contrib file for the same.
Thanks,
Naresh Rapolu.
> length, but can anyone let me know how I should subtract them to get the
> aggregate size of map-output-records.
>
> Thanks,
> Naresh Rapolu.
them to get the
aggregate size of map-output-records.
Thanks,
Naresh Rapolu.