Re: Need help understanding the source

2009-07-07 Thread jason hadoop
When you have 0 reduces, the map outputs themselves are moved to the output directory for you. It is also straightforward to open your own file and write to it directly instead of using the output collector. On Tue, Jul 7, 2009 at 10:14 AM, Todd Lipcon wrote: > On Tue, Jul 7, 2009 at 1:13 AM,

Re: Need help understanding the source

2009-07-07 Thread Todd Lipcon
On Tue, Jul 7, 2009 at 1:13 AM, jason hadoop wrote: > > > The other alternative you may try is simply to write your map outputs to > HDFS [i.e., setNumReduceTasks(0)], and have a consumer pick up the map outputs as > they appear. If the life of the files is short and you can withstand data > loss, you m

Re: Need help understanding the source

2009-07-07 Thread jason hadoop
If your constraints are loose enough, you could consider using the chain mapping that became available in 0.19, and have multiple mappers for your jobs. The extra mappers only receive the output of the prior map in the chain, and if I remember correctly, the combiner is run at the end of the chain of
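
The chaining idea described above can be illustrated without Hadoop at all. The following is a minimal Python sketch, not Hadoop's chain-mapper API: `run_chain`, `tokenize`, and `normalize` are hypothetical names invented for this illustration. It only shows the one property stated in the message, namely that each later mapper receives nothing but the output of the mapper before it.

```python
from typing import Any, Callable, Iterable, Iterator, List, Tuple

Record = Tuple[Any, Any]
Mapper = Callable[[Any, Any], Iterable[Record]]


def _apply(mapper: Mapper, stream: Iterable[Record]) -> Iterator[Record]:
    # Run one mapper over every record produced by the previous stage.
    for key, value in stream:
        yield from mapper(key, value)


def run_chain(mappers: List[Mapper], records: Iterable[Record]) -> Iterator[Record]:
    # Stage N sees only what stage N-1 emitted, never the original input.
    stream: Iterable[Record] = iter(records)
    for mapper in mappers:
        stream = _apply(mapper, stream)
    return iter(stream)


def tokenize(key: Any, line: str) -> Iterator[Record]:
    # First mapper (hypothetical): split a line into (word, 1) pairs.
    for word in line.split():
        yield word, 1


def normalize(word: str, count: int) -> Iterator[Record]:
    # Second mapper (hypothetical): lowercase and strip trailing punctuation.
    cleaned = word.lower().strip(".,")
    if cleaned:
        yield cleaned, count
```

Calling `list(run_chain([tokenize, normalize], [("doc1", "Hadoop, hadoop map.")]))` passes the single input record through both stages in order; `normalize` never sees the raw line, only the `(word, 1)` pairs that `tokenize` emitted.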

Re: Need help understanding the source

2009-07-07 Thread Amr Awadallah
To add to Todd/Ted's wise words, the Hadoop (and MapReduce) architects didn't impose this limitation just for fun; it is core to enabling Hadoop to be as reliable as it is. If the reducer starts processing mapper output immediately and a specific mapper fails, then the reducer would have to
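
One way to see the reliability concern is a toy simulation. The Python sketch below is an assumption-laden illustration, not Hadoop code: `map_attempt`, `eager_reduce`, and `barrier_reduce` are hypothetical names, and the failure model (a map attempt dies partway through and is rerun from scratch) is a simplification. It compares a reducer that folds in records as they arrive with one that only consumes the output of a completed attempt.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Record = Tuple[str, int]


def map_attempt(records: List[Record], fail_after: Optional[int] = None) -> Iterator[Record]:
    # Emit records one by one; simulate a crash before record index fail_after.
    for i, rec in enumerate(records):
        if fail_after is not None and i == fail_after:
            raise RuntimeError("simulated map task failure")
        yield rec


def eager_reduce(records: List[Record], fail_after: int) -> Dict[str, int]:
    # Folds records in immediately, so a failed attempt's partial output
    # is already in the totals when the rerun delivers everything again.
    totals: Dict[str, int] = {}
    try:
        for key, value in map_attempt(records, fail_after):
            totals[key] = totals.get(key, 0) + value
    except RuntimeError:
        pass  # nothing here undoes the partial work
    for key, value in map_attempt(records):  # re-executed attempt
        totals[key] = totals.get(key, 0) + value
    return totals


def barrier_reduce(records: List[Record], fail_after: int) -> Dict[str, int]:
    # Consumes map output only after an attempt has run to completion.
    try:
        committed = list(map_attempt(records, fail_after))
    except RuntimeError:
        committed = list(map_attempt(records))  # partial output discarded
    totals: Dict[str, int] = {}
    for key, value in committed:
        totals[key] = totals.get(key, 0) + value
    return totals
```

With `records = [("a", 1), ("b", 1), ("a", 1)]` and `fail_after=2`, the eager version counts the first two records twice, while the barrier version produces the same totals as a failure-free run. An eager reducer could avoid this only by tracking and rolling back what it had consumed from each attempt.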

Re: Need help understanding the source

2009-07-06 Thread Ted Dunning
I would consider this to be a very delicate optimization with little utility in the real world. It is very, very rare to reliably know how many records the reducer will see. Getting this wrong would be a disaster. Getting it right would be very difficult in almost all cases. Moreover, this assu

Re: Need help understanding the source

2009-07-06 Thread Naresh Rapolu
Hello Todd, My aim is to make the reduce move ahead with reduction as and when it gets the data required, instead of waiting for all the maps to complete. If it knows how many records it needs and compares it with the number of records it has received so far, it can move on once they become equal wit
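
The proposal above can be sketched in a few lines of Python. This is a hypothetical illustration only: `early_reduce` and `expected_count` are invented names, Hadoop's reducer exposes no such hook, and the reduction is assumed to be a simple sum. It shows the reducer finishing as soon as the number of records received equals the number it was told to expect.

```python
from typing import Iterable, Tuple


def early_reduce(stream: Iterable[int], expected_count: int) -> Tuple[int, int]:
    # Sum incoming values and stop once expected_count records have arrived.
    # Returns (total, records_consumed).
    total = 0
    seen = 0
    for value in stream:
        total += value
        seen += 1
        if seen == expected_count:
            break  # stop without waiting for the rest of the map output
    return total, seen
```

The sketch also makes the risk raised elsewhere in this thread concrete. If four records `[1, 2, 3, 4]` arrive and `expected_count` is 4, the result is correct. If `expected_count` is wrongly given as 3, the function returns a total that silently omits the last record. If it is wrongly given as 5, the early exit never triggers and the reducer ends up consuming the whole stream anyway.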

Re: Need help understanding the source

2009-07-06 Thread Todd Lipcon
Hi Naresh, You may be better off rephrasing your question at a higher level. What exactly are you trying to accomplish? The code you're citing is very "internal" and not meant to be touched by user-level code. -Todd On Sun, Jul 5, 2009 at 11:13 AM, Naresh Rapolu wrote: > > Hello, > > In Reduc