On Tue, Jul 7, 2009 at 1:13 AM, jason hadoop <jason.had...@gmail.com> wrote:
>
> The other alternative you may try is simply to write your map outputs to
> HDFS [i.e. setNumReduceTasks(0)] and have a consumer pick up the map
> outputs as they appear. If the life of the files is short and you can
> withstand data loss, you may turn down the replication factor to speed up
> the writes.
>
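Concretely, the map-only setup Jason describes would look something like
the sketch below (a minimal sketch against the old org.apache.hadoop.mapred
API; the IdentityMapper and the replication factor of 1 are illustrative
choices, not requirements):

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.lib.IdentityMapper;

  public class MapOnlyJob {
    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(MapOnlyJob.class);
      conf.setJobName("map-only-to-hdfs");

      // Zero reduces: each map task's output goes straight to HDFS
      // as a part-NNNNN file instead of being shuffled to reducers.
      conf.setNumReduceTasks(0);

      // If the files are short-lived and some loss is tolerable,
      // a lower replication factor speeds up the writes.
      conf.setInt("dfs.replication", 1);

      conf.setMapperClass(IdentityMapper.class);
      conf.setOutputKeyClass(LongWritable.class);
      conf.setOutputValueClass(Text.class);

      FileInputFormat.setInputPaths(conf, new Path(args[0]));
      FileOutputFormat.setOutputPath(conf, new Path(args[1]));

      JobClient.runJob(conf);
    }
  }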
I'm not sure that would be very easy, though, since the output is initially
written into a temporary directory. I suppose you could go digging through
the temporary directory to catch the map outputs as they finish, but that's
probably tricky at best and certainly not intended. (A safer polling
approach is sketched at the bottom of this message, after the quoted
thread.)

-Todd

> On Tue, Jul 7, 2009 at 12:30 AM, Amr Awadallah <a...@cloudera.com> wrote:
>
> > To add to Todd's and Ted's wise words: the Hadoop (and MapReduce)
> > architects didn't impose this limitation just for fun; it is core to
> > making Hadoop as reliable as it is. If the reducer started processing
> > mapper output immediately and a specific mapper then failed, the
> > reducer would have to know how to undo the specific pieces of work
> > related to that failed mapper, which is not trivial at all. That said,
> > combiners do achieve a bit of this for you: they start working
> > immediately on the map output, but on a per-mapper basis (not
> > globally), so failure is easy to handle in that case (you just redo
> > that mapper and its combining).
> >
> > -- amr
> >
> > Ted Dunning wrote:
> >
> >> I would consider this to be a very delicate optimization with little
> >> utility in the real world. It is very, very rare to reliably know how
> >> many records the reducer will see. Getting this wrong would be a
> >> disaster; getting it right would be very difficult in almost all
> >> cases.
> >>
> >> Moreover, this assumption is baked all through the map-reduce design,
> >> so a change that lets the reduce go ahead early is likely to be really
> >> tricky (not that I know this for a fact).
> >>
> >> On Mon, Jul 6, 2009 at 11:14 AM, Naresh Rapolu
> >> <nareshreddy.rap...@gmail.com> wrote:
> >>
> >>> My aim is to make the reduce move ahead with the reduction as and
> >>> when it gets the data it requires, instead of waiting for all the
> >>> maps to complete. If it knows how many records it needs and compares
> >>> that with the number of records it has received so far, it can move
> >>> on once they become equal, without waiting for all the maps to
> >>> finish.
>
> --
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.amazon.com/dp/1430219424?tag=jewlerymall
> www.prohadoopbook.com a community for Hadoop Professionals
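Following up on my own caveat above: rather than digging through
_temporary, a consumer can simply poll the job's output directory, since
(as far as I know) the output committer only promotes a task's files there
once the task commits, via a rename. A rough sketch; the class name, the
five-second poll interval, and the System.out handoff are all placeholders:

  import java.util.HashSet;
  import java.util.Set;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class MapOutputPoller {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(new Configuration());
      Path outputDir = new Path(args[0]); // the job's output directory
      Set<String> seen = new HashSet<String>();

      while (true) {
        FileStatus[] stats = fs.listStatus(outputDir);
        if (stats != null) {
          for (FileStatus stat : stats) {
            String name = stat.getPath().getName();
            // Skip _temporary (in-progress task output) and any file
            // we have already handed off.
            if (name.startsWith("_") || !seen.add(name)) {
              continue;
            }
            // Committed files appear here atomically via rename, so by
            // the time we see one it should be complete.
            System.out.println("new map output: " + stat.getPath());
          }
        }
        Thread.sleep(5000);
      }
    }
  }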
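And to make Amr's combiner point concrete, here is the standard word count
with the reducer reused as the combiner (old mapred API). The combiner runs
on each map task's local output, so if that map task fails the framework
just reruns the map and its combining; nothing global has to be undone:

  import java.io.IOException;
  import java.util.Iterator;
  import java.util.StringTokenizer;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.*;

  public class WordCount {
    public static class Map extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
      private static final IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();

      public void map(LongWritable key, Text value,
          OutputCollector<Text, IntWritable> output, Reporter reporter)
          throws IOException {
        StringTokenizer tok = new StringTokenizer(value.toString());
        while (tok.hasMoreTokens()) {
          word.set(tok.nextToken());
          output.collect(word, ONE);
        }
      }
    }

    public static class Reduce extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
      public void reduce(Text key, Iterator<IntWritable> values,
          OutputCollector<Text, IntWritable> output, Reporter reporter)
          throws IOException {
        int sum = 0;
        while (values.hasNext()) {
          sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
      }
    }

    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(WordCount.class);
      conf.setJobName("wordcount");
      conf.setOutputKeyClass(Text.class);
      conf.setOutputValueClass(IntWritable.class);
      conf.setMapperClass(Map.class);
      // Reusing the reducer as the combiner is safe here because
      // summing counts is associative and commutative.
      conf.setCombinerClass(Reduce.class);
      conf.setReducerClass(Reduce.class);
      FileInputFormat.setInputPaths(conf, new Path(args[0]));
      FileOutputFormat.setOutputPath(conf, new Path(args[1]));
      JobClient.runJob(conf);
    }
  }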