Hello,
I would just like to confirm: when does the Combiner run (since it might not run at all, see below)? I read somewhere that it runs if there is at least one reduce (which in my case I can be sure of). I also read that the combiner is an optimization. However, it is also a chance for a function to transform the key/value pairs (keeping the classes the same, i.e. the combiner semantics are not changed) and deal with a smaller set (this could be done in the reducer, but the number of values for a key might be relatively large).
However, I guess it would be a mistake for the reducer to expect its input to come from a combiner? E.g. if there are only 10 values corresponding to a key (as output by the mapper), will these 10 values go straight to the reducer, or to the reducer via the combiner? Here I am assuming my reduce operation does not need all the values for a key to work (so that a combiner can be used), i.e. additive operations.

Thank you
Saptarshi

On Sun, Nov 16, 2008 at 6:18 PM, Owen O'Malley <[email protected]> wrote:
> The Combiner may be called 0, 1, or many times on each key between the
> mapper and reducer. Combiners are just an application specific optimization
> that compress the intermediate output. They should not have side effects or
> transform the types. Unfortunately, since there isn't a separate interface
> for Combiners, there isn't a great place to document this requirement.
> I've just filed HADOOP-4668 to improve the documentation.

-- 
Saptarshi Guha - [email protected]
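To make the "0, 1, or many times" point concrete, here is a minimal sketch in plain Python. It is not Hadoop code. The names `combine_sum`, `reduce_sum` and `run_job`, and the chunk size of 3, are hypothetical and chosen only for illustration. It simulates a sum-style (additive) job and checks that the reducer gets the same answer whether the combiner ran zero, one, or several times, which is why the reducer should not assume its input has passed through a combiner.

```python
from collections import defaultdict


def combine_sum(key, values):
    # Combiner: input and output types are the same (key, int),
    # and there are no side effects.
    return key, sum(values)


def reduce_sum(key, values):
    # Reducer: must work on raw mapper output as well as on
    # partially combined values.
    return key, sum(values)


def run_job(mapper_output, combiner_passes, chunk=3):
    """Simulate one job, applying the combiner `combiner_passes` times.

    Each pass combines the values for a key in chunks of `chunk`
    (an arbitrary size, standing in for partial combining).
    """
    grouped = defaultdict(list)
    for key, value in mapper_output:
        grouped[key].append(value)

    result = {}
    for key, values in grouped.items():
        for _ in range(combiner_passes):
            values = [
                combine_sum(key, values[i:i + chunk])[1]
                for i in range(0, len(values), chunk)
            ]
        result[key] = reduce_sum(key, values)[1]
    return result


# Ten values for key "a", as in the question, plus a second key.
mapper_output = [("a", 1)] * 10 + [("b", 2)] * 4

# Combiner applied 0, 1, and 3 times: the final result should not change.
results = [run_job(mapper_output, passes) for passes in (0, 1, 3)]
```

With an additive operation like this, all three runs agree. An operation whose result depends on seeing every raw value for a key would not have this property and so would not be a safe combiner.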
