Suggestions of proper usage of "key" parameter ?

Ricky Ho Sun, 14 Dec 2008 16:48:05 -0800

While the "key" EMITTED FROM the map() function and the "key" INPUT INTO the 
reduce() function have a clear meaning to me, the opposite cases do not.


Yes, I am referring to the "key" INPUT INTO the map() function and the "key" 
EMITTED FROM the reduce() function.  Can someone explain why we need a "key" 
in these cases and what the proper use of it is ?


Map Phase
==========
Why do we need a "key" parameter in the map() function ?  Why isn't it possible 
to carry all the information in just the "value" parameter ?

Who determines what the "key" should be ?  Is it the corresponding "InputFormat" 
implementation class ?

E.g.  If I am using the "TextInputFormat" and "FileInputFormat" :
        conf.setInputFormat(TextInputFormat.class);
        FileInputFormat.setInputPaths(conf, filename);

In this case, what is the key in the map() call ?  Is it the name of the input 
file ?
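
To make the question concrete, here is a minimal plain-Java sketch (not the 
Hadoop API) of one possibility: the input format hands map() the byte offset 
of each line as the key and the line text as the value, which is what I 
understand the TextInputFormat documentation to describe.  The class and 
method names (OffsetKeySketch, records, map) are made up for illustration, and 
the word-count style map() simply ignores its input key.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class OffsetKeySketch {
    // Hypothetical stand-in for a line-oriented input format:
    // turns file contents into (byte offset of line, line text) records.
    static List<Map.Entry<Long, String>> records(String fileContents) {
        List<Map.Entry<Long, String>> out = new ArrayList<>();
        long offset = 0;
        for (String line : fileContents.split("\n")) {
            out.add(new SimpleEntry<>(offset, line));
            // Advance past the line's bytes plus the newline separator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return out;
    }

    // Hypothetical map(): receives the offset as key but only uses the value,
    // emitting (word, 1) for every non-empty token in the line.
    static List<Map.Entry<String, Integer>> map(Long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (Map.Entry<Long, String> rec : records("hello world\nfoo\n")) {
            System.out.println("map input key=" + rec.getKey()
                    + " value=" + rec.getValue()
                    + " -> emits " + map(rec.getKey(), rec.getValue()));
        }
    }
}
```

If this is right, the input key is useful mainly when the position of a record 
matters, and can be ignored otherwise.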


Reduce Phase
=============
Why do we need the reduce() function to emit a <key, value> rather than just 
the value ?

What if the reduce() function emits multiple <key, value> entries, or does not 
emit any entry at all ?  Is this considered OK ?

What if the reduce() function emits a <key, value> entry whose key is not the 
same as the input key parameter to the reduce() function ?  Is this OK ?

If two Map/Reduce cycles are chained together, is the "key" input into the 
2nd round map() function determined by the "key" emitted from the 1st round 
reduce() function ?
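
The cases I have in mind can be sketched in plain Java (again not the Hadoop 
API; ReduceKeySketch, reduce and groupByTotal are hypothetical names).  The 
reduce() below emits nothing for rare words and otherwise emits a pair keyed 
by the total rather than by its input key.  The second round assumes the first 
round's output reaches it with keys preserved, which I suspect depends on the 
output and input formats chosen for the intermediate files.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReduceKeySketch {
    // Hypothetical reduce(): input key is a word, values are its counts.
    // Emits zero entries when the total is below minCount; otherwise emits
    // one entry whose key (the total) differs from the input key (the word).
    static List<Map.Entry<Integer, String>> reduce(String word, List<Integer> counts, int minCount) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        List<Map.Entry<Integer, String>> out = new ArrayList<>();
        if (total >= minCount) {
            out.add(new SimpleEntry<>(total, word));
        }
        return out;
    }

    // Hypothetical second round: assumes the (total, word) pairs emitted by
    // the first reduce() arrive with their keys intact, then groups by total.
    static Map<Integer, List<String>> groupByTotal(List<Map.Entry<Integer, String>> firstRoundOutput) {
        Map<Integer, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<Integer, String> e : firstRoundOutput) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> firstRound = new ArrayList<>();
        firstRound.addAll(reduce("apple", Arrays.asList(1, 1, 1), 2));
        firstRound.addAll(reduce("pear", Arrays.asList(1), 2));      // emits nothing
        firstRound.addAll(reduce("plum", Arrays.asList(2, 1), 2));
        System.out.println("first round output:  " + firstRound);
        System.out.println("second round groups: " + groupByTotal(firstRound));
    }
}
```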


Rgds,
Ricky
