While the "key" EMITTED FROM the map() function and the "key" INPUT INTO the
reduce() function have a clear meaning to me, the other two are not so clear:
I mean the "key" INPUT INTO the map() function and the "key" EMITTED FROM the
reduce() function. Can someone explain why we need a "key" in these cases and
what its proper use is?
Map Phase
==========
Why do we need a "key" parameter in the map() function? Why isn't it possible
to carry all the information in the "value" parameter alone?
Who determines what the "key" should be? Is it the corresponding "InputFormat"
implementation class?
E.g., if I am using "TextInputFormat" and "FileInputFormat":
conf.setInputFormat(TextInputFormat.class);
FileInputFormat.setInputPaths(conf, filename);
In this case, what is the key in the map() call? Is it the name of the input file?
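For comparison, the TextInputFormat documentation describes its keys as the position in the file and its values as the line of text, so the key handed to map() is a byte offset, not the file name. The sketch below is a plain-Java simulation of that record shape under stated assumptions, not Hadoop code: it uses no Hadoop classes, ignores input splits and '\r' line endings, and the names LineRecordSim, readRecords and map are hypothetical. Its map() also shows that a mapper is free to ignore the key it receives.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java simulation (no Hadoop classes) of the (key, value) records a
// line-oriented input format hands to map(). All names are hypothetical.
class LineRecordSim {

    // Key = byte offset where the line starts in the file; value = line text
    // without the trailing '\n'. Splits and '\r' handling are ignored.
    static List<Map.Entry<Long, String>> readRecords(byte[] data) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        int start = 0;
        while (start < data.length) {
            int end = start;
            while (end < data.length && data[end] != '\n') {
                end++;
            }
            String line = new String(data, start, end - start, StandardCharsets.UTF_8);
            records.add(new AbstractMap.SimpleEntry<>((long) start, line));
            start = end + 1; // skip past the newline
        }
        return records;
    }

    // A word-count style map() that simply ignores its offset key.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] file = "hello world\nhello hadoop\n".getBytes(StandardCharsets.UTF_8);
        for (Map.Entry<Long, String> rec : readRecords(file)) {
            System.out.println(rec.getKey() + " -> " + rec.getValue()
                    + " -> " + map(rec.getKey(), rec.getValue()));
        }
    }
}
```

With this input the two records get keys 0 and 12, the byte positions where each line starts.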
Reduce Phase
=============
Why do we need the reduce() function to emit a <key, value> pair rather than
just the value?
What if the reduce() function emits multiple <key, value> entries, or no entry
at all? Is this considered OK?
What if the reduce() function emits a <key, value> entry whose key is not the
same as the input key parameter to the reduce() function? Is this OK?
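In the old org.apache.hadoop.mapred API, reduce() writes its results through an OutputCollector, and to my understanding the framework places no limit on how many times collect() is called per key or on which key is passed to it. The plain-Java simulation below illustrates those three cases under that assumption; it is not Hadoop's API, and the names ReduceSim, Collector, Reducer and run are hypothetical. It groups map output by key in sorted order (as the shuffle/sort does), then runs a filtering reducer (zero or one record per key), a pass-through reducer (several records per key) and a re-keying reducer (output key differs from input key).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the shuffle/sort + reduce step; not Hadoop's API.
class ReduceSim {
    // Hypothetical stand-in for an output collector.
    interface Collector { void collect(String key, int value); }

    // Hypothetical reduce signature: called once per distinct key.
    interface Reducer { void reduce(String key, List<Integer> values, Collector out); }

    static List<String> run(List<Map.Entry<String, Integer>> mapOutput, Reducer reducer) {
        // Group values by key; TreeMap iterates keys in sorted order.
        TreeMap<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> e : mapOutput) {
            groups.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        // Output is appended in the order collected, as "key<TAB>value" lines.
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
            reducer.reduce(g.getKey(), g.getValue(), (k, v) -> output.add(k + "\t" + v));
        }
        return output;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("hello", 1), Map.entry("world", 1),
                Map.entry("hello", 1), Map.entry("hadoop", 1));

        // Zero or one record per key: keep only words seen more than once.
        Reducer filter = (key, values, out) -> {
            int sum = 0;
            for (int v : values) sum += v;
            if (sum > 1) out.collect(key, sum);
        };
        // Several records per key: pass every value through unchanged.
        Reducer passThrough = (key, values, out) -> {
            for (int v : values) out.collect(key, v);
        };
        // A different output key: re-key by word length.
        Reducer rekey = (key, values, out) -> out.collect("len" + key.length(), values.size());

        System.out.println(run(mapOutput, filter));
        System.out.println(run(mapOutput, passThrough));
        System.out.println(run(mapOutput, rekey));
    }
}
```

Note that the re-keyed output is not regrouped: "len5" appears twice because the simulation, like a reducer's output, is written as collected rather than sorted and merged again.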
If two Map/Reduce cycles are chained together, is the "key" input into the
2nd-round map() function determined by the "key" emitted from the 1st-round
reduce() function?
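To my understanding, what the second job's map() sees as its key depends on the input format configured for the second job, not only on what the first job's reduce() emitted. The plain-Java simulation below (no Hadoop classes; ChainSim and its methods are hypothetical names) writes job-1 output the way TextOutputFormat lays it out by default (key, a tab, value, one record per line) and reads it back two ways: like TextInputFormat, where the key becomes a byte offset again and the old key is just part of the value, and like KeyValueTextInputFormat, where the text before the first tab becomes the key. Pairing SequenceFileOutputFormat with SequenceFileInputFormat is another common way to carry typed keys across a chain; it is not simulated here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of chaining two jobs through a text file; not Hadoop code.
class ChainSim {
    // Job 1 output laid out as "key<TAB>value", one record per line.
    static String writeJob1Output(String[][] records) {
        StringBuilder sb = new StringBuilder();
        for (String[] r : records) {
            sb.append(r[0]).append('\t').append(r[1]).append('\n');
        }
        return sb.toString();
    }

    // TextInputFormat-style read: key = byte offset of the line, value = whole line.
    // Records are rendered as "key | value"; trailing blank lines are dropped by split.
    static List<String> readLikeTextInput(String file) {
        List<String> out = new ArrayList<>();
        long offset = 0;
        for (String line : file.split("\n")) {
            out.add(offset + " | " + line);
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
        }
        return out;
    }

    // KeyValueTextInputFormat-style read: key = text before the first tab,
    // value = the rest; a line without a tab becomes the key with an empty value.
    static List<String> readLikeKeyValueInput(String file) {
        List<String> out = new ArrayList<>();
        for (String line : file.split("\n")) {
            int tab = line.indexOf('\t');
            String key = tab < 0 ? line : line.substring(0, tab);
            String value = tab < 0 ? "" : line.substring(tab + 1);
            out.add(key + " | " + value);
        }
        return out;
    }

    public static void main(String[] args) {
        String file = writeJob1Output(new String[][] {{"hello", "2"}, {"world", "1"}});
        System.out.println(readLikeTextInput(file));     // keys are offsets 0 and 8
        System.out.println(readLikeKeyValueInput(file)); // keys are "hello" and "world"
    }
}
```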
Rgds,
Ricky