I have written a WordCount.java job configured in this manner:
conf.setMapperClass(Map.class);
conf.setCombinerClass(Combine.class);
conf.setReducerClass(Reduce.class);
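(For context, the full driver, slightly simplified, looks roughly like the sketch below. I'm using the classic org.apache.hadoop.mapred API, which is what the conf.setXxxClass() calls above come from; the map/reduce bodies shown here are just the standard word-count logic and the input/output paths are illustrative, so the exact details are not the point of my question.)

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCountJob {

    // Mapper: emits (word, 1) for every token in the input line.
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer: sums the counts emitted for each word.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    // Combiner: same logic as the reducer, run on the map side.
    public static class Combine extends Reduce {
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCountJob.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        // The three classes my question is about.
        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Combine.class);
        conf.setReducerClass(Reduce.class);

        // Input and output paths come from the command line.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}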
As you can see, three classes are being used here. I have packaged them into a JAR file called wc.jar, and I run the job like this:
$ bin/hadoop jar wc.jar WordCountJob
1) When the job runs on a 5-machine cluster, is the whole JAR file distributed to all 5 machines, or are the individual class files distributed separately?
2) Also, suppose the number of reducers is 2 while the number of mappers is 5 (see the sketch after these questions for how I am setting these counts). What happens in this case? How are the class files or the JAR file distributed?
3) Are they distributed via RPC or HTTP?
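(For question 2, this is roughly how I am setting the task counts, on the same conf object as in the driver above. I'm not sure whether setNumMapTasks actually forces 5 mappers or whether the mapper count comes from the input splits, which is part of what I'm asking.)

// Request 2 reduce tasks and 5 map tasks on the JobConf shown above.
conf.setNumReduceTasks(2);
conf.setNumMapTasks(5);   // as far as I understand, this one is only a hint to the framework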