I have written a WordCount.java job in this manner:

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Combine.class);
        conf.setReducerClass(Reduce.class);
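
For context, a minimal driver around those three setter calls might look like the sketch below. This assumes the old org.apache.hadoop.mapred API, and that Map, Combine, and Reduce are classes I have defined elsewhere in wc.jar (their names here are just placeholders matching my snippet; it needs a Hadoop installation on the classpath to compile and run):

```java
// Sketch of a driver using the old org.apache.hadoop.mapred API.
// Assumes Map, Combine, and Reduce are user-defined classes in wc.jar.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class WordCountJob {
    public static void main(String[] args) throws Exception {
        // Tie the job to this class so Hadoop can locate the containing jar
        JobConf conf = new JobConf(WordCountJob.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);       // user-defined mapper
        conf.setCombinerClass(Combine.class); // user-defined combiner
        conf.setReducerClass(Reduce.class);   // user-defined reducer

        // Input and output paths come from the command line
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
```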

As you can see, three classes are used here. I have packaged these
classes into a jar file called wc.jar, and I run it like this:

$ bin/hadoop jar wc.jar WordCountJob

1) When the job runs on a 5-machine cluster, is the whole JAR file
distributed to all 5 machines, or are the individual class files
distributed separately?

2) Also, suppose there are 2 reducers and 5 mappers. What happens in
this case? How are the class files or jar files distributed?

3) Are they distributed via RPC or HTTP?
