Hi all,

I have been using the Hadoop Fair Scheduler for some experiments on a 100-node cluster with 2 map slots per node (hence, a total of 200 map slots).
In one of my experiments, all the map tasks finish within a heartbeat interval of 3 seconds. I noticed that the maximum number of concurrently active map slots on my cluster never exceeds 100, so cluster utilization during these experiments never exceeds 50%, even when large jobs with more than 1000 maps are being executed.

A look at the Fair Scheduler code (in particular, the assignTasks function) revealed the reason. As I understand the implementation in Hadoop 0.20.0, a TaskTracker is not assigned more than 1 map and 1 reduce task per heartbeat. In my experiments, in every heartbeat each TT has 2 free map slots but is assigned only 1 map task, so utilization never goes beyond 50%. Of course, this (degenerate) case does not arise when map tasks take more than one heartbeat interval to finish. For example, when I repeated the experiments with map tasks taking close to 15 seconds to finish, I saw close to 100% utilization while large jobs were executing.

Why does the Fair Scheduler not assign more than one map task to a TT per heartbeat? Is this done to spread the load uniformly across the cluster? I looked at the assignTasks function in the default Hadoop scheduler (JobQueueTaskScheduler.java), and it does assign more than 1 map task per heartbeat to a TT.

It is easy to change the Fair Scheduler to assign more than 1 map task to a TT per heartbeat; I did that and achieved 100% utilization even with small map tasks (a rough sketch of the idea is at the end of this message). But I am wondering whether doing so violates some fairness properties.

Thanks,
Abhishek
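
P.S. For reference, below is a minimal sketch contrasting the one-map-per-heartbeat behavior with the "fill all free slots" variant I tried. This is not the actual FairScheduler code; TrackerStatus, Task, and pickNextMapTask() are placeholder names standing in for the scheduler's real data structures and task-selection logic.

import java.util.ArrayList;
import java.util.List;

class SchedulerSketch {

    // Hypothetical view of a TaskTracker heartbeat (not the real 0.20 classes).
    static class TrackerStatus {
        int maxMapSlots;    // e.g. 2 map slots per node on my cluster
        int runningMaps;    // maps currently occupying slots
        int freeMapSlots() { return maxMapSlots - runningMaps; }
    }

    static class Task { }   // placeholder for a scheduled map task

    // Stand-in for the scheduler's "pick the next runnable map task" step
    // (in the Fair Scheduler this would consult the jobs' deficits / fair shares).
    static Task pickNextMapTask() { return new Task(); }

    // Roughly what I observed in 0.20: at most one map task per heartbeat,
    // regardless of how many map slots are free on the tracker.
    static List<Task> assignAtMostOneMap(TrackerStatus tt) {
        List<Task> assigned = new ArrayList<Task>();
        if (tt.freeMapSlots() > 0) {
            Task t = pickNextMapTask();
            if (t != null) {
                assigned.add(t);
            }
        }
        return assigned;
    }

    // The variant I experimented with: keep assigning until the tracker's
    // map slots are full (or no runnable map task remains).
    static List<Task> assignUntilSlotsFull(TrackerStatus tt) {
        List<Task> assigned = new ArrayList<Task>();
        int free = tt.freeMapSlots();
        for (int i = 0; i < free; i++) {
            Task t = pickNextMapTask();
            if (t == null) {
                break;  // no runnable map task left
            }
            assigned.add(t);
        }
        return assigned;
    }
}

With 2 slots per node and 3-second map tasks, the first version hands out at most 100 tasks per heartbeat across my 100 nodes (50% of the 200 slots), while the second fills all 200.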