Hi,

On Mon, Nov 1, 2010 at 8:19 AM, Zhenhua Guo <jen...@gmail.com> wrote:
> Thanks!
> One more question. Is the input file replicated on each node where a
> mapper is run? Or just the portion processed by a mapper is
> transferred?
With HDFS, this is what happens: the input file is not copied to every
node that runs a mapper. Mappers are scheduled, where possible, on nodes
that already hold the input file's blocks [data-local map tasks]. If no
TaskTracker slot is free on such a node, the mapper runs somewhere else
and only its input block ("portion processed by a mapper") is fetched
from a DataNode holding a replica, preferably one in the same rack
[rack-local map tasks].

--
Harsh J
www.harshj.com
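
To illustrate the locality preference described above, here is a rough
sketch in Python. It is not Hadoop's scheduler code: the function names,
node names and rack names are all made up for the example, a single
replica is used for brevity, and it ignores how the JobTracker actually
hands out tasks. It only shows the order of preference (data-local, then
rack-local, then anything else).

```python
# Illustrative only: NOT Hadoop's scheduler. All names are hypothetical.

def locality(node, replica_nodes, rack_of):
    """Classify how close `node` is to the replicas of one input block."""
    if node in replica_nodes:
        return "data-local"      # block is already on this node's disk
    replica_racks = {rack_of[r] for r in replica_nodes}
    if rack_of[node] in replica_racks:
        return "rack-local"      # block is fetched from a same-rack DataNode
    return "off-rack"            # block is fetched across racks

# Lower number = more preferred.
_PREFERENCE = {"data-local": 0, "rack-local": 1, "off-rack": 2}

def pick_node(free_nodes, replica_nodes, rack_of):
    """Among nodes with a free slot, pick the one closest to the block."""
    return min(
        free_nodes,
        key=lambda n: _PREFERENCE[locality(n, replica_nodes, rack_of)],
    )

# Assumed toy cluster: two nodes in rackA, one in rackB.
rack_of = {"n1": "rackA", "n2": "rackA", "n3": "rackB"}
replicas = {"n1"}  # the block lives on n1 only (one replica for brevity)

print(pick_node(["n3", "n2", "n1"], replicas, rack_of))  # n1 has a free slot
print(pick_node(["n3", "n2"], replicas, rack_of))        # n1 is busy
```

In the second call the mapper lands on n2 and reads just that one block
from n1 over the rack switch; the rest of the input file is never copied
to n2.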