FileInputFormat, FileSplit, and LineRecordReader: where are they run?

Saptarshi Guha Thu, 05 Feb 2009 13:25:04 -0800

Hello All,
In order to get a better understanding of Hadoop, I've started reading
the source and have a question.
FileInputFormat reads in files, divides them into splits of a given
split size (which may be larger than the block size) and creates FileSplits.
Each FileSplit contains the start, the length *and* the locations of the split.
LineRecordReader receives a split and emits records.
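
To make the splitting step concrete, here is a minimal stand-alone sketch
of it as I understand it. It is not the Hadoop source: SplitSketch, Split
and computeSplits are made-up names, the split-size rule (at least the
minimum size, otherwise the smaller of the goal size and the block size)
is my reading of FileInputFormat, and the host locations and any slack on
the last split are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of cutting one file into splits.
// All names here are illustrative; none of them are Hadoop classes.
class SplitSketch {

    // Carries the two numeric fields described above: start offset and length.
    // (A real FileSplit also carries the path and the host locations.)
    static final class Split {
        final long start;
        final long length;

        Split(long start, long length) {
            this.start = start;
            this.length = length;
        }
    }

    // Assumed rule: never smaller than minSize, otherwise the smaller of
    // goalSize (total bytes / requested number of splits) and blockSize.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Walk the file from offset 0, emitting one split per splitSize bytes;
    // the last split takes whatever remains.
    static List<Split> computeSplits(long fileLength, long splitSize) {
        List<Split> splits = new ArrayList<>();
        long start = 0;
        while (start < fileLength) {
            long length = Math.min(splitSize, fileLength - start);
            splits.add(new Split(start, length));
            start += length;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 200 MB file, 64 MB blocks, minimum split size forced to 128 MB,
        // so each split is larger than a block and spans more than one.
        long splitSize = computeSplitSize(50 * mb, 128 * mb, 64 * mb);
        for (Split s : computeSplits(200 * mb, splitSize)) {
            System.out.println("start=" + s.start + " length=" + s.length);
        }
    }
}
```

With a minimum split size above the block size, as in main, every split
covers more than one block, which is the case Q2 below asks about.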


So far I think I'm correct (hopefully). Now, my questions:
Does the LineRecordReader run on a machine that is, in some sense, closest to
the location of the split? I.e.:
Q1: If the split is smaller than the block size, then the split is
located on one machine (apart from replicas): does the
LineRecordReader run on the machine which contains the split? Or at
least attempt to?
Q2: If a split is larger than the block size, it spans multiple
blocks, which could reside on more than one machine. In this case, on
which machine does the LineRecordReader run? The machine 'closest' to
them?

Please correct me if I'm wrong.
Thank you,
Saptarshi


-- 
Saptarshi Guha - [email protected]
