Hi all,

I have a 6-node cluster, and on a simple query against a table created from a
CSV, I was seeing a lot of mappers reporting that they were not data-local.
I changed the replication factor to 6, but MR still shows only about
60% of the map tasks as data-local.
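
For reference, a minimal sketch of how a figure like the 60% above can be
derived, assuming it is the ratio of the job's "Data-local map tasks" counter
to its "Launched map tasks" counter. The function name and the numbers are
illustrative only and are not taken from the actual job.

```python
def data_local_fraction(data_local_maps, launched_maps):
    """Return the fraction of launched map tasks that ran data-local.

    Both arguments are assumed to be read off the job counters page
    ("Data-local map tasks" and "Launched map tasks").
    """
    if launched_maps <= 0:
        raise ValueError("launched_maps must be positive")
    return data_local_maps / launched_maps


# Hypothetical counter values, chosen only to illustrate the calculation.
print(f"{data_local_fraction(60, 100):.0%}")
```

With a replication factor equal to the node count, every node holds every
block, so one would expect this ratio to be close to 100%.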

How can this be when I have no under-replicated blocks and a replication
count equal to the machine count?  Am I missing something?  Does it
indicate that something is wrong in the MR configuration (e.g. a TT not
recognizing its local DN as localhost)?

The 6 machines each have 12 spindles, and I'm running Hive 0.7 and a
0.9 trunk build from about 2 weeks ago.

Many thanks!
Tim
