Hi all, I have a 6-node cluster, and on a simple query against a table created from a CSV, I was seeing a lot of mappers reporting that they were not data-local. I changed the replication factor to 6, but MR still reports only about 60% of map tasks as data-local.
How can this be when I have no under-replicated blocks and a replication count equal to the machine count? Am I missing something? Does this indicate that something is wrong in the MR configuration (e.g. a TT not recognizing localhost as its DN)? The 6 machines each have 12 spindles, and I'm running Hive 0.7 and a 0.9 trunk build from about 2 weeks ago. Many thanks! Tim