Sergey Shelukhin created HIVE-11265:
---------------------------------------
Summary: LLAP: investigate locality issues
Key: HIVE-11265
URL: https://issues.apache.org/jira/browse/HIVE-11265
Project: Hive
Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Siddharth Seth

Running q27 with split-waves 0.9 on 10 nodes x 16 executors, I get 140 mappers reading store_sales and ~5 more assorted vertices. When the query is run repeatedly, one would expect good locality, i.e. the same stripes being processed on the same nodes most of the time. In my experience, however, this holds for only 40-50% of the stripes. When the query is run 10 times in a row, an average split (file+stripe) is read on ~4 different machines. Some are actually read on a different machine every run :)

This hurts the cache hit ratio. Understandably, we won't get 100% locality in real scenarios, but we should not be getting bad locality in simple cases like this.
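The locality figures in the description (share of stripes that stay on one node, and the average number of machines a split is read on across 10 runs) can be computed from a log of which host read which split in each run. The sketch below is a minimal illustration, assuming such a log is available as `(run, path, stripe, host)` records. The `SplitRead` type, the `locality_stats` function, the sample paths and host names are all hypothetical, and treating "the same node most of the time" as "one host handled a strict majority of that split's reads" is an assumption, not the definition used in the issue.

```python
from collections import Counter, defaultdict
from typing import Iterable, NamedTuple


class SplitRead(NamedTuple):
    """Hypothetical record: in a given run, one split (file + stripe) was read on a host."""
    run: int
    path: str
    stripe: int
    host: str


def locality_stats(reads: Iterable[SplitRead]) -> tuple[float, float]:
    """Return (average distinct hosts per split, fraction of "stable" splits).

    A split counts as stable here if its most common host handled a strict
    majority of its reads (an assumed threshold for "most of the time").
    """
    # Count how many times each host read each (file, stripe) split.
    hosts_per_split = defaultdict(Counter)
    for r in reads:
        hosts_per_split[(r.path, r.stripe)][r.host] += 1
    if not hosts_per_split:
        return 0.0, 0.0

    distinct = [len(c) for c in hosts_per_split.values()]
    stable = [
        c.most_common(1)[0][1] * 2 > sum(c.values())
        for c in hosts_per_split.values()
    ]
    return sum(distinct) / len(distinct), sum(stable) / len(stable)


# Illustrative data: stripe 0 mostly stays on node1, stripe 1 moves every run.
reads = [
    SplitRead(0, "store_sales/000000_0", 0, "node1"),
    SplitRead(1, "store_sales/000000_0", 0, "node1"),
    SplitRead(2, "store_sales/000000_0", 0, "node2"),
    SplitRead(0, "store_sales/000000_0", 1, "node3"),
    SplitRead(1, "store_sales/000000_0", 1, "node4"),
    SplitRead(2, "store_sales/000000_0", 1, "node5"),
]
avg_hosts, stable_fraction = locality_stats(reads)
print(avg_hosts, stable_fraction)  # 2.5 0.5
```

With perfect locality, every split would report exactly one distinct host and a stable fraction of 1.0; the issue describes roughly 4 hosts per split and 40-50% stable stripes.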