[ https://issues.apache.org/jira/browse/HIVE-6247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V resolved HIVE-6247.
---------------------------
    Resolution: Not A Problem

> select count(distinct) should be MRR in Tez
> -------------------------------------------
>
> Key: HIVE-6247
> URL: https://issues.apache.org/jira/browse/HIVE-6247
> Project: Hive
> Issue Type: Bug
> Components: Tez
> Affects Versions: 0.13.0
> Reporter: Gopal V
> Assignee: Gunther Hagleitner
>
> The MR query plan for "select count(distinct) " fires off multiple reducers,
> with a local work task to perform final aggregation.
> The Tez version fires off exactly 1 reducer for the entire data-set which
> chokes and dies/slows down massively.
> To reproduce on a TPC-DS database (meaningless query)
> {code}
> select count(distinct ss_net_profit) from store_sales ss join store s on
> ss.ss_store_sk = s.s_store_sk;
> {code}
> This spins up Map 1, Map 2 (for the dim table + fact table) & Reducer 1 which
> is always "0/1".

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
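
For readers unfamiliar with the plan shapes the issue description contrasts, here is a minimal Python sketch of the idea, not Hive or Tez code. It assumes a toy in-memory "shuffle"; the function names and the `num_reducers` parameter are hypothetical and exist only for illustration. The single-reducer version sends every row to one task, while the two-stage version hash-partitions values across several reducers, lets each count its own distinct values, and then sums the partial counts in a final aggregation step.

```python
from collections import defaultdict


def count_distinct_single_reducer(values):
    # One reducer receives every row and tracks all distinct values itself.
    return len(set(values))


def count_distinct_two_stage(values, num_reducers=4):
    # Shuffle: route each value to a reducer by hash, so equal values
    # always land on the same reducer (hypothetical partitioning scheme).
    partitions = defaultdict(set)
    for v in values:
        partitions[hash(v) % num_reducers].add(v)
    # First reduce: each reducer counts only the distinct values it owns.
    partial_counts = [len(p) for p in partitions.values()]
    # Final aggregation: the partitions are disjoint, so the counts add up.
    return sum(partial_counts)


# Made-up sample values standing in for ss_net_profit.
profits = [1.5, 2.0, 1.5, -3.25, 2.0, 7.0]
assert count_distinct_single_reducer(profits) == 4
assert count_distinct_two_stage(profits) == 4
```

Because equal values always hash to the same partition, no value is counted twice, which is why the partial counts can simply be summed.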