[ https://issues.apache.org/jira/browse/HIVE-17474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
liyunzhang_intel updated HIVE-17474: ------------------------------------ Attachment: explain.70.after.analyze explain.70.before.analyze > Poor Performance about subquery like DS/query70 on HoS > ------------------------------------------------------ > > Key: HIVE-17474 > URL: https://issues.apache.org/jira/browse/HIVE-17474 > Project: Hive > Issue Type: Bug > Reporter: liyunzhang_intel > Attachments: explain.70.after.analyze, explain.70.before.analyze, > explain.70.vec > > > in > [DS/query70|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query70.sql]. > {code} > select > sum(ss_net_profit) as total_sum > ,s_state > ,s_county > ,grouping__id as lochierarchy > , rank() over(partition by grouping__id, case when grouping__id == 2 then > s_state end order by sum(ss_net_profit)) as rank_within_parent > from > store_sales ss join date_dim d1 on d1.d_date_sk = ss.ss_sold_date_sk > join store s on s.s_store_sk = ss.ss_store_sk > where > d1.d_month_seq between 1193 and 1193+11 > and s.s_state in > ( select s_state > from (select s_state as s_state, sum(ss_net_profit), > rank() over ( partition by s_state order by > sum(ss_net_profit) desc) as ranking > from store_sales, store, date_dim > where d_month_seq between 1193 and 1193+11 > and date_dim.d_date_sk = > store_sales.ss_sold_date_sk > and store.s_store_sk = store_sales.ss_store_sk > group by s_state > ) tmp1 > where ranking <= 5 > ) > group by s_state,s_county with rollup > order by > lochierarchy desc > ,case when lochierarchy = 0 then s_state end > ,rank_within_parent > limit 100; > {code} > let's analyze the query, > part1: it calculates the sub-query and get the result of the state which > ss_net_profit is less than 5. > part2: big table store_sales join small tables date_dim, store and get the > result. > part3: part1 join part2 > the problem is on the part3, this is common join. The cardinality of part1 > and part2 is low as there are not very different values about states( > actually there are 30 different values in the table store). If use common > join, big data will go to the 30 reducers. -- This message was sent by Atlassian JIRA (v6.4.14#64029)