[ https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vineet Garg updated HIVE-21382: ------------------------------- Status: Open (was: Patch Available) > Group by keys reduction optimization - keys are not reduced in query23 > ---------------------------------------------------------------------- > > Key: HIVE-21382 > URL: https://issues.apache.org/jira/browse/HIVE-21382 > Project: Hive > Issue Type: Improvement > Reporter: Vineet Garg > Assignee: Vineet Garg > Priority: Major > Labels: pull-request-available > Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, > HIVE-21382.2.patch, HIVE-21382.3.patch, HIVE-21382.4.patch, > HIVE-21382.5.patch, HIVE-21382.6.patch > > Time Spent: 3h > Remaining Estimate: 0h > > {code:sql} > explain cbo with frequent_ss_items as > (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date > solddate,count(*) cnt > from store_sales > ,date_dim > ,item > where ss_sold_date_sk = d_date_sk > and ss_item_sk = i_item_sk > and d_year in (1999,1999+1,1999+2,1999+3) > group by substr(i_item_desc,1,30),i_item_sk,d_date > having count(*) >4) > select sum(sales) > from ((select cs_quantity*cs_list_price sales > from catalog_sales > ,date_dim > where d_year = 1999 > and d_moy = 1 > and cs_sold_date_sk = d_date_sk > and cs_item_sk in (select item_sk from frequent_ss_items))) subq > limit 100; > {code} > {code:sql} > HiveSortLimit(fetch=[100]) > HiveProject($f0=[$0]) > HiveAggregate(group=[{}], agg#0=[sum($0)]) > HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)]) > HiveSemiJoin(condition=[=($1, $5)], joinType=[inner]) > HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], > cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], > cs_quantity=[$18], cs_list_price=[$20]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[perf_constraints, catalog_sales]], > table:alias=[catalog_sales]) > HiveProject(d_date_sk=[$0]) > HiveFilter(condition=[AND(=($6, 1999), =($8, 1))]) > HiveTableScan(table=[[perf_constraints, date_dim]], > table:alias=[date_dim]) > HiveProject(i_item_sk=[$1]) > HiveFilter(condition=[>($3, 4)]) > HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3]) > HiveAggregate(group=[{3, 4, 5}], agg#0=[count()]) > HiveJoin(condition=[=($1, $4)], joinType=[inner], > algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveJoin(condition=[=($0, $2)], joinType=[inner], > algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[perf_constraints, > store_sales]], table:alias=[store_sales]) > HiveProject(d_date_sk=[$0], d_date=[$2]) > HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)]) > HiveTableScan(table=[[perf_constraints, date_dim]], > table:alias=[date_dim]) > HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)]) > HiveTableScan(table=[[perf_constraints, item]], > table:alias=[item]) > {code} > Right side of HiveSemiJoin has an aggregate which could be reduce to have > only {{i_item_sk}} as group by key since {{i_item_sk}} is primary key. -- This message was sent by Atlassian JIRA (v7.6.3#76005)