[ https://issues.apache.org/jira/browse/HIVE-23493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138111#comment-17138111 ]
Krisztian Kasa commented on HIVE-23493: --------------------------------------- Pushed to master. Thank you [~jcamachorodriguez] for review. > Rewrite plan to join back tables with many projected columns joined multiple > times > ---------------------------------------------------------------------------------- > > Key: HIVE-23493 > URL: https://issues.apache.org/jira/browse/HIVE-23493 > Project: Hive > Issue Type: New Feature > Components: CBO > Reporter: Krisztian Kasa > Assignee: Krisztian Kasa > Priority: Major > Labels: pull-request-available > Attachments: HIVE-23493.1.patch > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Queries with a pattern where one or more tables joins with a fact table in a > CTE. Many columns are projected out those tables and then grouped in the CTE. > The main query joins multiple instances of the CTE and may project a subset > of these. > The optimization is to rewrite the CTE to include only key (PK, non null > Unique Key) columns and join the tables back to the resultset of the main > query to fetch the rest of the wide columns. This reduces the datasize of the > joined back tables that is broadcast/shuffled throughout the DAG processing. > Example query, tpc-ds query4 > {code} > with year_total as ( > select c_customer_id customer_id > ,c_first_name customer_first_name > ,c_last_name customer_last_name > ,c_preferred_cust_flag customer_preferred_cust_flag > ,c_birth_country customer_birth_country > ,c_login customer_login > ,c_email_address customer_email_address > ,d_year dyear > > ,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2) > year_total > ,'s' sale_type > from customer > ,store_sales > ,date_dim > where c_customer_sk = ss_customer_sk > and ss_sold_date_sk = d_date_sk > group by c_customer_id > ,c_first_name > ,c_last_name > ,c_preferred_cust_flag > ,c_birth_country > ,c_login > ,c_email_address > ,d_year > union all > select c_customer_id customer_id > ,c_first_name customer_first_name > ,c_last_name customer_last_name > ,c_preferred_cust_flag customer_preferred_cust_flag > ,c_birth_country customer_birth_country > ,c_login customer_login > ,c_email_address customer_email_address > ,d_year dyear > > ,sum((((cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt)+cs_ext_sales_price)/2) > ) year_total > ,'c' sale_type > from customer > ,catalog_sales > ,date_dim > where c_customer_sk = cs_bill_customer_sk > and cs_sold_date_sk = d_date_sk > group by c_customer_id > ,c_first_name > ,c_last_name > ,c_preferred_cust_flag > ,c_birth_country > ,c_login > ,c_email_address > ,d_year > union all > select c_customer_id customer_id > ,c_first_name customer_first_name > ,c_last_name customer_last_name > ,c_preferred_cust_flag customer_preferred_cust_flag > ,c_birth_country customer_birth_country > ,c_login customer_login > ,c_email_address customer_email_address > ,d_year dyear > > ,sum((((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2) > ) year_total > ,'w' sale_type > from customer > ,web_sales > ,date_dim > where c_customer_sk = ws_bill_customer_sk > and ws_sold_date_sk = d_date_sk > group by c_customer_id > ,c_first_name > ,c_last_name > ,c_preferred_cust_flag > ,c_birth_country > ,c_login > ,c_email_address > ,d_year > ) > select > t_s_secyear.customer_id > ,t_s_secyear.customer_first_name > ,t_s_secyear.customer_last_name > ,t_s_secyear.customer_birth_country > from year_total t_s_firstyear > ,year_total t_s_secyear > ,year_total t_c_firstyear > ,year_total t_c_secyear > ,year_total t_w_firstyear > ,year_total t_w_secyear > where t_s_secyear.customer_id = t_s_firstyear.customer_id > and t_s_firstyear.customer_id = t_c_secyear.customer_id > and t_s_firstyear.customer_id = t_c_firstyear.customer_id > and t_s_firstyear.customer_id = t_w_firstyear.customer_id > and t_s_firstyear.customer_id = t_w_secyear.customer_id > and t_s_firstyear.sale_type = 's' > and t_c_firstyear.sale_type = 'c' > and t_w_firstyear.sale_type = 'w' > and t_s_secyear.sale_type = 's' > and t_c_secyear.sale_type = 'c' > and t_w_secyear.sale_type = 'w' > and t_s_firstyear.dyear = 1999 > and t_s_secyear.dyear = 1999+1 > and t_c_firstyear.dyear = 1999 > and t_c_secyear.dyear = 1999+1 > and t_w_firstyear.dyear = 1999 > and t_w_secyear.dyear = 1999+1 > and t_s_firstyear.year_total > 0 > and t_c_firstyear.year_total > 0 > and t_w_firstyear.year_total > 0 > and case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / > t_c_firstyear.year_total else null end > > case when t_s_firstyear.year_total > 0 then > t_s_secyear.year_total / t_s_firstyear.year_total else null end > and case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / > t_c_firstyear.year_total else null end > > case when t_w_firstyear.year_total > 0 then > t_w_secyear.year_total / t_w_firstyear.year_total else null end > order by t_s_secyear.customer_id > ,t_s_secyear.customer_first_name > ,t_s_secyear.customer_last_name > ,t_s_secyear.customer_birth_country > limit 100; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)