Krisztian Kasa created HIVE-22808: ------------------------------------- Summary: HiveRelFieldTrimmer does not handle HiveTableFunctionScan Key: HIVE-22808 URL: https://issues.apache.org/jira/browse/HIVE-22808 Project: Hive Issue Type: Bug Components: Query Planning Reporter: Krisztian Kasa Assignee: Krisztian Kasa
*Repro* {code:java} CREATE TABLE table_16 ( timestamp_col_19 timestamp, timestamp_col_29 timestamp, int_col_27 int, int_col_39 int, boolean_col_18 boolean, varchar0045_col_23 varchar(45) ); CREATE TABLE table_7 ( int_col_10 int, bigint_col_3 bigint ); CREATE TABLE table_10 ( boolean_col_8 boolean, boolean_col_16 boolean, timestamp_col_5 timestamp, timestamp_col_15 timestamp, timestamp_col_30 timestamp, decimal3825_col_26 decimal(38, 25), smallint_col_9 smallint, int_col_18 int ); explain cbo SELECT DISTINCT COALESCE(a4.timestamp_col_15, IF(a4.boolean_col_16, a4.timestamp_col_30, a4.timestamp_col_5)) AS timestamp_col FROM table_7 a3 RIGHT JOIN table_10 a4 WHERE (a3.bigint_col_3) >= (a4.int_col_18) INTERSECT ALL SELECT COALESCE(LEAST( COALESCE(a1.timestamp_col_19, CAST('2010-03-29 00:00:00' AS TIMESTAMP)), COALESCE(a1.timestamp_col_29, CAST('2014-08-16 00:00:00' AS TIMESTAMP)) ), GREATEST(COALESCE(a1.timestamp_col_19, CAST('2013-07-01 00:00:00' AS TIMESTAMP)), COALESCE(a1.timestamp_col_29, CAST('2028-06-18 00:00:00' AS TIMESTAMP))) ) AS timestamp_col FROM table_16 a1 GROUP BY COALESCE(LEAST( COALESCE(a1.timestamp_col_19, CAST('2010-03-29 00:00:00' AS TIMESTAMP)), COALESCE(a1.timestamp_col_29, CAST('2014-08-16 00:00:00' AS TIMESTAMP)) ), GREATEST( COALESCE(a1.timestamp_col_19, CAST('2013-07-01 00:00:00' AS TIMESTAMP)), COALESCE(a1.timestamp_col_29, CAST('2028-06-18 00:00:00' AS TIMESTAMP))) ); {code} CBO Plan contains unnecessary columns or all columns from a table in projections like: {code:java} HiveProject(int_col_10=[$0], bigint_col_3=[$1], BLOCK__OFFSET__INSIDE__FILE=[$2], INPUT__FILE__NAME=[$3], CAST=[CAST($4):RecordType(BIGINT writeid, INTEGER bucketid, BIGINT rowid)]) {code} *Cause* The plan contains a HiveTableFunctionScan operator: {code:java} HiveTableFunctionScan(invocation=[replicate_rows($0, $1)], rowType=[RecordType(BIGINT $f0, TIMESTAMP(9) $f1)]) {code} HiveTableFunctionScan is not handled by HiveRelFieldTrimmer nor RelFieldTrimmer which suppose to remove unused columns in the CalcitePlanner.applyPreJoinOrderingTransforms(...) phase. The whole subtree rooted from HiveTableFunctionScan is ignored. Whole plan: {code:java} CBO PLAN: HiveProject($f0=[$1]) HiveTableFunctionScan(invocation=[replicate_rows($0, $1)], rowType=[RecordType(BIGINT $f0, TIMESTAMP(9) $f1)]) HiveProject($f0=[$2], $f1=[$0]) HiveFilter(condition=[=($1, 2)]) HiveAggregate(group=[{0}], agg#0=[count($1)], agg#1=[min($1)]) HiveProject($f0=[$0], $f1=[$1]) HiveUnion(all=[true]) HiveProject($f0=[$0], $f1=[$1]) HiveAggregate(group=[{0}], agg#0=[count()]) HiveProject($f0=[$0]) HiveAggregate(group=[{0}]) HiveProject($f0=[CASE(IS NOT NULL($7), $7, if($5, $8, $6))]) HiveJoin(condition=[>=($1, $13)], joinType=[inner], algorithm=[none], cost=[not available]) HiveProject(int_col_10=[$0], bigint_col_3=[$1], BLOCK__OFFSET__INSIDE__FILE=[$2], INPUT__FILE__NAME=[$3], CAST=[CAST($4):RecordType(BIGINT writeid, INTEGER bucketid, BIGINT rowid)]) HiveFilter(condition=[IS NOT NULL($1)]) HiveTableScan(table=[[default, table_7]], table:alias=[a3]) HiveProject(boolean_col_16=[$0], timestamp_col_5=[$1], timestamp_col_15=[$2], timestamp_col_30=[$3], int_col_18=[$4], BLOCK__OFFSET__INSIDE__FILE=[$5], INPUT__FILE__NAME=[$6], ROW__ID=[$7], CAST=[CAST($4):BIGINT]) HiveFilter(condition=[IS NOT NULL(CAST($4):BIGINT)]) HiveTableScan(table=[[default, table_10]], table:alias=[a4]) HiveProject($f0=[$0], $f1=[$1]) HiveAggregate(group=[{0}], agg#0=[count()]) HiveProject($f0=[$0]) HiveAggregate(group=[{0}]) HiveProject($f0=[CASE(IS NOT NULL(least(CASE(IS NOT NULL($0), $0, 2010-03-29 00:00:00:TIMESTAMP(9)), CASE(IS NOT NULL($1), $1, 2014-08-16 00:00:00:TIMESTAMP(9)))), least(CASE(IS NOT NULL($0), $0, 2010-03-29 00:00:00:TIMESTAMP(9)), CASE(IS NOT NULL($1), $1, 2014-08-16 00:00:00:TIMESTAMP(9))), greatest(CASE(IS NOT NULL($0), $0, 2013-07-01 00:00:00:TIMESTAMP(9)), CASE(IS NOT NULL($1), $1, 2028-06-18 00:00:00:TIMESTAMP(9))))]) HiveTableScan(table=[[default, table_16]], table:alias=[a1]) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)