[ https://issues.apache.org/jira/browse/HIVE-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17085607#comment-17085607 ]
Krisztian Kasa commented on HIVE-23206: --------------------------------------- I used the simplified query that [~scarlin] mentioned in the description to reproduce this. Plan before *applyJoinOrderingTransform*: {code} HiveProject(ws_order_number=[$0]) HiveJoin(condition=[=($0, $1)], joinType=[inner], algorithm=[none], cost=[not available]) HiveProject(ws_order_number=[$0]) HiveFilter(condition=[IS NOT NULL($0)]) HiveTableScan(table=[[default, web_sales]], table:alias=[ws1]) HiveAggregate(group=[{0}]) HiveProject(wr_order_number=[$0]) HiveJoin(condition=[=($0, $1)], joinType=[inner], algorithm=[none], cost=[not available]) HiveProject(wr_order_number=[$0]) HiveFilter(condition=[IS NOT NULL($0)]) HiveTableScan(table=[[default, web_returns]], table:alias=[web_returns]) HiveProject(ws_order_number=[$0]) HiveJoin(condition=[=($0, $1)], joinType=[inner], algorithm=[none], cost=[not available]) HiveProject(ws_order_number=[$0]) HiveFilter(condition=[IS NOT NULL($0)]) HiveTableScan(table=[[default, web_sales]], table:alias=[ws1]) HiveProject(wr_order_number=[$0]) HiveFilter(condition=[IS NOT NULL($0)]) HiveTableScan(table=[[default, web_returns]], table:alias=[wr2]) {code} Plan after *HiveJoinProjectTransposeRule.RIGHT_PROJECT_BTW_JOIN*: {code} HiveProject(ws_order_number=[$0]) HiveJoin(condition=[=($0, $1)], joinType=[inner], algorithm=[none], cost=[not available]) HiveProject(ws_order_number=[$0]) HiveFilter(condition=[IS NOT NULL($0)]) HiveTableScan(table=[[default, web_sales]], table:alias=[ws1]) HiveAggregate(group=[{0}]) HiveProject(wr_order_number=[$0]) HiveProject(wr_order_number=[$0], ws_order_number=[$6]) HiveJoin(condition=[=($0, $6)], joinType=[inner], algorithm=[none], cost=[not available]) HiveFilter(condition=[IS NOT NULL($0)]) HiveTableScan(table=[[default, web_returns]], table:alias=[web_returns]) HiveProject(ws_order_number=[$0]) HiveJoin(condition=[=($0, $1)], joinType=[inner], algorithm=[none], cost=[not available]) HiveProject(ws_order_number=[$0]) HiveFilter(condition=[IS NOT NULL($0)]) HiveTableScan(table=[[default, web_sales]], table:alias=[ws1]) HiveProject(wr_order_number=[$0]) HiveFilter(condition=[IS NOT NULL($0)]) HiveTableScan(table=[[default, web_returns]], table:alias=[wr2]) {code} The project under the Join was pulled up on top of it and as a result there is no Project anymore between the Join and the Filter operator: {code} HiveJoin(condition=[=($0, $1)], joinType=[inner], algorithm=[none], cost=[not available]) HiveProject(wr_order_number=[$0]) HiveFilter(condition=[IS NOT NULL($0)]) {code} {code} HiveProject(wr_order_number=[$0], ws_order_number=[$6]) HiveJoin(condition=[=($0, $6)], joinType=[inner], algorithm=[none], cost=[not available]) HiveFilter(condition=[IS NOT NULL($0)]) {code} Later in *PlanModifierForASTConv* a new Project is introduced between the Join and the Filter https://github.com/apache/hive/blob/c4023c5a23c21933d68187b7c57b9b9c4f3cbdbb/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/PlanModifierForASTConv.java#L249 This method takes all the columns from the Filter operator that is why the Project contains all of them. > Project not defined correctly after reordering a join > ----------------------------------------------------- > > Key: HIVE-23206 > URL: https://issues.apache.org/jira/browse/HIVE-23206 > Project: Hive > Issue Type: Bug > Components: CBO > Reporter: Steve Carlin > Assignee: Krisztian Kasa > Priority: Major > > The following highlighted line seems to be incorrect in the test suite: > [https://github.com/apache/hive/blob/master/ql/src/test/results/clientpositive/perf/tez/cbo_query95.q.out#L89] > Note that the project takes all the columns from the table scan, yet it only > needs a couple of them. > I did some very small debugging on this. When I removed the > applyJoinOrderingTransform here: > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L1897] > ... the problem goes away. So presumably one of the rules in there is > causing the problem. > Here is a slightly simplified version of the query which has the same problem > (using the same tpc-ds database): > explain cbo with ws_wh as > (select ws1.ws_order_number > from web_sales ws1,web_returns wr2 > where ws1.ws_order_number = wr2.wr_order_number) > select > ws_order_number > from > web_sales ws1 > where > ws1.ws_order_number in (select wr_order_number > from web_returns,ws_wh > where wr_order_number = ws_wh.ws_order_number) > -- This message was sent by Atlassian Jira (v8.3.4#803005)