[ https://issues.apache.org/jira/browse/HIVE-28088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819917#comment-17819917 ]
okumin commented on HIVE-28088: ------------------------------- I found (2), (4), and (6) generated different results with CBO or without CBO. Also, (3) vs (5) and (4) vs (6) are not consistent, which means CTE materialization could change a result when there are duplicated columns. || ||CBO||Non-CBO|| |(1) Sub-query|(key, col_1)|(key, col1)| |(2) Sub-query + top-level join|(key, col_1, key, col_1)|Error| |(3) CTE|(key, col_1)|(key, col_1)| |(4) CTE + top-level join|(key, col_1, key, col_1)|Error| |(5) Materialized CTE|(key)|(key)| |(6) Materialized CTE + top-level join|Error|(key, key)| Test queries. {code:java} DROP TABLE IF EXISTS test1; DROP TABLE IF EXISTS test2; CREATE TABLE test1 (key STRING); CREATE TABLE test2 (key STRING); -- (1) SELECT * FROM (SELECT a.key, b.key FROM test1 a JOIN test2 b ON (a.key = b.key)) c; -- (2) SELECT * FROM (SELECT a.key, b.key FROM test1 a JOIN test2 b ON (a.key = b.key)) c JOIN (SELECT a.key, b.key FROM test2 a JOIN test2 b ON (a.key = b.key)) d ON c.key = d.key; -- (3) WITH cte AS ( SELECT a.key, b.key FROM test1 a JOIN test2 b ON (a.key = b.key) ) SELECT * FROM cte; -- (4) WITH cte AS ( SELECT a.key, b.key FROM test1 a JOIN test2 b ON (a.key = b.key) ) SELECT * FROM cte c JOIN cte d ON (c.key = d.key); set hive.optimize.cte.materialize.threshold=1; set hive.optimize.cte.materialize.full.aggregate.only=false; -- (5) WITH materialized_cte AS ( SELECT a.key, b.key FROM test1 a JOIN test2 b ON (a.key = b.key) ) SELECT * FROM materialized_cte; -- (6) WITH materialized_cte AS ( SELECT a.key, b.key FROM test1 a JOIN test2 b ON (a.key = b.key) ) SELECT * FROM materialized_cte c JOIN materialized_cte d ON (c.key = d.key);{code} With CBO + `hive.cbo.fallback.strategy=NEVER;`. {code:java} (1) +--------+----------+ | c.key | c._col1 | +--------+----------+ +--------+----------+ (2) +--------+----------+--------+----------+ | c.key | c._col1 | d.key | d._col1 | +--------+----------+--------+----------+ +--------+----------+--------+----------+ (3) +----------+------------+ | cte.key | cte._col1 | +----------+------------+ +----------+------------+ (4) +--------+----------+--------+----------+ | c.key | c._col1 | d.key | d._col1 | +--------+----------+--------+----------+ +--------+----------+--------+----------+ (5) +-----------------------+ | materialized_cte.key | +-----------------------+ +-----------------------+ (6) Error: Error while compiling statement: FAILED: CalciteSemanticException Could not resolve column name (state=42000,code=40000){code} Without CBO. {code:java} (1) +--------+----------+ | c.key | c._col1 | +--------+----------+ (2) Error: Error while compiling statement: FAILED: SemanticException [Error 10007]: Ambiguous column reference key in c (state=42000,code=10007) (3) +----------+------------+ | cte.key | cte._col1 | +----------+------------+ +----------+------------+ (4) Error: Error while compiling statement: FAILED: SemanticException [Error 10007]: Ambiguous column reference key in c (state=42000,code=10007) (5) +-----------------------+ | materialized_cte.key | +-----------------------+ +-----------------------+ (6) +--------+--------+ | c.key | d.key | +--------+--------+ +--------+--------+{code} > CalcitePlanner fails to resolve column names on materialized CTEs > ----------------------------------------------------------------- > > Key: HIVE-28088 > URL: https://issues.apache.org/jira/browse/HIVE-28088 > Project: Hive > Issue Type: Bug > Components: CBO > Affects Versions: 4.0.0-beta-1 > Reporter: okumin > Assignee: okumin > Priority: Major > > We found a case where CBO fails when it refers to a materialized CTE. > These are queries to reproduce the issue. > {code:java} > DROP TABLE IF EXISTS test; > CREATE TABLE test (key STRING); > set hive.optimize.cte.materialize.threshold=1; > set hive.optimize.cte.materialize.full.aggregate.only=false; > EXPLAIN CBO WITH materialized_cte AS ( > SELECT a.key, b.key > FROM test a > JOIN test b ON (a.key = b.key) > ) > SELECT * > FROM materialized_cte c > JOIN materialized_cte d ON (c.key = d.key); {code} > CBO fails. > {code:java} > +----------+ > | Explain | > +----------+ > +----------+ {code} > Error log on HiveServer2. > {code:java} > 2024-02-23T04:43:19,065 ERROR [d145a95b-8840-4d2b-9869-a186b4bd82ef > HiveServer2-Handler-Pool: Thread-555] parse.CalcitePlanner: CBO failed, > skipping CBO. > org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Could > not resolve column name > at > org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.getPosition(RexNodeExprFactory.java:184) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > h at > org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createColumnRefExpr(RexNodeExprFactory.java:161) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createColumnRefExpr(RexNodeExprFactory.java:97) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.type.JoinCondTypeCheckProcFactory$JoinCondDefaultExprProcessor.processQualifiedColRef(JoinCondTypeCheckProcFactory.java:188) > ~[hive-exec-4.0.0-beta-2- > SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.type.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1414) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.lib.CostLessRuleDispatcher.dispatch(CostLessRuleDispatcher.java:66) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:101) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.type.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:231) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.type.RexNodeTypeCheck.genExprNodeJoinCond(RexNodeTypeCheck.java:60) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genJoinRelNode(CalcitePlanner.java:2656) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genJoinLogicalPlan(CalcitePlanner.java:2888) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5048) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1625) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1569) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1321) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:570) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13122) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]{code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010)