[ https://issues.apache.org/jira/browse/HIVE-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alessandro Solimando updated HIVE-26722: ---------------------------------------- Description: h1. Reproducer Consider the following query: {code:java} set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); CREATE EXTERNAL TABLE t (a string, b string); INSERT INTO t VALUES ('1000', 'b1'); INSERT INTO t VALUES ('2000', 'b2'); SELECT * FROM ( SELECT a, b FROM t UNION ALL SELECT a, CAST(NULL AS string) FROM t) AS t2 WHERE a = 1000;EXPLAIN CBO SELECT * FROM ( SELECT a, b FROM t UNION ALL SELECT a, CAST(NULL AS string) FROM t) AS t2 WHERE a = 1000; {code} The expected result is: {code:java} 1000 b1 1000 NULL{code} An example of correct plan is as follows: {noformat} CBO PLAN: HiveUnion(all=[true]) HiveProject(a=[$0], b=[$1]) HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) HiveTableScan(table=[[default, t]], table:alias=[t]) HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET "UTF-16LE"]) HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) HiveTableScan(table=[[default, t]], table:alias=[t]){noformat} Consider now a scenario where expression reduction in projections is disabled by setting the following property{_}:{_} {noformat} set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); {noformat} In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, and we get the following (invalid) result: {code:java} 1000 b1{code} produced by the following invalid plan: {code:java} CBO PLAN: HiveProject(a=[$0], b=[$1]) HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) HiveTableScan(table=[[default, t]], table:alias=[t]) {code} h1. Problem Analysis At [HiveFilterSetOpTransposeRule.java#L112|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L112] the _RelMetadataQuery::getPulledUpPredicates_ method infers the following predicate due to the CAST(NULL) in the projection: {code:java} (=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")){code} When the CAST is simplified to the NULL literal, the IS_NULL($1) predicate is inferred. In [HiveFilterSetOpTransposeRule.java#L114-L122|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L122], the rule checks if the conjunction of the predicate coming from the filter (here =(CAST($0):DOUBLE, 1000)) and the inferred predicates is satisfiable or not, under the _UnknownAsFalse_ semantics. To summarize, the following expression is simplified under the _UnknownAsFalse_ semantics: {code:java} AND((=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")), =(CAST($0):DOUBLE, 1000)) {code} Under In such semantics, (=($1, CAST(null:NULL):...) evaluates to {_}FALSE{_}, because no value is equal to NULL (even NULL itself), AND(FALSE, =(CAST($0):DOUBLE, 1000)) necessarily evaluates to _FALSE_ altogether, and the UNION ALL operand is pruned. Only by chance, when _CAST(NULL)_ is simplified to _NULL,_ we avoid the issue, due to the _IS_NULL($1)_ inferred predicate, see [HiveRelMdPredicates.java#L153-L156|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153-L156] for understanding how the NULL literal is treated differently during predicate inference. The problem lies in the fact that, depending on the input _RelNode_ that we infer predicates from, the semantics is not necessarily {_}UnknownAsFalse{_}, but it might be {_}UnknownAsUnknown{_}, like for {_}Project{_}, as in this case. h1. Solution In order to correctly simplify a predicate and test if it's always false or not, we should build RexSimplify with _predicates_ as the list of predicates known to hold in the context. In this way, the different semantics are correctly taken into account. The code at [HiveFilterSetOpTransposeRule.java#L114-L121|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L121] should be replaced by the following: {code:java} final RexExecutor executor = Util.first(filterRel.getCluster().getPlanner().getExecutor(), RexUtil.EXECUTOR); final RexSimplify simplify = new RexSimplify(rexBuilder, predicates, executor); final RexNode x = simplify.simplifyUnknownAs(newCondition, RexUnknownAs.FALSE);{code} This is in line with other uses of simplification, like in Calcite: [https://github.com/apache/calcite/blob/a0ce3275119f804959cda54d6e7a016ab893c359/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java#L704-L705] You can see the above method used for different kind of rels, always in the way identical to what this ticket advocates:filters: [https://github.com/apache/calcite/blob/a0ce3275119f804959cda54d6e7a016ab893c359/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java#L152-L155] projects: [https://github.com/apache/calcite/blob/a0ce3275119f804959cda54d6e7a016ab893c359/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java#L306-L310] joins: [https://github.com/apache/calcite/blob/a0ce3275119f804959cda54d6e7a016ab893c359/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java#L380-L383] windows: [https://github.com/apache/calcite/blob/a0ce3275119f804959cda54d6e7a016ab893c359/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java#L567-L576] was: h1. Reproducer Consider the following query: {code:java} set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); CREATE EXTERNAL TABLE t (a string, b string); INSERT INTO t VALUES ('1000', 'b1'); INSERT INTO t VALUES ('2000', 'b2'); SELECT * FROM ( SELECT a, b FROM t UNION ALL SELECT a, CAST(NULL AS string) FROM t) AS t2 WHERE a = 1000;EXPLAIN CBO SELECT * FROM ( SELECT a, b FROM t UNION ALL SELECT a, CAST(NULL AS string) FROM t) AS t2 WHERE a = 1000; {code} The expected result is: {code:java} 1000 b1 1000 NULL{code} An example of correct plan is as follows: {noformat} CBO PLAN: HiveUnion(all=[true]) HiveProject(a=[$0], b=[$1]) HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) HiveTableScan(table=[[default, t]], table:alias=[t]) HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET "UTF-16LE"]) HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) HiveTableScan(table=[[default, t]], table:alias=[t]){noformat} Consider now a scenario where expression reduction in projections is disabled by setting the following property{_}:{_} {noformat} set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); {noformat} In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, and we get the following (invalid) result: {code:java} 1000 b1{code} produced by the following invalid plan: {code:java} CBO PLAN: HiveProject(a=[$0], b=[$1]) HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) HiveTableScan(table=[[default, t]], table:alias=[t]) {code} h1. Problem Analysis At [HiveFilterSetOpTransposeRule.java#L112|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L112] the _RelMetadataQuery::getPulledUpPredicates_ method infers the following predicate due to the CAST(NULL) in the projection: {code:java} (=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")){code} When the CAST is simplified to the NULL literal, the IS_NULL($1) predicate is inferred. In [HiveFilterSetOpTransposeRule.java#L114-L122|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L122], the rule checks if the conjunction of the predicate coming from the filter (here =(CAST($0):DOUBLE, 1000)) and the inferred predicates is satisfiable or not, under the _UnknownAsFalse_ semantics. To summarize, the following expression is simplified under the _UnknownAsFalse_ semantics: {code:java} AND((=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")), =(CAST($0):DOUBLE, 1000)) {code} Under In such semantics, (=($1, CAST(null:NULL):...) evaluates to {_}FALSE{_}, because no value is equal to NULL (even NULL itself), AND(FALSE, =(CAST($0):DOUBLE, 1000)) necessarily evaluates to _FALSE_ altogether, and the UNION ALL operand is pruned. Only by chance, when _CAST(NULL)_ is simplified to _NULL,_ we avoid the issue, due to the _IS_NULL($1)_ inferred predicate, see [HiveRelMdPredicates.java#L153-L156|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153-L156] for understanding how the NULL literal is treated differently during predicate inference. The problem lies in the fact that, depending on the input _RelNode_ that we infer predicates from, the semantics is not necessarily {_}UnknownAsFalse{_}, but it might be {_}UnknownAsUnknown{_}, like for {_}Project{_}, as in this case. h1. Solution In order to correctly simplify a predicate and test if it's always false or not, we should build RexSimplify with _predicates_ as the list of predicates known to hold in the context. In this way, the different semantics are correctly taken into account. The code at [HiveFilterSetOpTransposeRule.java#L114-L121|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L121] should be replaced by the following: {code:java} final RexExecutor executor = Util.first(filterRel.getCluster().getPlanner().getExecutor(), RexUtil.EXECUTOR); final RexSimplify simplify = new RexSimplify(rexBuilder, predicates, executor); final RexNode x = simplify.simplifyUnknownAs(newCondition, RexUnknownAs.FALSE);{code} > HiveFilterSetOpTransposeRule incorrectly prunes UNION ALL operands > ------------------------------------------------------------------ > > Key: HIVE-26722 > URL: https://issues.apache.org/jira/browse/HIVE-26722 > Project: Hive > Issue Type: Bug > Components: CBO > Affects Versions: 4.0.0-alpha-1 > Reporter: Alessandro Solimando > Assignee: Alessandro Solimando > Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > h1. Reproducer > Consider the following query: > {code:java} > set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); > CREATE EXTERNAL TABLE t (a string, b string); > INSERT INTO t VALUES ('1000', 'b1'); > INSERT INTO t VALUES ('2000', 'b2'); > SELECT * FROM ( > SELECT > a, > b > FROM t > UNION ALL > SELECT > a, > CAST(NULL AS string) > FROM t) AS t2 > WHERE a = 1000;EXPLAIN CBO > SELECT * FROM ( > SELECT > a, > b > FROM t > UNION ALL > SELECT > a, > CAST(NULL AS string) > FROM t) AS t2 > WHERE a = 1000; {code} > > The expected result is: > {code:java} > 1000 b1 > 1000 NULL{code} > An example of correct plan is as follows: > {noformat} > CBO PLAN: > HiveUnion(all=[true]) > HiveProject(a=[$0], b=[$1]) > HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) > HiveTableScan(table=[[default, t]], table:alias=[t]) > HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET > "UTF-16LE"]) > HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) > HiveTableScan(table=[[default, t]], table:alias=[t]){noformat} > > Consider now a scenario where expression reduction in projections is disabled > by setting the following property{_}:{_} > {noformat} > set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); > {noformat} > In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, > and we get the following (invalid) result: > {code:java} > 1000 b1{code} > produced by the following invalid plan: > {code:java} > CBO PLAN: > HiveProject(a=[$0], b=[$1]) > HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) > HiveTableScan(table=[[default, t]], table:alias=[t]) {code} > h1. Problem Analysis > At > [HiveFilterSetOpTransposeRule.java#L112|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L112] > the _RelMetadataQuery::getPulledUpPredicates_ method infers the following > predicate due to the CAST(NULL) in the projection: > {code:java} > (=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")){code} > When the CAST is simplified to the NULL literal, the IS_NULL($1) predicate is > inferred. > In > [HiveFilterSetOpTransposeRule.java#L114-L122|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L122], > the rule checks if the conjunction of the predicate coming from the filter > (here =(CAST($0):DOUBLE, 1000)) and the inferred predicates is satisfiable or > not, under the _UnknownAsFalse_ semantics. > To summarize, the following expression is simplified under the > _UnknownAsFalse_ semantics: > {code:java} > AND((=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")), > =(CAST($0):DOUBLE, 1000)) > {code} > Under In such semantics, (=($1, CAST(null:NULL):...) evaluates to > {_}FALSE{_}, because no value is equal to NULL (even NULL itself), AND(FALSE, > =(CAST($0):DOUBLE, 1000)) necessarily evaluates to _FALSE_ altogether, and > the UNION ALL operand is pruned. > Only by chance, when _CAST(NULL)_ is simplified to _NULL,_ we avoid the > issue, due to the _IS_NULL($1)_ inferred predicate, see > [HiveRelMdPredicates.java#L153-L156|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153-L156] > for understanding how the NULL literal is treated differently during > predicate inference. > The problem lies in the fact that, depending on the input _RelNode_ that we > infer predicates from, the semantics is not necessarily {_}UnknownAsFalse{_}, > but it might be {_}UnknownAsUnknown{_}, like for {_}Project{_}, as in this > case. > h1. Solution > In order to correctly simplify a predicate and test if it's always false or > not, we should build RexSimplify with _predicates_ as the list of predicates > known to hold in the context. In this way, the different semantics are > correctly taken into account. > The code at > [HiveFilterSetOpTransposeRule.java#L114-L121|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L121] > should be replaced by the following: > {code:java} > final RexExecutor executor = > Util.first(filterRel.getCluster().getPlanner().getExecutor(), > RexUtil.EXECUTOR); > final RexSimplify simplify = new RexSimplify(rexBuilder, predicates, > executor); > final RexNode x = simplify.simplifyUnknownAs(newCondition, > RexUnknownAs.FALSE);{code} > This is in line with other uses of simplification, like in Calcite: > [https://github.com/apache/calcite/blob/a0ce3275119f804959cda54d6e7a016ab893c359/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java#L704-L705] > You can see the above method used for different kind of rels, always in the > way identical to what this ticket advocates:filters: > [https://github.com/apache/calcite/blob/a0ce3275119f804959cda54d6e7a016ab893c359/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java#L152-L155] > projects: > [https://github.com/apache/calcite/blob/a0ce3275119f804959cda54d6e7a016ab893c359/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java#L306-L310] > joins: > [https://github.com/apache/calcite/blob/a0ce3275119f804959cda54d6e7a016ab893c359/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java#L380-L383] > windows: > [https://github.com/apache/calcite/blob/a0ce3275119f804959cda54d6e7a016ab893c359/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java#L567-L576] -- This message was sent by Atlassian Jira (v8.20.10#820010)