[ 
https://issues.apache.org/jira/browse/HIVE-26733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26733:
----------------------------------------
    Description: 
HiveRelMdPredicates was forked from Calcite's RelMdPredicates long time ago.

Hive's version lacks this commit 
[https://github.com/apache/calcite/commit/8281668f] which introduced the use of 
"IS NOT DISTINCT FROM" in place of "EQUAL" when a constant expression can be 
NULL.

There is no Calcite ticket for this change, so I am briefly explaining the 
issue here.

Consider the following input as argument of 
HiveRelMdPredicates::pullUpPredicates(Project) method:
{code:java}
SELECT char_length(NULL) FROM t{code}
The method currently infers the predicate (=($0, CHAR_LENGTH(null:NULL))) which 
translates to "=(NULL, NULL)", which turns simplifies to FALSE under the 
unknownAsFalse semantics.

The change will make this methods return "IS NOT DISTINCT FROM($0, 
CHAR_LENGTH(null:NULL))", which translates to IS NOT DISTINCT FROM(NULL, NULL), 
which is TRUE.

For reference, we have the truth table below (from [1]):
||{{A}}||{{B}}||{{A = B}}||{{A IS NOT DISTINCT FROM B}}||
|{{0}}|{{0}}|_true_|_true_|
|{{0}}|{{1}}|_false_|_false_|
|{{0}}|{{null}}|_*unknown*_|_*false*_|
|{{null}}|{{null}}|_*unknown*_|_*true*_|

[1] https://modern-sql.com/feature/is-distinct-from

  was:
Given a _CAST(NULL as $type)_ as i-th project expression, the method returns 
_(=($i, CAST(null:NULL):$type)_ instead of _IS_NULL($i)_ as in the case of a 
_NULL_ literal project expression.

This is because _RexLiteral::isNullLiteral_ is used 
[here|https://github.com/apache/hive/blob/a6c0229f910972e84ba558e728532ffc245cc10d/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153],
 while in similar cases, it's often convenient to use 
{_}RexUtil::isNullLiteral(RexNode, boolean allowCast){_}.


> Not safe to use '=' for predicates on constant expressions that might be NULL
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-26733
>                 URL: https://issues.apache.org/jira/browse/HIVE-26733
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>    Affects Versions: 4.0.0-alpha-1
>            Reporter: Alessandro Solimando
>            Assignee: Alessandro Solimando
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> HiveRelMdPredicates was forked from Calcite's RelMdPredicates long time ago.
> Hive's version lacks this commit 
> [https://github.com/apache/calcite/commit/8281668f] which introduced the use 
> of "IS NOT DISTINCT FROM" in place of "EQUAL" when a constant expression can 
> be NULL.
> There is no Calcite ticket for this change, so I am briefly explaining the 
> issue here.
> Consider the following input as argument of 
> HiveRelMdPredicates::pullUpPredicates(Project) method:
> {code:java}
> SELECT char_length(NULL) FROM t{code}
> The method currently infers the predicate (=($0, CHAR_LENGTH(null:NULL))) 
> which translates to "=(NULL, NULL)", which turns simplifies to FALSE under 
> the unknownAsFalse semantics.
> The change will make this methods return "IS NOT DISTINCT FROM($0, 
> CHAR_LENGTH(null:NULL))", which translates to IS NOT DISTINCT FROM(NULL, 
> NULL), which is TRUE.
> For reference, we have the truth table below (from [1]):
> ||{{A}}||{{B}}||{{A = B}}||{{A IS NOT DISTINCT FROM B}}||
> |{{0}}|{{0}}|_true_|_true_|
> |{{0}}|{{1}}|_false_|_false_|
> |{{0}}|{{null}}|_*unknown*_|_*false*_|
> |{{null}}|{{null}}|_*unknown*_|_*true*_|
> [1] https://modern-sql.com/feature/is-distinct-from



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to