Stamatis Zampetakis created HIVE-28910:
------------------------------------------
Summary: Remove redundant IS NOT NULL predicates when expanding
SEARCH
Key: HIVE-28910
URL: https://issues.apache.org/jira/browse/HIVE-28910
Project: Hive
Issue Type: Improvement
Components: CBO
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis
During various CBO transformations (especially during simplifications) the
internal SEARCH (CALCITE-4173) operator is introduced in the plan. The SEARCH
operator cannot be executed directly and must be expanded (using
SearchTransformer) to an equivalent form for further processing.
Currently the expansion is conservative and does not take into account the
context for the evaluation of UNKNOWN/NULL values which in some cases can lead
to redundant IS [NOT] NULL predicates in the expanded form.
+Example+
{code:sql}
explain cbo
select d_date_sk
from date_dim
where d_dom IN (1, 2, 3)
and d_dom IS NOT NULL
{code}
The intermediate plan before SEARCH expansion is shown below:
{noformat}
HiveProject(d_date_sk=[$0])
HiveFilter(condition=[SEARCH($9, Sarg[1, 2, 3; NULL AS FALSE])])
HiveTableScan(table=[[default, date_dim]], table:alias=[date_dim])
{noformat}
The final plan after SEARCH expansion is show below:
{noformat}
HiveProject(d_date_sk=[$0])
HiveFilter(condition=[AND(IN($9, 1, 2, 3), IS NOT NULL($9))])
HiveTableScan(table=[[default, date_dim]], table:alias=[date_dim])
{noformat}
Observe that the IS NOT NULL predicate is redundant both in the SQL query and
in the plan however the optimizer is not able to remove it.
The IS NOT NULL predicate is introduced cause the Sarg indicates that we should
handle NULL AS FALSE. However, inside a Filter operator (WHERE clause) nulls
are always treated as false so we can skip the addition of the predicate.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)