Stamatis Zampetakis created HIVE-28910:
------------------------------------------

             Summary: Remove redundant IS NOT NULL predicates when expanding 
SEARCH
                 Key: HIVE-28910
                 URL: https://issues.apache.org/jira/browse/HIVE-28910
             Project: Hive
          Issue Type: Improvement
          Components: CBO
            Reporter: Stamatis Zampetakis
            Assignee: Stamatis Zampetakis


During various CBO transformations (especially during simplifications) the 
internal SEARCH (CALCITE-4173) operator is introduced in the plan. The SEARCH 
operator cannot be executed directly and must be expanded (using 
SearchTransformer) to an equivalent form for further processing.

Currently the expansion is conservative and does not take into account the 
context for the evaluation of UNKNOWN/NULL values which in some cases can lead 
to redundant IS [NOT] NULL predicates in the expanded form.

+Example+
{code:sql}
explain cbo
select d_date_sk
from date_dim
where d_dom IN (1, 2, 3)
  and d_dom IS NOT NULL
{code}

The intermediate plan before SEARCH expansion is shown below:
{noformat}
HiveProject(d_date_sk=[$0])
  HiveFilter(condition=[SEARCH($9, Sarg[1, 2, 3; NULL AS FALSE])])
    HiveTableScan(table=[[default, date_dim]], table:alias=[date_dim])
{noformat}

The final plan  after SEARCH expansion is show below:
{noformat}
HiveProject(d_date_sk=[$0])
  HiveFilter(condition=[AND(IN($9, 1, 2, 3), IS NOT NULL($9))])
    HiveTableScan(table=[[default, date_dim]], table:alias=[date_dim])
{noformat}

Observe that the IS NOT NULL predicate is redundant both in the SQL query and 
in the plan however the optimizer is not able to remove it. 

The IS NOT NULL predicate is introduced cause the Sarg indicates that we should 
handle NULL AS FALSE. However, inside a Filter operator (WHERE clause) nulls 
are always treated as false so we can skip the addition of the predicate.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to