Thai Bui created HIVE-21074:
-------------------------------
Summary: Hive bucketed table query pruning does not work for IS
NOT NULL condition
Key: HIVE-21074
URL: https://issues.apache.org/jira/browse/HIVE-21074
Project: Hive
Issue Type: Bug
Components: Query Planning
Affects Versions: 3.1.1, 3.1.0, 3.0.0
Reporter: Thai Bui
Assignee: Thai Bui
The current version of bucket pruning skips all predicates when it detects
that one of them is a compound expression (e.g. NOT(IS_NULL)) while
evaluating an AND logical operator.
This logic is faulty: as long as one of the AND operands is an equality on a
bucketed column (_col_ = *literal*), the *literal* value of that _col_ should be
considered in the bucket pruning optimization regardless of the other operands.
For example:
SELECT * FROM tbl WHERE bucketed_col = 1 AND (some_compound_expr)
Then the value *1* should be considered for pruning in the query plan.
This limitation shows up in a simpler case: a table that I am trying to
optimize with bucketing gets no benefit when IS NOT NULL is used. Since
IS NOT NULL is parsed into NOT(IS_NULL) (a compound expression), the pruning
phase is skipped completely, causing unnecessary tasks to be spawned.
For instance:
SELECT * FROM tbl WHERE bucketed_col = 1 AND some_other_col IS NOT NULL
This query does not trigger the bucket pruning logic and performs a full table
scan instead.
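
The difference between the reported behavior and the expected one can be
sketched in a few lines of Python. This is only an illustrative model, not
Hive's actual code: the Eq and Compound classes and both function names are
hypothetical, and a WHERE clause is reduced to a flat list of AND operands.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Union


@dataclass(frozen=True)
class Eq:
    """Equality predicate of the form: column = literal (hypothetical model)."""
    column: str
    literal: int


@dataclass(frozen=True)
class Compound:
    """Any operand the pruner cannot analyze, e.g. NOT(IS_NULL(col))."""
    text: str


Predicate = Union[Eq, Compound]


def literals_per_conjunct(conjuncts: List[Predicate],
                          bucket_col: str) -> Optional[Set[int]]:
    """Expected behavior: skip only the operands that cannot be analyzed."""
    found = {p.literal for p in conjuncts
             if isinstance(p, Eq) and p.column == bucket_col}
    # None means "no pruning possible, scan every bucket".
    return found or None


def literals_all_or_nothing(conjuncts: List[Predicate],
                            bucket_col: str) -> Optional[Set[int]]:
    """Reported behavior: one compound operand disables pruning entirely."""
    if any(isinstance(p, Compound) for p in conjuncts):
        return None
    return literals_per_conjunct(conjuncts, bucket_col)


# WHERE bucketed_col = 1 AND some_other_col IS NOT NULL
where_clause = [Eq("bucketed_col", 1),
                Compound("NOT(IS_NULL(some_other_col))")]

reported = literals_all_or_nothing(where_clause, "bucketed_col")
expected = literals_per_conjunct(where_clause, "bucketed_col")
```

Here reported is None (full scan) while expected is {1}. Ignoring an operand
of an AND is safe because AND can only narrow the result: the buckets selected
from the remaining operands are always a superset of the buckets that hold
matching rows.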