[ 
https://issues.apache.org/jira/browse/HIVE-25734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-25734.
-------------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

merged into master. Thank you [~asolimando]!

> Wrongly-typed constant in case expression leads to incorrect empty result
> -------------------------------------------------------------------------
>
>                 Key: HIVE-25734
>                 URL: https://issues.apache.org/jira/browse/HIVE-25734
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO, Query Planning
>    Affects Versions: 4.0.0
>            Reporter: Alessandro Solimando
>            Assignee: Alessandro Solimando
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
>  
> The type of constants in case expressions should be inferred, if possible, by 
> the "surrounding" input reference columns, if any.
> Consider the following table and query: 
> {code:java}
> create external table test_case (row_seq smallint, row_desc string) stored as 
> parquet;
> insert into test_case values (1, 'a');
> insert into test_case values (2, 'aa');
> insert into test_case values (6, 'aaaaaa');
> with base_t as (select row_seq, row_desc,
>   case row_seq
>     when 1 then '34'
>     when 6 then '35'
>     when 2 then '36'
>   end as zb from test_case where row_seq in (1,2,6))
> select row_seq, row_desc, zb from base_t where zb <> '34';{code}
> The aforementioned query fails by returning an empty results, while "1 a 34" 
> is expected.
>  
> To understand the root cause, let's consider the debug input and output of 
> some related CBO rules which are triggered during the evaluation of the 
> query: 
>  
> {noformat}
> --$0 is the column 'row_seq'
> 1. HiveReduceExpressionsRule
> Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), <>(CASE(=($0, 
> 1:INTEGER), '34':VARCHAR, =($0, 6:INTEGER), '35':VARCHAR, =($0, 2:INTEGER), 
> '36':VARCHAR, null:VARCHAR), '34':CHAR(2)))
> Output: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), OR(=($0, 6:INTEGER), 
> =($0, 2:INTEGER)), IS NOT TRUE(=($0, 1:INTEGER)))
> 2. HivePointLookupOptimizerRule.RexTransformIntoInClause
> Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), OR(=($0, 6:INTEGER), 
> =($0, 2:INTEGER)), IS NOT TRUE(=($0, 1:INTEGER)))
> Output: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), IN($0, 6:INTEGER, 
> 2:INTEGER), IS NOT TRUE(=($0, 1:INTEGER)))
> 3. HivePointLookupOptimizerRule.RexMergeInClause
> Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), IN($0, 6:INTEGER, 
> 2:INTEGER), IS NOT TRUE(=($0, 1:INTEGER)))
> Output: false{noformat}
> In the first part, we can see that the constants are correctly typed as 
> "SMALLINT" in the first part of the "AND" operand, while they are typed as 
> "INTEGER" for the "CASE" expression, despite the input reference "$0" being 
> available for inferring a more precise type.
> This type difference makes "HivePointLookupOptimizerRule.RexMergeInClause" 
> missing the commonality between the two "IN" expressions, whose intersection 
> is considered empty, hence the empty result.
> Providing a more refined type inference for "case" expressions should fix the 
> issue.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to