[ https://issues.apache.org/jira/browse/HIVE-25734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zoltan Haindrich resolved HIVE-25734. ------------------------------------- Fix Version/s: 4.0.0 Resolution: Fixed merged into master. Thank you [~asolimando]! > Wrongly-typed constant in case expression leads to incorrect empty result > ------------------------------------------------------------------------- > > Key: HIVE-25734 > URL: https://issues.apache.org/jira/browse/HIVE-25734 > Project: Hive > Issue Type: Bug > Components: CBO, Query Planning > Affects Versions: 4.0.0 > Reporter: Alessandro Solimando > Assignee: Alessandro Solimando > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > > The type of constants in case expressions should be inferred, if possible, by > the "surrounding" input reference columns, if any. > Consider the following table and query: > {code:java} > create external table test_case (row_seq smallint, row_desc string) stored as > parquet; > insert into test_case values (1, 'a'); > insert into test_case values (2, 'aa'); > insert into test_case values (6, 'aaaaaa'); > with base_t as (select row_seq, row_desc, > case row_seq > when 1 then '34' > when 6 then '35' > when 2 then '36' > end as zb from test_case where row_seq in (1,2,6)) > select row_seq, row_desc, zb from base_t where zb <> '34';{code} > The aforementioned query fails by returning an empty results, while "1 a 34" > is expected. > > To understand the root cause, let's consider the debug input and output of > some related CBO rules which are triggered during the evaluation of the > query: > > {noformat} > --$0 is the column 'row_seq' > 1. HiveReduceExpressionsRule > Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), <>(CASE(=($0, > 1:INTEGER), '34':VARCHAR, =($0, 6:INTEGER), '35':VARCHAR, =($0, 2:INTEGER), > '36':VARCHAR, null:VARCHAR), '34':CHAR(2))) > Output: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), OR(=($0, 6:INTEGER), > =($0, 2:INTEGER)), IS NOT TRUE(=($0, 1:INTEGER))) > 2. HivePointLookupOptimizerRule.RexTransformIntoInClause > Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), OR(=($0, 6:INTEGER), > =($0, 2:INTEGER)), IS NOT TRUE(=($0, 1:INTEGER))) > Output: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), IN($0, 6:INTEGER, > 2:INTEGER), IS NOT TRUE(=($0, 1:INTEGER))) > 3. HivePointLookupOptimizerRule.RexMergeInClause > Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), IN($0, 6:INTEGER, > 2:INTEGER), IS NOT TRUE(=($0, 1:INTEGER))) > Output: false{noformat} > In the first part, we can see that the constants are correctly typed as > "SMALLINT" in the first part of the "AND" operand, while they are typed as > "INTEGER" for the "CASE" expression, despite the input reference "$0" being > available for inferring a more precise type. > This type difference makes "HivePointLookupOptimizerRule.RexMergeInClause" > missing the commonality between the two "IN" expressions, whose intersection > is considered empty, hence the empty result. > Providing a more refined type inference for "case" expressions should fix the > issue. -- This message was sent by Atlassian Jira (v8.20.1#820001)