[jira] [Commented] (HIVE-24902) Incorrect result after fold CASE into COALESCE

Nemon Lou (Jira) Fri, 19 Mar 2021 02:54:16 -0700


    [ https://issues.apache.org/jira/browse/HIVE-24902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304780#comment-17304780 ]


Nemon Lou commented on HIVE-24902:
----------------------------------

Here is how the filter expression goes wrong, step by step.
Before optimization (good):
{code:sql}
IS NOT NULL(CASE(=($0, 1),
  CAST(FROM_UNIXTIME(-(UNIX_TIMESTAMP(CAST(_UTF-16LE'20210309'):VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary", _UTF-16LE'yyyyMMdd'), CAST(86400):BIGINT), _UTF-16LE'yyyyMMdd')):BIGINT,
  20210309))
{code}

After pushing the predicate into CASE (good):
{code:sql}
CASE(=($0, 1),
  IS NOT NULL(CAST(FROM_UNIXTIME(-(UNIX_TIMESTAMP(CAST(_UTF-16LE'20210309'):VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary", _UTF-16LE'yyyyMMdd'), CAST(86400):BIGINT), _UTF-16LE'yyyyMMdd')):BIGINT),
  true)
{code}
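
To illustrate why this step is still safe, here is a minimal Python sketch of the pushdown. It is a toy model, not Hive or Calcite code: SQL NULL is modeled as `None`, and the helper names `case`, `is_not_null`, `before_pushdown` and `after_pushdown` are assumptions made up for this example.

```python
def case(cond, then_value, else_value):
    """Model of CASE WHEN cond THEN then_value ELSE else_value END.

    A condition that is FALSE or NULL (None) selects the ELSE branch.
    """
    return then_value if cond is True else else_value


def is_not_null(value):
    """Model of SQL IS NOT NULL; never returns NULL itself."""
    return value is not None


def before_pushdown(cond, a, b):
    # IS NOT NULL(CASE(cond, a, b))
    return is_not_null(case(cond, a, b))


def after_pushdown(cond, a, b):
    # CASE(cond, IS NOT NULL(a), IS NOT NULL(b))
    return case(cond, is_not_null(a), is_not_null(b))


# Compare both forms for every combination of TRUE/FALSE/NULL condition
# and NULL/non-NULL branch values.
pushdown_mismatches = [
    (cond, a, b)
    for cond in (True, False, None)
    for a in (20210308, None)
    for b in (20210309, None)
    if before_pushdown(cond, a, b) != after_pushdown(cond, a, b)
]
print(pushdown_mismatches)
```

In this model the two forms agree on every input, which matches the "good" label on this step.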

After constant folding (good):
{code:sql}
CASE(=($0, 1), true, true)
{code}

After rewriting CASE into COALESCE (bad):
{code:sql}
COALESCE(=($0, 1),false)
{code}
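
The mismatch can be checked with a small Python sketch. It is only a toy model of the two expressions under three-valued logic (NULL modeled as `None`), not the Hive rewrite itself; the helper names `case`, `coalesce`, `folded_case` and `rewritten` are assumptions for this example.

```python
def case(cond, then_value, else_value):
    """Model of CASE WHEN cond THEN then_value ELSE else_value END."""
    return then_value if cond is True else else_value


def coalesce(*values):
    """Model of COALESCE: first non-NULL argument, or NULL if none."""
    for value in values:
        if value is not None:
            return value
    return None


def folded_case(cond):
    # CASE(cond, true, true), the expression after constant folding
    return case(cond, True, True)


def rewritten(cond):
    # COALESCE(cond, false), the expression after the CASE rewrite
    return coalesce(cond, False)


# (folded, rewritten) for a TRUE, FALSE and NULL condition.
truth_table = {cond: (folded_case(cond), rewritten(cond))
               for cond in (True, False, None)}

# The two rows produced by stack(2,1,2) as (a) in the reported query.
rows = [1, 2]
kept_by_folded = [case(a == 1, 20210308, 20210309)
                  for a in rows if folded_case(a == 1)]
kept_by_rewritten = [case(a == 1, 20210308, 20210309)
                     for a in rows if rewritten(a == 1)]

# For comparison: CASE(cond, true, false) does agree with COALESCE(cond, false).
true_false_agrees = all(case(c, True, False) == coalesce(c, False)
                        for c in (True, False, None))
print(truth_table, kept_by_folded, kept_by_rewritten, true_false_agrees)
```

In this model `CASE(cond, true, true)` keeps both rows, while `COALESCE(cond, false)` keeps only the row with a = 1, which is the single record (20210308) seen in the report. The two forms only coincide when the branches are `true` and `false`, not `true` and `true`.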

The code that performs the COALESCE rewrite:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java#L1079

> Incorrect result after fold CASE into COALESCE
> ----------------------------------------------
>
>                 Key: HIVE-24902
>                 URL: https://issues.apache.org/jira/browse/HIVE-24902
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>    Affects Versions: 3.1.2, 4.0.0
>            Reporter: Nemon Lou
>            Priority: Major
>
> The following SQL returns only one record (20210308), but two are expected
> (20210308 and 20210309).
> {code:sql}
> select * from (
> select 
>       case when b.a=1
>               then  
>                       cast (from_unixtime(unix_timestamp(cast(20210309 as string),'yyyyMMdd') - 86400,'yyyyMMdd') as bigint)
>               else 
>                       20210309 
>          end 
> as col
> from 
> (select stack(2,1,2) as (a))
>  as b
> ) t 
> where t.col is not null;
> {code}
> The query plan has an incorrect predicate:
>  predicate: COALESCE((col0 = 1),false) (type: boolean)
> {code:sql}
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         TableScan
>           alias: _dummy_table
>           Row Limit Per Split: 1
>           Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column stats: COMPLETE
>           Select Operator
>             expressions: 2 (type: int), 1 (type: int), 2 (type: int)
>             outputColumnNames: _col0, _col1, _col2
>             Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE Column stats: COMPLETE
>             UDTF Operator
>               Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE Column stats: COMPLETE
>               function name: stack
>               Filter Operator
>                 predicate: COALESCE((col0 = 1),false) (type: boolean)
>                 Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE Column stats: COMPLETE
>                 Select Operator
>                   expressions: CASE WHEN ((col0 = 1)) THEN (20210308L) ELSE (20210309L) END (type: bigint)
>                   outputColumnNames: _col0
>                   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                   ListSink
> Time taken: 0.155 seconds, Fetched: 28 row(s)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
