[ 
https://issues.apache.org/jira/browse/HIVE-18421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357718#comment-16357718
 ] 

Vihang Karajgaonkar commented on HIVE-18421:
--------------------------------------------

I can enable this config by default, but that would trigger a q.out update of 
over 300 q files. Also, [~mmccline] and [~gopalv] raised concerns above 
regarding performance, and I haven't investigated the performance overhead of 
the fix. The issue is that Hive has no well-defined overall policy on 
how to handle overflows. So it is arguable whether users would want this 
config enabled by default, since Hive itself doesn't handle overflows well 
overall. The config does not handle overflows in the "right" way; it only makes 
overflow handling in vectorized execution match the non-vectorized behavior. 
That is why I disabled the config by default. I can investigate the overhead 
and turn it on by default if it is not significant. Any thoughts on this, 
[~gopalv] [~aihuaxu]?

> Vectorized execution handles overflows in a different manner than 
> non-vectorized execution
> ------------------------------------------------------------------------------------------
>
>                 Key: HIVE-18421
>                 URL: https://issues.apache.org/jira/browse/HIVE-18421
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>    Affects Versions: 2.1.1, 2.2.0, 3.0.0, 2.3.2
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>            Priority: Major
>         Attachments: HIVE-18421.01.patch, HIVE-18421.02.patch, 
> HIVE-18421.03.patch, HIVE-18421.04.patch, HIVE-18421.05.patch, 
> HIVE-18421.06.patch, HIVE-18421.07.patch
>
>
> In vectorized execution, arithmetic operations that cause integer overflows 
> can give wrong results. The issue is reproducible with both ORC and Parquet.
> A simple test case to reproduce this issue:
> {noformat}
> set hive.vectorized.execution.enabled=true;
> create table parquettable (t1 tinyint, t2 tinyint) stored as parquet;
> insert into parquettable values (-104, 25), (-112, 24), (54, 9);
> select t1, t2, (t1-t2) as diff from parquettable where (t1-t2) < 50 order by 
> diff desc;
> +-------+-----+-------+
> |  t1   | t2  | diff  |
> +-------+-----+-------+
> | -104  | 25  | 127   |
> | -112  | 24  | 120   |
> | 54    | 9   | 45    |
> +-------+-----+-------+
> {noformat}
> When vectorization is turned off, the same query produces only one row.
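
For readers following the arithmetic, here is a minimal Java sketch of how the two row counts could arise. It is not Hive code. It assumes the vectorized path evaluates (t1-t2) in 64-bit longs and narrows only when the value is written out, while the row-mode path narrows to tinyint before the comparison. The class and method names are hypothetical and chosen only for this illustration.

```java
public class TinyintOverflowSketch {

    // Assumption: mimics the vectorized path, where the subtraction is done
    // in a 64-bit long, so the filter sees the un-wrapped difference.
    static boolean passesFilterWide(byte t1, byte t2) {
        long diff = (long) t1 - (long) t2;
        return diff < 50;
    }

    // Assumption: mimics the row-mode path, where the difference is narrowed
    // to tinyint (Java byte) before the filter compares it.
    static boolean passesFilterNarrow(byte t1, byte t2) {
        byte diff = (byte) (t1 - t2);
        return diff < 50;
    }

    // The value shown in the "diff" column: the difference wrapped to 8 bits.
    static byte projectedDiff(byte t1, byte t2) {
        return (byte) (t1 - t2);
    }

    public static void main(String[] args) {
        // Same rows as the test case in the issue description.
        byte[][] rows = {{-104, 25}, {-112, 24}, {54, 9}};
        for (byte[] r : rows) {
            System.out.printf("t1=%d t2=%d diff=%d wide=%b narrow=%b%n",
                    r[0], r[1], projectedDiff(r[0], r[1]),
                    passesFilterWide(r[0], r[1]),
                    passesFilterNarrow(r[0], r[1]));
        }
    }
}
```

Under these assumptions, all three rows pass the wide filter (-129, -136 and 45 are each below 50), but only (54, 9) passes the narrow one, because -129 and -136 wrap to 127 and 120 as tinyint. That matches the three rows versus one row reported above.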



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
