[ 
https://issues.apache.org/jira/browse/HIVE-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905871#comment-13905871
 ] 

Prasanth J commented on HIVE-5994:
----------------------------------

Puneeth,

This issue can also happen with large positive values. When a large number 
repeats more than 3 and at most 10 times, SHORT_REPEAT encoding is used:
https://github.com/apache/hive/blob/branch-0.12/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerWriterV2.java#L35

This encoding zigzag-encodes the repeating value. In your case, when 
4703275633953830000L is zigzag encoded, the MSB (bit 64) is set, so the code 
affected by this bug treats the encoded value as negative.
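
To illustrate the point above, here is a minimal sketch. It is not the actual 
Hive code: the class ZigzagSketch and the helpers countBitsSigned and 
countBitsUnsigned are hypothetical names made up for this example. It assumes 
the standard 64-bit zigzag mapping and only shows why a bit count that checks 
"value > 0" goes wrong once the encoded value has its top bit set.

```java
final class ZigzagSketch {
    // Standard zigzag mapping for signed 64-bit values:
    // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static long zigzagEncode(long v) {
        return (v << 1) ^ (v >> 63);
    }

    static long zigzagDecode(long z) {
        return (z >>> 1) ^ -(z & 1);
    }

    // Hypothetical bit counter that treats its argument as signed.
    // If the top bit is set, the loop never runs and 0 is returned.
    static int countBitsSigned(long value) {
        int count = 0;
        while (value > 0) {
            count++;
            value >>>= 1;
        }
        return count;
    }

    // Hypothetical bit counter that treats its argument as an
    // unsigned 64-bit pattern, so a set top bit is counted.
    static int countBitsUnsigned(long value) {
        int count = 0;
        while (value != 0) {
            count++;
            value >>>= 1;
        }
        return count;
    }

    public static void main(String[] args) {
        long value = 4703275633953830000L;
        long encoded = zigzagEncode(value);

        System.out.println("encoded is negative as a signed long: " + (encoded < 0));
        System.out.println("signed bit count:   " + countBitsSigned(encoded));
        System.out.println("unsigned bit count: " + countBitsUnsigned(encoded));
        System.out.println("round trip ok:      " + (zigzagDecode(encoded) == value));
    }
}
```

The value is positive, but doubling it during zigzag encoding pushes it past 
2^63, so the encoded long has its sign bit set. A width computed from the 
signed comparison is then too small for the value that actually gets written.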

I tested your test case with trunk and it works fine. Applying the patch 
attached in this JIRA should also work.

> ORC RLEv2 encodes wrongly for large negative BIGINTs  (64 bits )
> ----------------------------------------------------------------
>
>                 Key: HIVE-5994
>                 URL: https://issues.apache.org/jira/browse/HIVE-5994
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.13.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>              Labels: orcfile
>             Fix For: 0.13.0
>
>         Attachments: HIVE-5994.1.patch
>
>
> For large negative BIGINTs, zigzag encoding yields a large 64-bit value 
> with the MSB set to 1. This value is interpreted as negative in the 
> SerializationUtils.findClosestNumBits(long value) function. As a result, the 
> total number of bits required is computed incorrectly, which leads to wrong 
> encoding/decoding of values.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
