[ https://issues.apache.org/jira/browse/HIVE-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905871#comment-13905871 ]
Prasanth J commented on HIVE-5994:
----------------------------------

Puneeth,

This issue can happen with large positive values as well. When a large number repeats >3 and <=10 times, SHORT_REPEAT encoding is used:
https://github.com/apache/hive/blob/branch-0.12/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerWriterV2.java#L35

This encoding zigzag-encodes the repeating value. So in your case, when 4703275633953830000L is zigzag-encoded, the MSB (the 64th bit) is set, and because of this bug the result is treated as a negative value.

I tested your test case with trunk and it works fine. Applying the patch attached to this JIRA should also work.

> ORC RLEv2 encodes wrongly for large negative BIGINTs (64 bits)
> ---------------------------------------------------------------
>
>                 Key: HIVE-5994
>                 URL: https://issues.apache.org/jira/browse/HIVE-5994
>             Project: Hive
>          Issue Type: Bug
>   Affects Versions: 0.13.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>              Labels: orcfile
>             Fix For: 0.13.0
>
>         Attachments: HIVE-5994.1.patch
>
>
> For large negative BIGINTs, zigzag encoding yields a large 64-bit value
> with the MSB set to 1. This value is interpreted as a negative value in the
> SerializationUtils.findClosestNumBits(long value) function. This results in
> a wrong computation of the total number of bits required, which in turn
> results in wrong encoding/decoding of values.
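
The failure mode described above can be illustrated with a small Java sketch. It is not the Hive code: the class and method names are hypothetical, and the faulty bit counter is only an assumed shape (a signed comparison in the loop condition) that matches the symptom described in the issue. It shows that zigzag-encoding either a large negative value or a positive value of at least 2^62 sets bit 63, after which a signed check reports 0 bits while an unsigned count reports 64.

```java
// Hypothetical sketch; names are illustrative and are not Hive's SerializationUtils.
final class ZigZagBitsSketch {

    // Standard zigzag mapping: 0, -1, 1, -2, ... becomes 0, 1, 2, 3, ...
    static long zigzagEncode(long value) {
        return (value << 1) ^ (value >> 63);
    }

    // Assumed shape of the faulty check: the signed comparison ends the loop
    // immediately when bit 63 is set, so the count stays at 0.
    static int bitsSignedCompare(long value) {
        int count = 0;
        while (value > 0) {
            count++;
            value >>>= 1;
        }
        return count;
    }

    // Treats the value as an unsigned 64-bit pattern.
    static int bitsUnsigned(long value) {
        return 64 - Long.numberOfLeadingZeros(value);
    }

    public static void main(String[] args) {
        // One large positive value (from the comment) and one large negative value.
        long[] samples = {4703275633953830000L, Long.MIN_VALUE + 1};
        for (long v : samples) {
            long z = zigzagEncode(v);
            System.out.println(v + " -> zigzag 0x" + Long.toHexString(z)
                + ", signed-compare bits=" + bitsSignedCompare(z)
                + ", unsigned bits=" + bitsUnsigned(z));
        }
    }
}
```

For small inputs the two counters agree, which is why the problem only shows up once the zigzag-encoded value needs all 64 bits.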