[ https://issues.apache.org/jira/browse/HIVE-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049710#comment-14049710 ]
Lefty Leverenz commented on HIVE-6382:
--------------------------------------

*hive.exec.orc.skip.corrupt.data* is documented in the wiki:
* [Configuration Properties -- hive.exec.orc.skip.corrupt.data | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.orc.skip.corrupt.data]

> PATCHED_BLOB encoding in ORC will corrupt data in some cases
> ------------------------------------------------------------
>
>                 Key: HIVE-6382
>                 URL: https://issues.apache.org/jira/browse/HIVE-6382
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.13.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>              Labels: orcfile
>             Fix For: 0.13.0
>
>         Attachments: HIVE-6382.1.patch, HIVE-6382.2.patch, HIVE-6382.3.patch,
> HIVE-6382.4.patch, HIVE-6382.5.patch, HIVE-6382.6.patch
>
>
> In PATCHED_BLOB encoding (added in HIVE-4123), gapVsPatchList is an array of
> longs. Each entry stores the gap (g) between the positions of patched values
> together with the patch value (p). The maximum gap can be 511, but the gap is
> encoded with at most 8 bits, and patch values can take more than 56 bits. When
> a patch value takes more than 56 bits, the combined width of p and g exceeds
> 64 bits, so the pair can no longer be packed into a single long. As a result,
> data is corrupted whenever patch values are wider than 56 bits.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
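The overflow described in the issue can be illustrated with a small sketch. It assumes each gapVsPatchList entry places the gap in the high bits and the patch in the low bits of one 64-bit word; `pack`, `unpack`, `fits` and `bit_width` are hypothetical helpers written for this illustration, not ORC's actual writer code. A 64-bit mask emulates a Java long, so any bits shifted past bit 63 are silently dropped.

```python
from typing import Tuple

MASK64 = (1 << 64) - 1  # emulate a 64-bit Java long slot


def bit_width(value: int) -> int:
    """Bits needed to represent a non-negative value (at least 1)."""
    return max(1, value.bit_length())


def pack(gap: int, patch: int, patch_width: int) -> int:
    """Hypothetical layout: gap in the high bits, patch in the low bits.

    Bits that land above bit 63 are lost, just as they would be in a long.
    """
    return ((gap << patch_width) | patch) & MASK64


def unpack(packed: int, patch_width: int) -> Tuple[int, int]:
    """Split a packed word back into (gap, patch)."""
    gap = packed >> patch_width
    patch = packed & ((1 << patch_width) - 1)
    return gap, patch


def fits(gap_width: int, patch_width: int) -> bool:
    """True when gap and patch fit together in one 64-bit word."""
    return gap_width + patch_width <= 64


# 8-bit gap + 56-bit patch: exactly 64 bits, so the entry round-trips.
print(fits(bit_width(255), 56), unpack(pack(255, (1 << 56) - 1, 56), 56))
# 8-bit gap + 60-bit patch: 68 bits, so the top 4 bits of the gap are lost.
print(fits(bit_width(255), 60), unpack(pack(255, 1, 60), 60))
```

With a 60-bit patch the gap of 255 comes back as 15, which is the kind of silent corruption the issue reports for patch values wider than 56 bits.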