[jira] [Updated] (HIVE-6320) Row-based ORC reader with PPD turned on dies on BufferUnderFlowException

Prasanth J (JIRA) Thu, 30 Jan 2014 13:13:21 -0800

     [ 
https://issues.apache.org/jira/browse/HIVE-6320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Prasanth J updated HIVE-6320:
-----------------------------

    Attachment: HIVE-6320.1.patch

The issue was in how the disk range boundaries are generated. If two adjacent 
row groups have the same compressed block offset, the worst-case slop that was 
added to the end offset covers only the current compression block. In some 
cases the values towards the end of this compression block stretch beyond that 
boundary, and fetching them causes a BufferUnderflowException. 

The attached patch extends this worst case slop boundary to safely accommodate 
the adjacent compression block.
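
To illustrate the boundary arithmetic described above, here is a minimal 
sketch. It is not the code from the attached patch: the class name, method 
names, the 3-byte header constant, and the choice of doubling the slop are 
all assumptions made for illustration. It only shows how an end offset that 
covers one compression block differs from one that also covers the adjacent 
block.

```java
// Hypothetical sketch; names and constants are assumptions, not Hive's API.
final class SlopSketch {
    // Assumed size in bytes of the header that precedes each compression block.
    static final int HEADER_SIZE = 3;

    // End offset when the slop covers only the current compression block.
    static long endWithSingleBlockSlop(long nextGroupOffset, long streamLength,
                                       int compressionSize) {
        long slop = HEADER_SIZE + (long) compressionSize;
        return Math.min(streamLength, nextGroupOffset + slop);
    }

    // End offset when the slop is extended to also cover the adjacent block
    // (assumed here to be one extra header plus one extra block).
    static long endWithExtendedSlop(long nextGroupOffset, long streamLength,
                                    int compressionSize) {
        long slop = 2L * (HEADER_SIZE + (long) compressionSize);
        return Math.min(streamLength, nextGroupOffset + slop);
    }

    public static void main(String[] args) {
        long nextGroupOffset = 1000L;   // compressed offset of the next row group
        long streamLength = 1_000_000L; // total length of the stream
        int compressionSize = 262144;   // 256 KB compression block

        System.out.println(endWithSingleBlockSlop(nextGroupOffset, streamLength, compressionSize));
        System.out.println(endWithExtendedSlop(nextGroupOffset, streamLength, compressionSize));
    }
}
```

In both variants the end offset is capped at the stream length, so the wider 
slop never reads past the end of the stream.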

> Row-based ORC reader with PPD turned on dies on BufferUnderFlowException 
> -------------------------------------------------------------------------
>
>                 Key: HIVE-6320
>                 URL: https://issues.apache.org/jira/browse/HIVE-6320
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.13.0
>            Reporter: Gopal V
>              Labels: orcfile
>         Attachments: HIVE-6320.1.patch
>
>
> ORC data reader crashes out on a BufferUnderflowException, while trying to 
> read data row-by-row with the predicate push-down enabled on current trunk.
> {code}
> Caused by: java.nio.BufferUnderflowException
>       at java.nio.Buffer.nextGetIndex(Buffer.java:472)
>       at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:117)
>       at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:207)
>       at org.apache.hadoop.hive.ql.io.orc.SerializationUtils.readInts(SerializationUtils.java:450)
>       at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readDirectValues(RunLengthIntegerReaderV2.java:240)
>       at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:53)
>       at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:288)
>       at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$IntTreeReader.next(RecordReaderImpl.java:510)
>       at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1581)
>       at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2707)
>       at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:125)
>       at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:101)
> {code}
> The query run is 
> {code}
> set hive.vectorized.execution.enabled=false;
> set hive.optimize.index.filter=true;
> insert overwrite directory '/tmp/foo' select * from lineitem where l_orderkey 
> is not null;
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
