[
https://issues.apache.org/jira/browse/HIVE-29585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18087057#comment-18087057
]
Kokila N commented on HIVE-29585:
---------------------------------
*Failing Test cases:*
* Iceberg + LLAP + Vectorization + ORC + LZ4 compression.
** Workaround: disable vectorization (set hive.vectorized.execution.enabled = false)
* LLAP + ORC + LZ4
*Root Cause:*
When the *LLAP cache* is enabled, data is read from disk the first time it is
accessed. LLAP then keeps the file metadata in the cache so that subsequent
queries on the same data run faster. LLAP stores this metadata in *direct
memory buffers*, which live outside the Java heap and are therefore *not backed
by an accessible array*.
So, on a *cache hit*, we read the ORC stripe footer from the cache as a direct
buffer
([getStripeFooterFromCacheOrDisk|https://github.com/apache/hive/blob/709d06bce95df7dc66c63f90ce99aadf8f24f489/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java#L728])
and pass it to ORC to decompress
([createORCStripeMetadataObject|https://github.com/apache/hive/blob/709d06bce95df7dc66c63f90ce99aadf8f24f489/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java#L720]).
However, the ORC LZ4 codec can only *decompress a Java heap buffer*, which is
backed by an array
([https://github.com/apache/orc/blob/dd3ec892123e42d6dfe3e0db3da40fd36d62c46a/java/core/src/java/org/apache/orc/impl/AircompressorCodec.java#L94]).
So, when it calls array() on the direct buffer, we get
{*}UnsupportedOperationException{*}.
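The failure mode can be reproduced outside Hive with plain java.nio. The sketch below is only an illustration, not the fix applied in Hive or ORC: the class DirectBufferDemo and its methods arrayAccessFails and toHeapBuffer are hypothetical names, and copying into a heap buffer is just one possible way to hand an array-backed buffer to a codec that calls array().

{code:java}
import java.nio.ByteBuffer;

// Hypothetical demo class; not part of Hive or ORC.
final class DirectBufferDemo {

    // Returns true if calling array() on the buffer throws
    // UnsupportedOperationException, as it does for direct buffers.
    public static boolean arrayAccessFails(ByteBuffer buf) {
        try {
            buf.array();
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    // Assumed workaround: return the buffer itself if it is array-backed,
    // otherwise copy its remaining bytes into a new heap buffer.
    public static ByteBuffer toHeapBuffer(ByteBuffer src) {
        if (src.hasArray()) {
            return src;
        }
        ByteBuffer copy = ByteBuffer.allocate(src.remaining());
        // duplicate() shares the content but has its own position,
        // so the source buffer's position is left untouched.
        copy.put(src.duplicate());
        copy.flip();
        return copy;
    }

    public static void main(String[] args) {
        // Stand-in for a buffer handed out by an off-heap cache.
        ByteBuffer direct = ByteBuffer.allocateDirect(4);
        direct.put(new byte[] {1, 2, 3, 4});
        direct.flip();

        System.out.println("direct.hasArray() = " + direct.hasArray());
        System.out.println("array() fails on direct buffer: " + arrayAccessFails(direct));

        ByteBuffer heap = toHeapBuffer(direct);
        System.out.println("array() fails on heap copy: " + arrayAccessFails(heap));
    }
}
{code}

The copy costs one extra allocation per cache hit, so a codec path that reads from the ByteBuffer directly would avoid that overhead; which approach is appropriate here is left to the actual patch.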
> ORC LZ4: OrcEncodedDataReader stripe footer fails on direct ByteBuffers from
> LLAP cache / ZCR
> ---------------------------------------------------------------------------------------------
>
> Key: HIVE-29585
> URL: https://issues.apache.org/jira/browse/HIVE-29585
> Project: Hive
> Issue Type: Bug
> Reporter: Kokila N
> Assignee: Kokila N
> Priority: Major
>
> Query:
> {code:java}
> CREATE TABLE IF NOT EXISTS ice_orc_test (
> id INT, random1 STRING
> )
> PARTITIONED BY (random2 STRING)
> STORED BY ICEBERG
> TBLPROPERTIES (
> 'write.format.default'='orc',
> 'format-version'='2',
> 'write.orc.compression-codec'='lz4'
> );
> -- Error on the 4th try
> INSERT INTO ice_orc_test SELECT if(isnull(MAX(id)), 0, MAX(id)) + 1, uuid(),
> uuid() FROM ice_orc_test;
> {code}
> Error:
> {code:java}
> Caused by: java.lang.UnsupportedOperationException
> at java.base/java.nio.ByteBuffer.array(ByteBuffer.java:1505)
> at org.apache.orc.impl.AircompressorCodec.decompress(AircompressorCodec.java:94)
> at org.apache.orc.impl.InStream$CompressedStream.readHeader(InStream.java:521)
> at org.apache.orc.impl.InStream$CompressedStream.ensureUncompressed(InStream.java:548)
> at org.apache.orc.impl.InStream$CompressedStream.read(InStream.java:535)
> at com.google.protobuf.CodedInputStream$StreamDecoder.read(CodedInputStream.java:2036)
> at com.google.protobuf.CodedInputStream$StreamDecoder.tryRefillBuffer(CodedInputStream.java:2777)
> at com.google.protobuf.CodedInputStream$StreamDecoder.isAtEnd(CodedInputStream.java:2700)
> at com.google.protobuf.CodedInputStream$StreamDecoder.readTag(CodedInputStream.java:2063)
> at org.apache.orc.OrcProto$StripeFooter.<init>(OrcProto.java:19300)
> at org.apache.orc.OrcProto$StripeFooter$1.parsePartialFrom(OrcProto.java:20956)
> at org.apache.orc.OrcProto$StripeFooter$1.parsePartialFrom(OrcProto.java:20950)
> at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:63)
> at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:68)
> at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25)
> at com.google.protobuf.GeneratedMessageV3.parseWithIOException(GeneratedMessageV3.java:353)
> at org.apache.orc.OrcProto$StripeFooter.parseFrom(OrcProto.java:19736)
> at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.buildStripeFooter(OrcEncodedDataReader.java:691)
> at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.getStripeFooterFromCacheOrDisk(OrcEncodedDataReader.java:740)
> at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.readStripesMetadata(OrcEncodedDataReader.java:707)
> at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:360)
> {code}