RE: LLAP can't read ORC ZLIB files from S3

Aaron Grubb Sun, 28 Jun 2020 09:24:03 -0700

Hi Owen, the problem disappeared when I stopped using 
orc.write.variable.length.blocks but how would I go about turning off direct 
byte buffers on read out of curiosity? I can’t find any settings to control 
this.


Thanks,
Aaron

From: Owen O'Malley <owen.omal...@gmail.com>
Sent: Thursday, June 25, 2020 1:21 PM
To: user@hive.apache.org
Subject: Re: LLAP can't read ORC ZLIB files from S3

Actually, it looks like LLAP is trying to get the ByteBuffer array from a 
direct byte buffer. Turning off direct byte buffers on read should fix the 
problem.

.. Owen

On Thu, Jun 25, 2020 at 7:27 AM Aaron Grubb 
<aaron.gr...@clearpier.com<mailto:aaron.gr...@clearpier.com>> wrote:
This appears to have been caused by orc.write.variable.length.blocks=true which 
I had set for HDFS-based tables. Setting this to false and inserting data into 
the S3 table appears to have fixed this problem.

From: Aaron Grubb <aaron.gr...@clearpier.com<mailto:aaron.gr...@clearpier.com>>
Sent: Wednesday, June 24, 2020 4:04 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: LLAP can't read ORC ZLIB files from S3

Hello everyone,

I’m encountering an error that I can’t find any information on. I’ve inserted 
data into a table with storage in S3 in ORC ZLIB format. I can query this data 
directly without issues. Running a query that requires LLAP causes the 
following error:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.io.IOException: java.io.IOException: 
java.lang.UnsupportedOperationException
        at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
        at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
        at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
        at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
        at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
        at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at 
org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.io.IOException: java.io.IOException: 
java.lang.UnsupportedOperationException
        at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:80)
        at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
        at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
        ... 15 more
Caused by: java.io.IOException: java.io.IOException: 
java.lang.UnsupportedOperationException
        at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
        at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
        at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
        at 
org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
        at 
org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
        at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
        at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151)
        at 
org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
        at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
        ... 17 more
Caused by: java.io.IOException: java.lang.UnsupportedOperationException
        at 
org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readIndexStreams(EncodedReaderImpl.java:1954)
        at 
org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:384)
        at 
org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:263)
        at 
org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:260)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at 
org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:260)
        at 
org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:109)
        ... 6 more
Caused by: java.lang.UnsupportedOperationException
        at java.nio.ByteBuffer.array(ByteBuffer.java:994)
        at org.apache.orc.impl.ZlibCodec.decompress(ZlibCodec.java:94)
        at 
org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.decompressChunk(EncodedReaderImpl.java:1283)
        at 
org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedStream(EncodedReaderImpl.java:902)
        at 
org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readIndexStreams(EncodedReaderImpl.java:1918)

This is specific to ORC ZLIB files on S3 being processed through LLAP. I can 
query other types of files in S3 through LLAP, I can query the ORC ZLIB data on 
S3 directly (select * from orc_zlib_on_s3_table limit 10) and I can execute the 
same query that fails in LLAP in Native Tez containers. Does anyone have any 
suggestions as to what the problem might be or how to debug it?

Thanks,
Aaron

RE: LLAP can't read ORC ZLIB files from S3

Reply via email to