I have a highly compressed, single-ORC-file table generated from Hive DDL. The raw data reports as 120 GB, which ORC/Snappy compresses down to 990 MB (ORC with no compression is still only 1.3 GB). Hive on MR is throwing an ArrayIndexOutOfBoundsException like the following:
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {} json row data redacted.
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {} json row data redacted.
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:565)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
    ... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException
    at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:416)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
    at org.apache.hadoop.hive.ql.exec.UnionOperator.process(UnionOperator.java:141)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:133)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:170)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:555)
    ... 9 more
Caused by: java.lang.ArrayIndexOutOfBoundsException
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1453)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1349)
    at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
    at org.apache.hadoop.io.BytesWritable.write(BytesWritable.java:188)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1149)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:610)
    at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:555)
    at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:398)
    ... 15 more

We have been on Hive 1.2.1 for a long time and are only now running into this, but I think this compression ratio is a new thing. There are a couple of cases of this in the cluster right now. In other cases that contain a GROUP BY, I am able to decrease the input split sizes (down from a 1 GB minimum / 2 GB maximum) and the query then works. I've been suspicious that an OOM is being swallowed somewhere and that a job-kill error is surfacing as this row-processing exception.
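For reference, the table definition is roughly of the following shape (table and column names are placeholders, not our real schema); the relevant parts are STORED AS ORC with Snappy compression:

-- hypothetical DDL sketch of a Snappy-compressed ORC table like ours
CREATE TABLE events_orc (     -- placeholder table/column names
  id      BIGINT,
  payload STRING
)
STORED AS ORC
TBLPROPERTIES ("orc.compress" = "SNAPPY");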
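For the GROUP BY cases, the split-size workaround is just per-query settings along these lines (standard MapReduce split properties; the byte values are only examples of "smaller than our usual 1 GB / 2 GB"):

-- shrink the splits each mapper gets; values are in bytes and are examples only
SET mapreduce.input.fileinputformat.split.minsize=268435456;  -- 256 MB
SET mapreduce.input.fileinputformat.split.maxsize=536870912;  -- 512 MB
-- older deprecated name that CombineHiveInputFormat also honors
SET mapred.max.split.size=536870912;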