With 650 columns you might need to reduce the compression buffer size from the default of 256KB down to something like 8KB (then decrease it further if that fails, or increase it if it succeeds, to find the right size). You can do that by setting the orc.compress.size table property (tblproperties).
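For a job that writes through OrcNewOutputFormat directly (as in the quoted thread below) there are no table properties to set, so here is a minimal sketch of the equivalent tuning. It assumes the Hive 0.13 fallback configuration key hive.exec.orc.default.buffer.size, which OrcFile.writerOptions(conf) reads when no table properties apply; verify the key name against your HiveConf:

    import org.apache.hadoop.conf.Configuration;

    public class OrcBufferTuning {
        // Hedged sketch, not the poster's code: shrink the per-stream compression
        // buffer in the job Configuration before the job is submitted. The key is
        // the Hive 0.13 fallback read by OrcFile.writerOptions(conf); the default
        // is 262144 bytes (256KB) per stream. ORC keeps several streams open per
        // column (present, data, length, index, ...), so with ~650 columns the
        // per-writer footprint multiplies quickly.
        public static void apply(Configuration conf) {
            conf.setInt("hive.exec.orc.default.buffer.size", 8 * 1024); // try 8KB first
        }
    }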
On Sep 24, 2015, at 3:27 AM, Patrick Duin <patd...@gmail.com> wrote:

Thanks for the reply.

My first thought was out of memory as well, but the IllegalArgumentException happens before it, as a separate entry in the log; the OOM exception is not the cause. So I am not sure where that OOM exception fits in. I've tried running it with more memory and got the same problem; it was also consistently failing on the same split.

We have about 650 columns. I don't know how many record writers are open (how can I see that?). I'll try running it with a reduced stripe size and see if that helps.

The weird thing is we have a production cluster running the same Hadoop/Hive versions, the same code, and the same data, and it processes just fine; I get this error only in our QA cluster. It's hard to locate the difference :). Anyway, thanks for the pointers, I'll do some more digging.

Cheers,
Patrick

2015-09-24 0:51 GMT+01:00 Prasanth Jayachandran <pjayachand...@hortonworks.com>:

Looks like you are running out of memory. Try increasing the heap memory or reducing the stripe size. How many columns are you writing? Any idea how many record writers are open per map task?

- Prasanth

On Sep 22, 2015, at 4:32 AM, Patrick Duin <patd...@gmail.com> wrote:

Hi all,

I am struggling to understand a stack trace I am getting while trying to write an ORC file. I am using hive-0.13.0/hadoop-2.4.0.

2015-09-21 09:15:44,603 INFO [main] org.apache.hadoop.mapred.MapTask: Ignoring exception during close for org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector@2ce49e21
java.lang.IllegalArgumentException: Column has wrong number of index entries found: org.apache.hadoop.hive.ql.io.orc.OrcProto$RowIndexEntry$Builder@6eeb967b expected: 1
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:578)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1398)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1780)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
    at org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat$OrcRecordWriter.close(OrcNewOutputFormat.java:67)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:647)
    at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

2015-09-21 09:15:45,988 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
    at org.apache.hadoop.hive.ql.io.orc.OutStream.getNewOutputBuffer(OutStream.java:117)
    at org.apache.hadoop.hive.ql.io.orc.OutStream.spill(OutStream.java:168)
    at org.apache.hadoop.hive.ql.io.orc.OutStream.flush(OutStream.java:239)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:583)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StringTreeWriter.writeStripe(WriterImpl.java:1012)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1400)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1780)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
    at org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat$OrcRecordWriter.close(OrcNewOutputFormat.java:67)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:647)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

I've seen https://issues.apache.org/jira/browse/HIVE-9080 and I think it might be related. I am not using Hive, though; I am using a map-only job that writes to an OrcNewOutputFormat. Any pointers would be appreciated. Has anyone seen this before?

Thanks,
Patrick
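For reference, a condensed sketch of the kind of map-only driver described above, showing where the stripe-size reduction from Prasanth's suggestion would go alongside the buffer-size key shown earlier. It again assumes the Hive 0.13 default keys read by OrcFile.writerOptions(conf), and the mapper, which would have to emit OrcSerde-serialized rows, is left out as hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class OrcWriteDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Smaller stripes mean less data buffered per writer before a flush;
            // 64MB is an illustrative value, not a recommendation from the thread
            // (the Hive 0.13 default is 256MB).
            conf.setLong("hive.exec.orc.default.stripe.size", 64L * 1024 * 1024);
            conf.setInt("hive.exec.orc.default.buffer.size", 8 * 1024);

            Job job = Job.getInstance(conf, "orc-write");
            job.setJarByClass(OrcWriteDriver.class);
            job.setNumReduceTasks(0); // map-only, as in the original post
            // job.setMapperClass(MyOrcMapper.class); // hypothetical mapper emitting
            //                                        // rows serialized with OrcSerde
            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(Writable.class);
            job.setOutputFormatClass(OrcNewOutputFormat.class);
            FileOutputFormat.setOutputPath(job, new Path(args[0]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }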