With 650 columns you might need to reduce the compression buffer size from the default of 256KB down to something like 8KB (then decrease it further if that fails, or increase it if it succeeds, to find the right size). You can do that by setting the orc.compress.size table property (tblproperties).
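For a job that writes through OrcNewOutputFormat directly (as in the quoted thread below) there are no table properties to set, so here is a minimal sketch of the equivalent tuning. It assumes the Hive 0.13 fallback configuration key hive.exec.orc.default.buffer.size, which OrcFile.writerOptions(conf) reads when no table properties apply; verify the key name against your HiveConf:

    import org.apache.hadoop.conf.Configuration;

    public class OrcBufferTuning {
        // Hedged sketch, not the poster's code: shrink the per-stream compression
        // buffer in the job Configuration before the job is submitted. The key is
        // the Hive 0.13 fallback read by OrcFile.writerOptions(conf); the default
        // is 262144 bytes (256KB) per stream. ORC keeps several streams open per
        // column (present, data, length, index, ...), so with ~650 columns the
        // per-writer footprint multiplies quickly.
        public static void apply(Configuration conf) {
            conf.setInt("hive.exec.orc.default.buffer.size", 8 * 1024); // try 8KB first
        }
    }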
On Sep 24, 2015, at 3:27 AM, Patrick Duin <patd...@gmail.com> wrote:

Thanks for the reply.

My first thought was out of memory as well, but the IllegalArgumentException happens before it, as a separate entry in the log; the OOM exception is not the cause. So I am not sure where that OOM exception fits in. I've tried running it with more memory and got the same problem; it was also consistently failing on the same split.

We have about 650 columns. I don't know how many record writers are open (how can I see that?). I'll try running it with a reduced stripe size and see if that helps.

The weird thing is we have a production cluster running the same Hadoop/Hive versions, the same code, and the same data, and it processes just fine; I get this error only in our QA cluster. It's hard to locate the difference :). Anyway, thanks for the pointers, I'll do some more digging.

Cheers,
Patrick

2015-09-24 0:51 GMT+01:00 Prasanth Jayachandran <pjayachand...@hortonworks.com>:

Looks like you are running out of memory. Try increasing the heap memory or reducing the stripe size. How many columns are you writing? Any idea how many record writers are open per map task?

- Prasanth

On Sep 22, 2015, at 4:32 AM, Patrick Duin <patd...@gmail.com> wrote:

Hi all,

I am struggling to understand a stack trace I am getting while trying to write an ORC file. I am using hive-0.13.0/hadoop-2.4.0.

2015-09-21 09:15:44,603 INFO [main] org.apache.hadoop.mapred.MapTask: Ignoring exception during close for org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector@2ce49e21
java.lang.IllegalArgumentException: Column has wrong number of index entries found: org.apache.hadoop.hive.ql.io.orc.OrcProto$RowIndexEntry$Builder@6eeb967b expected: 1
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:578)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1398)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1780)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
    at org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat$OrcRecordWriter.close(OrcNewOutputFormat.java:67)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:647)
    at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

2015-09-21 09:15:45,988 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
    at org.apache.hadoop.hive.ql.io.orc.OutStream.getNewOutputBuffer(OutStream.java:117)
    at org.apache.hadoop.hive.ql.io.orc.OutStream.spill(OutStream.java:168)
    at org.apache.hadoop.hive.ql.io.orc.OutStream.flush(OutStream.java:239)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:583)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StringTreeWriter.writeStripe(WriterImpl.java:1012)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1400)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1780)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
    at org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat$OrcRecordWriter.close(OrcNewOutputFormat.java:67)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:647)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

I've seen https://issues.apache.org/jira/browse/HIVE-9080 and I think it might be related. I am not using Hive, though; I am using a map-only job that writes to an OrcNewOutputFormat. Any pointers would be appreciated. Has anyone seen this before?

Thanks,
Patrick
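For reference, a condensed sketch of the kind of map-only driver described above, showing where the stripe-size reduction from Prasanth's suggestion would go alongside the buffer-size key shown earlier. It again assumes the Hive 0.13 default keys read by OrcFile.writerOptions(conf), and the mapper, which would have to emit OrcSerde-serialized rows, is left out as hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class OrcWriteDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Smaller stripes mean less data buffered per writer before a flush;
            // 64MB is an illustrative value, not a recommendation from the thread
            // (the Hive 0.13 default is 256MB).
            conf.setLong("hive.exec.orc.default.stripe.size", 64L * 1024 * 1024);
            conf.setInt("hive.exec.orc.default.buffer.size", 8 * 1024);

            Job job = Job.getInstance(conf, "orc-write");
            job.setJarByClass(OrcWriteDriver.class);
            job.setNumReduceTasks(0); // map-only, as in the original post
            // job.setMapperClass(MyOrcMapper.class); // hypothetical mapper emitting
            //                                        // rows serialized with OrcSerde
            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(Writable.class);
            job.setOutputFormatClass(OrcNewOutputFormat.class);
            FileOutputFormat.setOutputPath(job, new Path(args[0]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }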