We are writing ORC files from our application for Hive to consume. After the application has been running for a while, writing fails with a NullPointerException while serializing a string column's statistics. We are not yet sure what triggers it on our side: replaying the same data works fine, so it seems to happen over time rather than being tied to specific data (different data sources hit it at around the same time in the same JVM).
Here is the code in question:

    final Writer writer = OrcFile.createWriter(path, OrcFile.writerOptions(conf).inspector(oi));
    try {
        for (Data row : rows) {
            List<Object> struct = Orc.struct(row, inspector);
            writer.addRow(struct);
        }
    } finally {
        writer.close();
    }

Here is the exception:

    java.lang.NullPointerException: null
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$Builder.setMinimum(OrcProto.java:1803) ~[hive-exec-0.14.0.jar:0.14.0]
        at org.apache.hadoop.hive.ql.io.orc.ColumnStatisticsImpl$StringStatisticsImpl.serialize(ColumnStatisticsImpl.java:411) ~[hive-exec-0.14.0.jar:0.14.0]
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StringTreeWriter.createRowIndexEntry(WriterImpl.java:1255) ~[hive-exec-0.14.0.jar:0.14.0]
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.createRowIndexEntry(WriterImpl.java:775) ~[hive-exec-0.14.0.jar:0.14.0]
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.createRowIndexEntry(WriterImpl.java:775) ~[hive-exec-0.14.0.jar:0.14.0]
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl.createRowIndexEntry(WriterImpl.java:1978) ~[hive-exec-0.14.0.jar:0.14.0]
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1985) ~[hive-exec-0.14.0.jar:0.14.0]
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl.checkMemory(WriterImpl.java:322) ~[hive-exec-0.14.0.jar:0.14.0]
        at org.apache.hadoop.hive.ql.io.orc.MemoryManager.notifyWriters(MemoryManager.java:168) ~[hive-exec-0.14.0.jar:0.14.0]
        at org.apache.hadoop.hive.ql.io.orc.MemoryManager.addedRow(MemoryManager.java:157) ~[hive-exec-0.14.0.jar:0.14.0]
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:2276) ~[hive-exec-0.14.0.jar:

Versions:

    Hadoop: Apache 2.2.0
    Hive: Apache 0.14.0
    Java: 1.7

Thanks for taking the time to read this.
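One explanation consistent with this trace (a hypothesis, not confirmed): the frames show that addRow() on one writer goes through MemoryManager.notifyWriters() into checkMemory() and flushStripe(), so a flush can be triggered from inside another call to addRow(). If the memory manager is shared by every writer in the JVM and two threads write ORC files at the same time, a flush could run while another writer is partway through recording a string value: its row-group count already incremented, its minimum not yet set. Serializing that state would pass null to the protobuf builder's setMinimum(), which rejects null with exactly this NullPointerException. The self-contained sketch below models only that interleaving, single-threaded and deterministic. StringStats, StringStatsBuilder and OrcStatsRaceSketch are hypothetical stand-ins, not Hive classes, and the "count first, then minimum" ordering is an assumption about the Hive 0.14 writer.

    // Hypothetical model of the suspected interleaving; none of these classes are Hive code.
    public class OrcStatsRaceSketch {

        /** Mimics a protobuf-generated builder: setters reject null with an NPE. */
        static final class StringStatsBuilder {
            private String minimum;

            StringStatsBuilder setMinimum(String value) {
                if (value == null) {
                    throw new NullPointerException();
                }
                minimum = value;
                return this;
            }
        }

        /** Simplified row-group statistics for a string column (assumed shape). */
        static final class StringStats {
            long count;
            String minimum;

            // Assumed step 1 of writing a value: bump the count.
            void increment() { count++; }

            // Assumed step 2 of writing a value: fold the value into the minimum.
            void updateString(String value) {
                if (minimum == null || value.compareTo(minimum) < 0) {
                    minimum = value;
                }
            }

            // Runs when a row-index entry is created during a stripe flush.
            StringStatsBuilder serialize() {
                StringStatsBuilder builder = new StringStatsBuilder();
                if (count != 0) {
                    builder.setMinimum(minimum); // throws if minimum is still null
                }
                return builder;
            }
        }

        /** Serializes between step 1 and step 2, as a flush from another thread could. */
        public static String serializeMidUpdate() {
            StringStats stats = new StringStats();
            stats.increment();     // writer thread: count is now 1, minimum still null
            try {
                stats.serialize(); // simulated flush from another writer's addRow()
                return "ok";
            } catch (NullPointerException e) {
                return e.getClass().getSimpleName();
            }
        }

        /** The same steps without the interruption serialize cleanly. */
        public static String serializeAfterUpdate() {
            StringStats stats = new StringStats();
            stats.increment();
            stats.updateString("a");
            stats.serialize();
            return "ok";
        }

        public static void main(String[] args) {
            System.out.println(serializeMidUpdate());   // NullPointerException
            System.out.println(serializeAfterUpdate()); // ok
        }
    }

If this hypothesis holds, it would also explain why replaying the same data alone never fails: a single writer cannot interrupt itself between the two steps. Writing all ORC files from one thread, or guarding every addRow() and close() in the JVM with one shared lock, would be a cheap way to test it.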