We are writing ORC files in our application for Hive to consume.
After the application has been running for a while, writes start
failing with an NPE while serializing a string column's statistics.
We are not sure yet what is causing it on our side: replaying the same
data works fine, so it seems to depend on how long the JVM has been
running rather than on the data itself (different data sources hit
this around the same time in the same JVM).

Here is the code in question:

final Writer writer = OrcFile.createWriter(path,
    OrcFile.writerOptions(conf).inspector(oi));
try {
  for (Data row : rows) {
    List<Object> struct = Orc.struct(row, inspector);
    writer.addRow(struct);
  }
} finally {
  writer.close();
}


Here is the exception:

java.lang.NullPointerException: null
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$Builder.setMinimum(OrcProto.java:1803) ~[hive-exec-0.14.0.jar:0.14.0]
        at org.apache.hadoop.hive.ql.io.orc.ColumnStatisticsImpl$StringStatisticsImpl.serialize(ColumnStatisticsImpl.java:411) ~[hive-exec-0.14.0.jar:0.14.0]
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StringTreeWriter.createRowIndexEntry(WriterImpl.java:1255) ~[hive-exec-0.14.0.jar:0.14.0]
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.createRowIndexEntry(WriterImpl.java:775) ~[hive-exec-0.14.0.jar:0.14.0]
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.createRowIndexEntry(WriterImpl.java:775) ~[hive-exec-0.14.0.jar:0.14.0]
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl.createRowIndexEntry(WriterImpl.java:1978) ~[hive-exec-0.14.0.jar:0.14.0]
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1985) ~[hive-exec-0.14.0.jar:0.14.0]
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl.checkMemory(WriterImpl.java:322) ~[hive-exec-0.14.0.jar:0.14.0]
        at org.apache.hadoop.hive.ql.io.orc.MemoryManager.notifyWriters(MemoryManager.java:168) ~[hive-exec-0.14.0.jar:0.14.0]
        at org.apache.hadoop.hive.ql.io.orc.MemoryManager.addedRow(MemoryManager.java:157) ~[hive-exec-0.14.0.jar:0.14.0]
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:2276) ~[hive-exec-0.14.0.jar:
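
One thing we have not ruled out is concurrent access. In the trace, the
stripe flush is reached through MemoryManager.notifyWriters, so it may
be that a writer gets flushed as a side effect of a row added to a
different writer; we have not confirmed this. As an experiment (an
assumption on our part, not a known fix), every addRow and close call
could be funneled through a single JVM-wide lock. The sketch below
uses a hypothetical RowWriter interface in place of the ORC Writer so
that it stands alone; RowWriter, SerializedWriter, ListWriter and
JVM_LOCK are names made up for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the ORC Writer (only the two calls we use).
interface RowWriter {
    void addRow(Object row) throws IOException;
    void close() throws IOException;
}

// Wraps a RowWriter so that every call holds one JVM-wide lock.
final class SerializedWriter implements RowWriter {
    // Shared by all instances: at most one wrapped writer runs at a time.
    private static final Object JVM_LOCK = new Object();
    private final RowWriter delegate;

    SerializedWriter(RowWriter delegate) {
        this.delegate = delegate;
    }

    @Override
    public void addRow(Object row) throws IOException {
        synchronized (JVM_LOCK) {
            delegate.addRow(row);
        }
    }

    @Override
    public void close() throws IOException {
        synchronized (JVM_LOCK) {
            delegate.close();
        }
    }
}

// In-memory RowWriter used only to exercise the wrapper.
final class ListWriter implements RowWriter {
    final List<Object> rows = new ArrayList<Object>();
    boolean closed = false;

    @Override
    public void addRow(Object row) {
        rows.add(row);
    }

    @Override
    public void close() {
        closed = true;
    }
}
```

With the real writer, the delegate would be a thin adapter around the
Writer returned by OrcFile.createWriter. If the NPE no longer shows up
under the lock, that would point toward a threading problem rather than
bad data.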


Versions:

Hadoop: Apache 2.2.0
Hive: Apache 0.14.0
Java: 1.7


Thanks for your time reading this email.
