We have multiple threads writing, but each thread works on one file, so the ORC writer is only touched by one thread (never across threads).

On Sep 2, 2015 11:18 AM, "Owen O'Malley" <omal...@apache.org> wrote:
> I don't see how it would get there. That implies that the minimum was null,
> but the count was non-zero.
>
> ColumnStatisticsImpl$StringStatisticsImpl.serialize looks like:
>
>   @Override
>   OrcProto.ColumnStatistics.Builder serialize() {
>     OrcProto.ColumnStatistics.Builder result = super.serialize();
>     OrcProto.StringStatistics.Builder str =
>         OrcProto.StringStatistics.newBuilder();
>     if (getNumberOfValues() != 0) {
>       str.setMinimum(getMinimum());
>       str.setMaximum(getMaximum());
>       str.setSum(sum);
>     }
>     result.setStringStatistics(str);
>     return result;
>   }
>
> and thus it shouldn't call down to setMinimum unless the column had at
> least some non-null values.
>
> Do you have multiple threads working? There isn't anything that should be
> introducing non-determinism, so for the same input it would fail at the
> same point.
>
> .. Owen
>
> On Tue, Sep 1, 2015 at 10:51 PM, David Capwell <dcapw...@gmail.com> wrote:
>
>> We are writing ORC files in our application for Hive to consume.
>> Given enough time, we have noticed that writing causes an NPE when
>> working with a string column's stats. Not sure what's causing it on
>> our side yet, since replaying the same data is fine; it seems more
>> like this just happens over time (different data sources will hit this
>> around the same time in the same JVM).
>>
>> Here is the code in question, and below is the exception:
>>
>>   final Writer writer = OrcFile.createWriter(path,
>>       OrcFile.writerOptions(conf).inspector(oi));
>>   try {
>>     for (Data row : rows) {
>>       List<Object> struct = Orc.struct(row, inspector);
>>       writer.addRow(struct);
>>     }
>>   } finally {
>>     writer.close();
>>   }
>>
>> Here is the exception:
>>
>> java.lang.NullPointerException: null
>>   at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$Builder.setMinimum(OrcProto.java:1803) ~[hive-exec-0.14.0.jar:0.14.0]
>>   at org.apache.hadoop.hive.ql.io.orc.ColumnStatisticsImpl$StringStatisticsImpl.serialize(ColumnStatisticsImpl.java:411) ~[hive-exec-0.14.0.jar:0.14.0]
>>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StringTreeWriter.createRowIndexEntry(WriterImpl.java:1255) ~[hive-exec-0.14.0.jar:0.14.0]
>>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.createRowIndexEntry(WriterImpl.java:775) ~[hive-exec-0.14.0.jar:0.14.0]
>>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.createRowIndexEntry(WriterImpl.java:775) ~[hive-exec-0.14.0.jar:0.14.0]
>>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl.createRowIndexEntry(WriterImpl.java:1978) ~[hive-exec-0.14.0.jar:0.14.0]
>>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1985) ~[hive-exec-0.14.0.jar:0.14.0]
>>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl.checkMemory(WriterImpl.java:322) ~[hive-exec-0.14.0.jar:0.14.0]
>>   at org.apache.hadoop.hive.ql.io.orc.MemoryManager.notifyWriters(MemoryManager.java:168) ~[hive-exec-0.14.0.jar:0.14.0]
>>   at org.apache.hadoop.hive.ql.io.orc.MemoryManager.addedRow(MemoryManager.java:157) ~[hive-exec-0.14.0.jar:0.14.0]
>>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:2276) ~[hive-exec-0.14.0.jar:
>>
>> Versions:
>>
>> Hadoop: Apache 2.2.0
>> Hive: Apache 0.14.0
>> Java: 1.7
>>
>> Thanks for your time reading this email.
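A note on Owen's observation: the serialize() code only calls setMinimum when the count is non-zero, so the NPE means the invariant "count != 0 implies minimum != null" was violated, which is the classic signature of an unsynchronized check-then-act race. The sketch below is a hypothetical, simplified model (StringStats, recordCount, recordValue are made-up names, NOT the actual Hive classes) showing deterministically how a serialize that runs between the count update and the value update hits exactly this NPE:

```java
// Hypothetical model of the count/minimum invariant -- not Hive's code.
// serialize-style logic assumes "count != 0 implies minimum != null"; if a
// flush observes the count increment before the minimum is assigned (as can
// happen when two threads touch one writer without synchronization), that
// assumption breaks with a NullPointerException.
public class StatsRaceSketch {
    static class StringStats {
        long count = 0;        // number of non-null values seen
        String minimum = null; // smallest value seen, null until first update

        // Step 1 of an update: bump the count.
        void recordCount() { count++; }

        // Step 2 of an update: record the value itself.
        void recordValue(String v) {
            if (minimum == null || v.compareTo(minimum) < 0) {
                minimum = v;
            }
        }

        // Mirrors the serialize() guard: only touch minimum when count != 0.
        int serializedMinimumLength() {
            if (count != 0) {
                return minimum.length(); // NPE here if minimum is still null
            }
            return -1;
        }
    }

    public static void main(String[] args) {
        StringStats stats = new StringStats();

        // Simulate the bad interleaving deterministically: the count is
        // incremented, then a "flush" serializes before the value lands.
        stats.recordCount();
        boolean sawNpe = false;
        try {
            stats.serializedMinimumLength();
        } catch (NullPointerException e) {
            sawNpe = true;
        }
        System.out.println("NPE with count=" + stats.count
            + ", minimum=" + stats.minimum + ": " + sawNpe);

        // Once the value is recorded, serialization succeeds.
        stats.recordValue("abc");
        System.out.println("minimum length after update: "
            + stats.serializedMinimumLength());
    }
}
```

This interleaving cannot happen with a truly single-threaded writer, which is why Owen asks about threads; even with one writer per file, any path that lets a second thread trigger a flush while a row is mid-update could produce it.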