Also, the data we put in are primitives, structs (as lists), and arrays (as lists); we don't use any of the boxed writables (like Text).
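
For illustration, a minimal sketch of that kind of inspector setup using the plain-Java (non-writable) primitive inspectors; the field names and types here are made up, not our actual schema:

    import java.util.Arrays;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

    // Java object inspectors expect plain String/Long values rather than
    // the boxed writables (Text, LongWritable).
    ObjectInspector oi = ObjectInspectorFactory.getStandardStructObjectInspector(
        Arrays.asList("name", "count", "tags"),          // illustrative field names
        Arrays.<ObjectInspector>asList(
            PrimitiveObjectInspectorFactory.javaStringObjectInspector,
            PrimitiveObjectInspectorFactory.javaLongObjectInspector,
            ObjectInspectorFactory.getStandardListObjectInspector(
                PrimitiveObjectInspectorFactory.javaStringObjectInspector)));

Rows are then plain List<Object> values laid out to match that struct.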
On Sep 2, 2015 12:57 PM, "David Capwell" <dcapw...@gmail.com> wrote:

> We have multiple threads writing, but each thread works on one file, so the
> ORC writer is only ever touched by one thread (never crosses threads).
>
> On Sep 2, 2015 11:18 AM, "Owen O'Malley" <omal...@apache.org> wrote:
>
>> I don't see how it would get there. That implies that the minimum was null,
>> but the count was non-zero.
>>
>> ColumnStatisticsImpl$StringStatisticsImpl.serialize looks like:
>>
>> @Override
>> OrcProto.ColumnStatistics.Builder serialize() {
>>   OrcProto.ColumnStatistics.Builder result = super.serialize();
>>   OrcProto.StringStatistics.Builder str =
>>       OrcProto.StringStatistics.newBuilder();
>>   if (getNumberOfValues() != 0) {
>>     str.setMinimum(getMinimum());
>>     str.setMaximum(getMaximum());
>>     str.setSum(sum);
>>   }
>>   result.setStringStatistics(str);
>>   return result;
>> }
>>
>> and thus it shouldn't call down to setMinimum unless the column had at
>> least some non-null values.
>>
>> Do you have multiple threads working? There isn't anything that should be
>> introducing non-determinism, so for the same input it would fail at the
>> same point.
>>
>> .. Owen
>>
>> On Tue, Sep 1, 2015 at 10:51 PM, David Capwell <dcapw...@gmail.com> wrote:
>>
>>> We are writing ORC files in our application for Hive to consume.
>>> Given enough time, we have noticed that writing causes an NPE when
>>> working with a string column's stats. Not sure what's causing it on
>>> our side yet, since replaying the same data is just fine; it seems more
>>> like this just happens over time (different data sources will hit this
>>> around the same time in the same JVM).
>>>
>>> Here is the code in question, and below is the exception:
>>>
>>> final Writer writer = OrcFile.createWriter(path,
>>>     OrcFile.writerOptions(conf).inspector(oi));
>>> try {
>>>   for (Data row : rows) {
>>>     List<Object> struct = Orc.struct(row, inspector);
>>>     writer.addRow(struct);
>>>   }
>>> } finally {
>>>   writer.close();
>>> }
>>>
>>> Here is the exception:
>>>
>>> java.lang.NullPointerException: null
>>>   at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$Builder.setMinimum(OrcProto.java:1803) ~[hive-exec-0.14.0.jar:0.14.0]
>>>   at org.apache.hadoop.hive.ql.io.orc.ColumnStatisticsImpl$StringStatisticsImpl.serialize(ColumnStatisticsImpl.java:411) ~[hive-exec-0.14.0.jar:0.14.0]
>>>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StringTreeWriter.createRowIndexEntry(WriterImpl.java:1255) ~[hive-exec-0.14.0.jar:0.14.0]
>>>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.createRowIndexEntry(WriterImpl.java:775) ~[hive-exec-0.14.0.jar:0.14.0]
>>>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.createRowIndexEntry(WriterImpl.java:775) ~[hive-exec-0.14.0.jar:0.14.0]
>>>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl.createRowIndexEntry(WriterImpl.java:1978) ~[hive-exec-0.14.0.jar:0.14.0]
>>>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1985) ~[hive-exec-0.14.0.jar:0.14.0]
>>>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl.checkMemory(WriterImpl.java:322) ~[hive-exec-0.14.0.jar:0.14.0]
>>>   at org.apache.hadoop.hive.ql.io.orc.MemoryManager.notifyWriters(MemoryManager.java:168) ~[hive-exec-0.14.0.jar:0.14.0]
>>>   at org.apache.hadoop.hive.ql.io.orc.MemoryManager.addedRow(MemoryManager.java:157) ~[hive-exec-0.14.0.jar:0.14.0]
>>>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:2276) ~[hive-exec-0.14.0.jar:
>>>
>>> Versions:
>>>
>>> Hadoop: Apache 2.2.0
>>> Hive: Apache 0.14.0
>>> Java: 1.7
>>>
>>> Thanks for your time reading this email.
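
For reference, a minimal sketch of the one-writer-per-thread pattern described in this thread; the thread-pool setup is illustrative, and conf, oi, paths, and rowsFor(...) are stand-ins for application state, not Hive APIs:

    import java.io.IOException;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hive.ql.io.orc.OrcFile;
    import org.apache.hadoop.hive.ql.io.orc.Writer;

    // One task per file; each Writer is created, used, and closed on a
    // single thread, so no writer is ever shared across threads.
    ExecutorService pool = Executors.newFixedThreadPool(4);
    for (final Path path : paths) {
      pool.submit(new Runnable() {
        @Override public void run() {
          try {
            final Writer writer = OrcFile.createWriter(path,
                OrcFile.writerOptions(conf).inspector(oi));
            try {
              for (List<Object> row : rowsFor(path)) {  // rowsFor is hypothetical
                writer.addRow(row);  // only this thread ever touches writer
              }
            } finally {
              writer.close();
            }
          } catch (IOException e) {
            throw new RuntimeException(e);  // per-file failure handling
          }
        }
      });
    }
    pool.shutdown();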