We have multiple threads writing, but each thread works on one file, so each
ORC writer is only ever touched by one thread (never across threads).
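
One thing that may be worth ruling out: the trace in the original report reaches checkMemory and flushStripe through MemoryManager.notifyWriters, which is called from addRow. If that manager is shared by all writers in the JVM (an assumption here, not something confirmed in this thread), then a row added on one thread could trigger a flush of a writer that belongs to another thread. The sketch below only illustrates that pattern; SharedManager, Callback and the writer/thread names are hypothetical stand-ins, not the real ORC classes, and it uses Java 8 lambdas for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins only; none of these are the real ORC classes.
class SharedManagerSketch {

    /** Callback a writer registers so the manager can ask it to check memory. */
    interface Callback {
        void checkMemory();
    }

    /** A manager shared by every writer in the JVM (assumed, not confirmed). */
    static class SharedManager {
        private final List<Callback> callbacks = new ArrayList<>();

        synchronized void register(Callback c) {
            callbacks.add(c);
        }

        // Called by whichever thread just added a row; notifies every writer.
        synchronized void addedRow() {
            for (Callback c : callbacks) {
                c.checkMemory();
            }
        }
    }

    /** Returns, for each writer, the name of the thread that ran its callback. */
    static Map<String, String> run() {
        SharedManager manager = new SharedManager();
        Map<String, String> ranOn = new ConcurrentHashMap<>();

        // Two writers, each nominally "owned" by a different application thread.
        for (String name : new String[] {"writer-A", "writer-B"}) {
            manager.register(() -> ranOn.put(name, Thread.currentThread().getName()));
        }

        // Only thread-A adds a row, yet writer-B's callback also runs on thread-A.
        Thread a = new Thread(manager::addedRow, "thread-A");
        a.start();
        try {
            a.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return ranOn;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this sketch the lock on the manager serializes the notifications, but nothing stops the owning thread from being in the middle of its own addRow on writer-B while thread-A runs writer-B's callback. If the real manager behaves like this, "one writer per thread" on the application side would not be enough to keep each writer single-threaded.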
On Sep 2, 2015 11:18 AM, "Owen O'Malley" <omal...@apache.org> wrote:

> I don't see how it would get there. That implies that minimum was null,
> but the count was non-zero.
>
> The ColumnStatisticsImpl$StringStatisticsImpl.serialize looks like:
>
> @Override
> OrcProto.ColumnStatistics.Builder serialize() {
>   OrcProto.ColumnStatistics.Builder result = super.serialize();
>   OrcProto.StringStatistics.Builder str =
>     OrcProto.StringStatistics.newBuilder();
>   if (getNumberOfValues() != 0) {
>     str.setMinimum(getMinimum());
>     str.setMaximum(getMaximum());
>     str.setSum(sum);
>   }
>   result.setStringStatistics(str);
>   return result;
> }
>
> and thus shouldn't call down to setMinimum unless it had at least some 
> non-null values in the column.
>
> Do you have multiple threads working? There isn't anything that should be
> introducing non-determinism, so for the same input it would fail at the same
> point.
>
> .. Owen
>
>
>
>
> On Tue, Sep 1, 2015 at 10:51 PM, David Capwell <dcapw...@gmail.com> wrote:
>
>> We are writing ORC files in our application for Hive to consume.
>> Given enough time, we have noticed that writing causes an NPE when
>> working with a string column's stats.  Not sure what's causing it on
>> our side yet, since replaying the same data works fine; it seems more
>> like this just happens over time (different data sources will hit this
>> around the same time in the same JVM).
>>
>> Here is the code in question, and below is the exception:
>>
>> final Writer writer = OrcFile.createWriter(path,
>>     OrcFile.writerOptions(conf).inspector(oi));
>> try {
>>   for (Data row : rows) {
>>     List<Object> struct = Orc.struct(row, inspector);
>>     writer.addRow(struct);
>>   }
>> } finally {
>>   writer.close();
>> }
>>
>>
>> Here is the exception:
>>
>> java.lang.NullPointerException: null
>>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$Builder.setMinimum(OrcProto.java:1803) ~[hive-exec-0.14.0.jar:0.14.0]
>>         at org.apache.hadoop.hive.ql.io.orc.ColumnStatisticsImpl$StringStatisticsImpl.serialize(ColumnStatisticsImpl.java:411) ~[hive-exec-0.14.0.jar:0.14.0]
>>         at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StringTreeWriter.createRowIndexEntry(WriterImpl.java:1255) ~[hive-exec-0.14.0.jar:0.14.0]
>>         at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.createRowIndexEntry(WriterImpl.java:775) ~[hive-exec-0.14.0.jar:0.14.0]
>>         at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.createRowIndexEntry(WriterImpl.java:775) ~[hive-exec-0.14.0.jar:0.14.0]
>>         at org.apache.hadoop.hive.ql.io.orc.WriterImpl.createRowIndexEntry(WriterImpl.java:1978) ~[hive-exec-0.14.0.jar:0.14.0]
>>         at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1985) ~[hive-exec-0.14.0.jar:0.14.0]
>>         at org.apache.hadoop.hive.ql.io.orc.WriterImpl.checkMemory(WriterImpl.java:322) ~[hive-exec-0.14.0.jar:0.14.0]
>>         at org.apache.hadoop.hive.ql.io.orc.MemoryManager.notifyWriters(MemoryManager.java:168) ~[hive-exec-0.14.0.jar:0.14.0]
>>         at org.apache.hadoop.hive.ql.io.orc.MemoryManager.addedRow(MemoryManager.java:157) ~[hive-exec-0.14.0.jar:0.14.0]
>>         at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:2276) ~[hive-exec-0.14.0.jar:
>>
>>
>> Versions:
>>
>> Hadoop: apache 2.2.0
>> Hive Apache: 0.14.0
>> Java 1.7
>>
>>
>> Thanks for your time reading this email.
>>
>
>
