Re: ORC file split calculation problems

2016-03-04 Thread Patrick Duin
Response inline. 2016-03-03 23:39 GMT+01:00 Prasanth Jayachandran <pjayachand...@hortonworks.com>: > Small correction inline. > > On Mar 3, 2016, at 4:28 PM, Prasanth Jayachandran <pjayachand...@hortonworks.com> wrote: > > Hi Patrick > > Please find answers inline. > > > On Mar 1, 2016, at 8:

Re: ORC file split calculation problems

2016-03-03 Thread Prasanth Jayachandran
Small correction inline. On Mar 3, 2016, at 4:28 PM, Prasanth Jayachandran <pjayachand...@hortonworks.com> wrote: Hi Patrick, Please find answers inline. On Mar 1, 2016, at 8:41 AM, Patrick Duin <patd...@gmail.com> wrote: Hi Prasanth, Thanks for this. I tried out the configur

Re: ORC file split calculation problems

2016-03-03 Thread Prasanth Jayachandran
Hi Patrick, Please find answers inline. On Mar 1, 2016, at 8:41 AM, Patrick Duin <patd...@gmail.com> wrote: Hi Prasanth, Thanks for this. I tried out the configuration and I wanted to share some numbers with you. My test setup is a Cascading job that reads in 240 files (ranging from 1

Re: ORC file split calculation problems

2016-03-01 Thread Patrick Duin
Hi Prasanth, Thanks for this. I tried out the configuration and I wanted to share some numbers with you. My test setup is a Cascading job that reads in 240 files (ranging from 1.5GB to 2.5GB). In the job log I get the duration from these lines: INFO log.PerfLogger: Running this without any of th

Re: ORC file split calculation problems

2016-02-28 Thread Prasanth Jayachandran
Hi Patrick, Please find answers inline. On Feb 26, 2016, at 9:36 AM, Patrick Duin <patd...@gmail.com> wrote: Hi Prasanth. Thanks for the quick reply! The logs don't show much more of the stack trace, I'm afraid: java.lang.NullPointerException at org.apache.hadoop.hive.ql.io.orc.Or

Re: ORC file split calculation problems

2016-02-26 Thread Patrick Duin
Hi Prasanth. Thanks for the quick reply! The logs don't show much more of the stack trace, I'm afraid: java.lang.NullPointerException at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.run(OrcInputFormat.java:809) at java.util.concurrent.ThreadPoolExecutor.runWorker(T

Re: ORC file split calculation problems

2016-02-25 Thread Prasanth Jayachandran
> On Feb 25, 2016, at 3:15 PM, Prasanth Jayachandran wrote: > Hi Patrick, > Can you paste the entire stack trace? It looks like the NPE happened during split generation, but the stack trace is too incomplete to tell what caused it. > In Hive 0.14.0, the stripe size was changed to 64MB. The default block

Re: ORC file split calculation problems

2016-02-25 Thread Prasanth Jayachandran
Hi Patrick, Can you paste the entire stack trace? It looks like the NPE happened during split generation, but the stack trace is too incomplete to tell what caused it. In Hive 0.14.0, the stripe size was changed to 64MB. The default block size for ORC files is 256MB, so 4 stripes fit in a block. ORC does padding to av
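
The stripe-per-block arithmetic in the message above can be checked with a minimal sketch. It uses only the two sizes quoted in the message (64MB stripes, 256MB blocks); the helper name `stripes_per_block` is hypothetical and is not part of Hive or ORC, and the sketch does not model how ORC actually pads or places stripes.

```python
MB = 1024 * 1024


def stripes_per_block(block_size: int, stripe_size: int) -> int:
    """Number of whole stripes that fit inside one block.

    Hypothetical helper for illustration only; not a Hive or ORC API.
    """
    if stripe_size <= 0:
        raise ValueError("stripe_size must be positive")
    return block_size // stripe_size


# Sizes as quoted in the message above.
STRIPE_SIZE = 64 * MB
BLOCK_SIZE = 256 * MB

print(stripes_per_block(BLOCK_SIZE, STRIPE_SIZE))  # prints 4
```

With other sizes the division may leave a remainder, in which case fewer whole stripes fit and some space at the end of the block is left over.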