Re: ORC file question

2014-02-10 Thread Prasanth Jayachandran
Great to hear!

Thanks
Prasanth Jayachandran

On Feb 10, 2014, at 2:50 PM, Avrilia Floratou wrote:
> Hi Prasanth,
>
> It seems that I was actually using the
> hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat and
> that was generating 363 map tasks. I tried the org.apache.

Re: ORC file question

2014-02-10 Thread Avrilia Floratou
Hi Prasanth, It seems that I was actually using hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat, and that was generating 363 map tasks. I tried org.apache.hadoop.hive.ql.io.HiveInputFormat instead, and I was able to get 182 map tasks and get rid of the short map tasks.
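
For reference, the session settings discussed in this thread can be sketched as follows. This is only an illustration assembled from the messages here (Hive 0.12); orc_table is a hypothetical table name standing in for the real one, and the map-task counts in the comments are the ones reported above for a single 91.7 GB ORC file, not guaranteed results.

```sql
-- Predicate pushdown settings reported as enabled in this thread.
set hive.optimize.ppd=true;
set hive.optimize.index.filter=true;

-- Reported to produce 363 map tasks, some of them short:
-- set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Reported to produce 182 map tasks with no short tasks:
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

-- Query from the thread; orc_table is a hypothetical name.
select max(I1) from orc_table;
```

Running "set hive.input.format;" with no value prints the current setting, which is a quick way to check which split computation path a session will use.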

Re: ORC file question

2014-02-10 Thread Prasanth Jayachandran
> 2) From describe extended: inputFormat:
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat

OrcInputFormat can be bypassed if hive.input.format is set to CombineHiveInputFormat. There are two different split computation code paths, which may generate different numbers of splits and hence

Re: ORC file question

2014-02-10 Thread Avrilia Floratou
Hi Prasanth, Here are the answers to your questions:
1) Yes, I have set both: set hive.optimize.ppd=true; set hive.optimize.index.filter=true;
2) From describe extended: inputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
3) Hive 0.12
4) Select max (I1) from table;
Thanks, Avrilia On Mo

Re: ORC file question

2014-02-10 Thread Prasanth Jayachandran
Hi Avrilia, I have a few more questions:
1) Have you enabled ORC predicate pushdown by setting hive.optimize.index.filter?
2) What is the value of hive.input.format?
3) Which Hive version are you using?
4) What query are you using?
Thanks Prasanth Jayachandran On Feb 10, 2014, at 1:26 PM, Avrilia

Re: ORC file question

2014-02-10 Thread Avrilia Floratou
Hi Prasanth, No, it's not a partitioned table. The table consists of only one file (91.7 GB). When I created the table, I loaded data from a text table into the ORC table and used only 1 map task, so that one large file is created rather than many small files. This is why I'm getting confused with

Re: ORC file question

2014-02-10 Thread Prasanth Jayachandran
Hi Avrilia, Is it a partitioned table? If so, approximately how many partitions and how many files are there? What is the value of hive.input.format? My suspicion is that there are ~180 files and each file is ~515MB in size. Since you had mentioned you are using default stripe size i.