Prasanth,

Apologies for the delay in responding.
Below is the orcfiledump of the empty ORC file from a broken partition.

$ hive --orcfiledump /hive/testdb.db/table_orc/year=2016/month=1/day=29/000000_0
Structure for /hive/testdb.db/table_orc/year=2016/month=1/day=29/000000_0
File Version: 0.12 with HIVE_8732
16/03/25 17:49:09 INFO orc.ReaderImpl: Reading ORC rows from /hive/testdb.db/table_orc/year=2016/month=1/day=29/000000_0 with {include: null, offset: 0, length: 9223372036854775807}
16/03/25 17:49:09 INFO orc.RecordReaderFactory: Schema is not specified on read. Using file schema.
Rows: 0
Compression: SNAPPY
Compression size: 262144
Type: struct<>

Stripe Statistics:

File Statistics:
  Column 0: count: 0 hasNull: false

Stripes:

File length: 49 bytes
Padding length: 0 bytes
Padding ratio: 0%
$

I am still not able to figure out what is causing this odd behaviour.

Regards
Biswa

On Thu, Mar 10, 2016 at 3:12 PM, Prasanth Jayachandran <pjayachand...@hortonworks.com> wrote:

> Alternatively you can send orcfiledump output for the empty orc file from
> broken partition.
>
> Thanks
> Prasanth
>
> On Mar 10, 2016, at 5:11 PM, Prasanth Jayachandran <pjayachand...@hortonworks.com> wrote:
>
> Could you attach the empty orc files from one of the broken partitions
> somewhere? I can run some tests on them to see why it's happening.
>
> Thanks
> Prasanth
>
> On Mar 8, 2016, at 12:02 AM, Biswajit Nayak <biswa...@altiscale.com> wrote:
>
> Both the parameters are set to false by default.
>
> hive> set hive.optimize.index.filter;
> hive.optimize.index.filter=false
> hive> set hive.orc.splits.include.file.footer;
> hive.orc.splits.include.file.footer=false
> hive>
>
> >>>I suspect this might be related to having 0 row files in the buckets not
> having any recorded schema.
>
> Yes, there are a few files with 0 rows, but the query works with other
> partitions (which also have 0 row files). Out of 30 partitions (for a month),
> 3-4 partitions are having this issue. Even a reload of the data does not
> change anything. The query works fine in MR now, but has the issue in Tez.
>
> On Tue, Mar 8, 2016 at 2:43 AM, Gopal Vijayaraghavan <gop...@apache.org> wrote:
>
>> > c varchar(2)
>> ...
>> > Num Buckets: 7
>>
>> I suspect this might be related to having 0 row files in the buckets not
>> having any recorded schema.
>>
>> You can also experiment with hive.optimize.index.filter=false, to see if
>> the zero row case is artificially produced via predicate push-down.
>>
>> That shouldn't be a problem unless you've turned on
>> hive.orc.splits.include.file.footer=true (recommended to be false).
>>
>> Your row-locations don't actually match any Apache source jar in my
>> builds, are there any other patches to consider?
>>
>> Cheers,
>> Gopal
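
For anyone checking many partitions for the same pattern, the orcfiledump text in this thread (Rows: 0 together with Type: struct<>) can be scanned for mechanically. The following is only a rough sketch: it assumes the dump text has already been captured to a string, and the helper names summarize_orc_dump and looks_schemaless_empty are made up for illustration, not part of Hive or ORC. It reads only the three fields shown in the dump above and does not prove that such a file is the cause of the Tez failure.

```python
import re

# Hypothetical helpers (not part of Hive/ORC): they only parse the plain text
# that `hive --orcfiledump` printed in this thread.

def summarize_orc_dump(dump_text):
    """Extract row count, type string and file length from dump text."""
    fields = {}
    for raw in dump_text.splitlines():
        line = raw.strip()
        m = re.match(r"^Rows:\s*(\d+)$", line)
        if m:
            fields["rows"] = int(m.group(1))
            continue
        m = re.match(r"^Type:\s*(.+)$", line)
        if m:
            fields["type"] = m.group(1).strip()
            continue
        m = re.match(r"^File length:\s*(\d+)\s*bytes$", line)
        if m:
            fields["file_length"] = int(m.group(1))
    return fields


def looks_schemaless_empty(fields):
    """True when the file has zero rows and an empty struct as its schema."""
    return fields.get("rows") == 0 and fields.get("type") == "struct<>"


# Excerpt of the dump posted in this thread, used as sample input.
SAMPLE = """\
File Version: 0.12 with HIVE_8732
16/03/25 17:49:09 INFO orc.RecordReaderFactory: Schema is not specified on read. Using file schema.
Rows: 0
Compression: SNAPPY
Compression size: 262144
Type: struct<>
File length: 49 bytes
"""

if __name__ == "__main__":
    info = summarize_orc_dump(SAMPLE)
    print(info, looks_schemaless_empty(info))
```

Running the same check over the dump of a 0-row file from a working partition would show whether those files carry a non-empty Type, which is the difference Gopal's suggestion points at.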