Re: ACID ORC file reader issue with uncompacted data

2015-05-18 Thread Elliot West
Thanks for the reply Alan. I see your point regarding the multiple delta directories and why it would not make sense to include one of them as a leaf. However, it seems that with this scheme one cannot abstractly work with such paths. One must have knowledge of the underlying format to understand

Re: ACID ORC file reader issue with uncompacted data

2015-05-14 Thread Alan Gates
Ok, I think I understand now. I also get why OrcSplit.getPath returns just up to the partition keys and not the delta directories. In most cases there will be more than one delta directory, so which one would it pick? It seems you already know the file type you are working on before you cal
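To illustrate the point about multiple delta directories, here is a minimal sketch in plain Java with no Hive dependencies. It assumes the `base_` and `delta_` directory name prefixes that Hive uses for ACID tables. The class name `AcidDirSketch`, its helper methods, and the sample directory names are hypothetical and are not part of any Hive API. It only shows why a partition written to but not yet compacted has several candidate delta directories and no single obvious leaf path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hive: classifies the child directory
// names found under a partition of a transactional table.
class AcidDirSketch {
    // Assumed naming convention for ACID directories.
    static final String BASE_PREFIX = "base_";
    static final String DELTA_PREFIX = "delta_";

    static boolean isBase(String dirName) {
        return dirName.startsWith(BASE_PREFIX);
    }

    static boolean isDelta(String dirName) {
        return dirName.startsWith(DELTA_PREFIX);
    }

    // Returns every delta directory name in the listing, in listing order.
    static List<String> deltas(List<String> dirNames) {
        List<String> result = new ArrayList<>();
        for (String name : dirNames) {
            if (isDelta(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative listing for an uncompacted partition: two deltas, no base.
        List<String> listing = Arrays.asList(
                "delta_0000001_0000001",
                "delta_0000002_0000002");
        boolean hasBase = listing.stream().anyMatch(AcidDirSketch::isBase);
        System.out.println("has base: " + hasBase);
        System.out.println("delta directories: " + deltas(listing));
    }
}
```

Because the listing holds more than one delta and no base, a path that stops at the partition directory is the only unambiguous choice. That is the behaviour described above for OrcSplit.getPath.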

Re: ACID ORC file reader issue with uncompacted data

2015-05-01 Thread Elliot West
Yes and no :-) We're initially using OrcFile.createReader to create a Reader so that we can obtain the schema (StructTypeInfo) from the file. I don't believe this is possible with OrcInputFormat.getReader(?): Reader orcReader = OrcFile.createReader(path, OrcFile.readerOptions(conf)); ObjectInspec

Re: ACID ORC file reader issue with uncompacted data

2015-04-30 Thread Alan Gates
Are you using OrcInputFormat.getReader to get a reader? If so, it should take care of these anomalies for you and mask your need to worry about delta versus base files. Alan. Elliot West April 29, 2015 at 9:40 Hi, I'm implementing a tap to read Hive ORC ACID data i

Re: ACID ORC file reader issue with uncompacted data

2015-04-29 Thread Eugene Koifman
day, April 29, 2015 at 9:40 AM To: "user@hive.apache.org" Subject: ACID ORC file reader issue with uncompacted data Hi, I'm implementing a tap to read Hive ORC ACID data into Cascading jobs and I've hit a c

Re: ACID ORC file reader issue with uncompacted data

2015-04-29 Thread Elliot West
ore 1st compaction is definitely a valid use case. > > From: Elliot West > Reply-To: "user@hive.apache.org" > Date: Wednesday, April 29, 2015 at 9:40 AM > To: "user@hive.apache.org" > Subject: ACID ORC file reader issue with uncompacted data > >

Re: ACID ORC file reader issue with uncompacted data

2015-04-29 Thread Eugene Koifman
"user@hive.apache.org" Subject: ACID ORC file reader issue with uncompacted data Hi, I'm implementing a tap to read Hive ORC ACID data into Cascading jobs and I've hit a couple of issues for a particular sce

ACID ORC file reader issue with uncompacted data

2015-04-29 Thread Elliot West
Hi, I'm implementing a tap to read Hive ORC ACID data into Cascading jobs and I've hit a couple of issues for a particular scenario. The case I have is when data has been written into a transactional table and a compaction has not yet occurred. This can be recreated like so: CREATE TABLE test_tab