>>On Wed, May 15, 2013 at 3:38 AM, Peter Marron 
>><peter.mar...@trilliumsoftware.com> wrote:
…
>I've started doing similar work for the ORC reader.

I guess that I’m glad that I’m not completely alone here.

>>
>>Firstly although that page mentions InputFormat there doesn’t seem to be any 
>>way (that I can find)
>>to perform filter passing to InputFormats and so I gave up on that approach.
>>
>There is. You just need to set  hive.optimize.index.filter to true. See 
>https://issues.apache.org/jira/browse/HIVE-4242.

This is a little confusing. When I look through the code for uses of this
configuration setting, I see that it is effectively used in two places.
Firstly, it is used on line 55 of PhysicalOptimizer.java to add an
“IndexWhereResolver”.
Secondly, it is used on line 766 of OpProcFactory.java to set a filter
expression.

But I don’t see any point where the predicate is passed to the InputFormat
class. I guess that you’re saying that there’s some way that the InputFormat
can retrieve the predicate once it’s been stored. But it’s not clear to me
how I do that.

>>
>>That said, we really need to create a better interface that allows 
>>inputformats to negotiate what parts of the predicate they can process.

Ah, yes, sorry. I really want to be able to remove part of the predicate and 
subsume the filtering into the InputFormat class.
There’s little point in me going down this route if I can’t do that.

>>
>>-- Owen
>>

Thanks for prodding me into looking at the code, because now I see a big 
problem.

To recap, what I really want is to be able to apply filtering in the case
where I run a
                select * from table;
query. This is the only query that I’m interested in because it seems to run
without any Map/Reduce overhead (either locally or in the cluster). It is
effectively just performing some HDFS calls, and that’s what I desire.

What I really want to be able to do is to issue a query like this:
                select * from table where <predicate>
where I strip out the predicate and do the filtering in the InputFormat, so
that Hive effectively sees the query
                select * from table;
and runs it directly (no Map/Reduce) and I’m a happy bunny.
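
To make that concrete, here is a rough sketch of the behaviour I have in
mind, written in plain Java with no Hive or Hadoop classes at all.
FilteringReader and readAll are names I have made up for illustration. A real
version would have to sit inside the InputFormat's record reader and would
need some way to get hold of the predicate, which is exactly the part I
can't find.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical stand-in for a record reader: wraps the raw row iterator and
// only hands back the rows that satisfy the predicate taken from the query.
final class FilteringReader<T> implements Iterator<T> {
    private final Iterator<T> rows;
    private final Predicate<T> predicate;
    private T nextRow;
    private boolean hasNextRow;

    FilteringReader(Iterator<T> rows, Predicate<T> predicate) {
        this.rows = rows;
        this.predicate = predicate;
        advance(); // position on the first matching row, if any
    }

    // Skip forward until a row passes the predicate or the input runs out.
    private void advance() {
        hasNextRow = false;
        nextRow = null;
        while (rows.hasNext()) {
            T candidate = rows.next();
            if (predicate.test(candidate)) {
                nextRow = candidate;
                hasNextRow = true;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return hasNextRow;
    }

    @Override
    public T next() {
        if (!hasNextRow) {
            throw new NoSuchElementException();
        }
        T result = nextRow;
        advance();
        return result;
    }

    // Convenience helper: what the caller would see for a plain "select *".
    static <T> List<T> readAll(Iterator<T> rows, Predicate<T> predicate) {
        List<T> out = new ArrayList<>();
        FilteringReader<T> reader = new FilteringReader<>(rows, predicate);
        while (reader.hasNext()) {
            out.add(reader.next());
        }
        return out;
    }
}
```

The caller just iterates as it would for an unfiltered scan, and never sees
the rows that failed the predicate.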

Now, as I say, I can’t see any way to effect this in the InputFormat directly.
If I use a storage handler then I am in “non-native table” territory and I
can’t LOAD my tables with data.

However I have just noticed that line 111 of file IndexWhereProcessor.java
seems to suggest that indexes are only ever used when the query is going
to run Map/Reduce. Is this so? So I seem to be in the position where I
can’t use InputFormat, StorageHandler or Indexes. What can I do?

Is there any way to filter the query without having to run Map/Reduce?

Any suggestions welcomed.

Peter Marron
Trillium Software UK Limited

Tel : +44 (0) 118 940 7609
Fax : +44 (0) 118 940 7699
E: peter.mar...@trilliumsoftware.com
