>>On Wed, May 15, 2013 at 3:38 AM, Peter Marron
>><peter.mar...@trilliumsoftware.com> wrote:
…
>I've started doing similar work for the ORC reader.

I guess that I’m glad that I’m not completely alone here.

>>
>>Firstly although that page mentions InputFormat there doesn’t seem to be any
>>way (that I can find)
>>to perform filter passing to InputFormats and so I gave up on that approach.
>>
>There is. You just need to set hive.optimize.index.filter to true. See
>https://issues.apache.org/jira/browse/HIVE-4242.

This is a little confusing. When I look through the code for uses of this configuration property, I see that it’s effectively used in two places:

Firstly, it’s used on line 55 of PhysicalOptimizer.java to add an “IndexWhereResolver”.
Secondly, it’s used on line 766 of OpProcFactory.java to set a filter expression.

But I don’t see any point where the predicate is passed to the InputFormat class. I guess you’re saying that there’s some way for the InputFormat to retrieve the predicate once it’s been stored, but it’s not clear to me how I would do that.

>>
>>That said, we really need to create a better interface that allows
>>inputformats to negotiate what parts of the predicate they can process.

Ah, yes, sorry. I really want to be able to remove part of the predicate and subsume the filtering into the InputFormat class. There’s little point in me going down this route if I can’t do that.

>>
>>-- Owen
>>

Thanks for prodding me into looking at the code, because now I see a big problem. To recap, what I really want is to be able to filter in the case where I run a query of the form

    select * from table;

This is the only kind of query that I’m interested in, because it seems to run without any Map/Reduce overhead (either locally or in the cluster). It’s effectively just performing some HDFS calls, and that’s what I desire. What I really want to be able to do is to issue a query like this:

    select * from table where <predicate>

where I strip out the predicate and do the filtering in the InputFormat. Hive then effectively sees the query

    select * from table;

and runs it directly (no Map/Reduce), and I’m a happy bunny.

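To make the goal concrete, here is a minimal sketch in plain Java of the behaviour I’m after: the reader drops non-matching rows itself, so whatever consumes it only ever sees what looks like an unfiltered scan. Every name here (FilteringReaderSketch, FilteringIterator, readAll) is made up for illustration. This is not the Hive or Hadoop RecordReader API, and the comma-separated rows and the age predicate are just assumptions for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical stand-in for a record reader; NOT Hive's or Hadoop's API.
public class FilteringReaderSketch {

    /** Wraps a row source and yields only the rows that satisfy the predicate. */
    static final class FilteringIterator<T> implements Iterator<T> {
        private final Iterator<T> source;
        private final Predicate<T> predicate;
        private T next;
        private boolean hasNext;

        FilteringIterator(Iterator<T> source, Predicate<T> predicate) {
            this.source = source;
            this.predicate = predicate;
            advance(); // position on the first matching row, if any
        }

        // Skip rows until one matches the predicate or the source is exhausted.
        private void advance() {
            hasNext = false;
            next = null;
            while (source.hasNext()) {
                T candidate = source.next();
                if (predicate.test(candidate)) {
                    next = candidate;
                    hasNext = true;
                    return;
                }
            }
        }

        @Override
        public boolean hasNext() {
            return hasNext;
        }

        @Override
        public T next() {
            if (!hasNext) {
                throw new NoSuchElementException();
            }
            T result = next;
            advance();
            return result;
        }
    }

    /** The consumer just drains the reader, as it would for "select *". */
    static <T> List<T> readAll(List<T> rows, Predicate<T> predicate) {
        List<T> out = new ArrayList<>();
        Iterator<T> it = new FilteringIterator<>(rows.iterator(), predicate);
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("alice,30", "bob,17", "carol,42");
        // Assumed predicate for the example: second column >= 18.
        Predicate<String> adult = r -> Integer.parseInt(r.split(",")[1]) >= 18;
        System.out.println(readAll(rows, adult)); // prints [alice,30, carol,42]
    }
}
```

The point of the sketch is only that the filtering lives entirely below the consumer. What I can’t see is how to get Hive to hand me the predicate and then forget about it.
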
Now, as I say, I can’t see any way to effect this in the InputFormat directly. If I use a storage handler then I am in “non-native table” territory and I can’t LOAD my tables with data. However, I have just noticed that line 111 of IndexWhereProcessor.java seems to suggest that indexes are only ever used when the query is going to run Map/Reduce. Is this so?

So I seem to be in the position where I can’t use InputFormat, StorageHandler or indexes. What can I do? Is there any way to filter the query without having to run Map/Reduce?

Any suggestions welcomed.

Peter Marron
Trillium Software UK Limited

Tel : +44 (0) 118 940 7609
Fax : +44 (0) 118 940 7699
E: peter.mar...@trilliumsoftware.com