[ 
https://issues.apache.org/jira/browse/HIVE-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115018#comment-13115018
 ] 

Sandy Pratt commented on HIVE-1643:
-----------------------------------

I've been working on this issue myself for the last few days, and I was hoping 
we could compare notes.  I’m developing against the Hive release we currently 
have deployed (hive-0.7.0-cdh3u0), so I’m not sure if a patch would make sense.  
But basically, here’s the gist:

HBaseStorageHandler.java:
- Strike the implementation of HiveStoragePredicateHandler, and implement 
HiveStorageHandler instead.  We’ll still be passed a copy of the filter 
expression if we wish to optimize, but Hive will still apply the full filter 
on top of whatever we return (which is fine).

HiveHBaseTableInputFormat.java:
- Change the IndexPredicateAnalyzer configuration to include a few more 
operations that are relevant to range scans (e.g. >=, <=, <, >).
- Continue to set the start and stop row in convertFilter basically as 
before, while doing the right thing for range scans.

My main concern with this approach is that the IndexPredicateAnalyzer might not 
be doing what I think it does.  For example, if I write odd where clauses like 
“key >= b and (key < c or true)”, is it smart enough to discard that vacuous 
second part?  Also, there are HBase-specific edge cases to think about (e.g. if 
we get “key >= d and key <= b”, the range is empty, so we shouldn’t set any 
startRow or stopRow).
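
To make the start/stop row handling concrete, here is a minimal sketch in plain 
Java.  It is only an illustration under stated assumptions, not the Hive or 
HBase code: the class RowKeyRange and its methods are hypothetical names, row 
keys are assumed to be UTF-8 strings compared as unsigned bytes, and the scan is 
assumed to treat the start row as inclusive and the stop row as exclusive.  It 
folds a conjunction of key comparisons into one pair of bounds and flags the 
empty-range case mentioned above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of Hive or HBase. */
final class RowKeyRange {
    // null means "unbounded" on that side.
    private byte[] startRow; // inclusive lower bound
    private byte[] stopRow;  // exclusive upper bound

    /** Narrows the range with one comparison on the row key (ANDed with earlier ones). */
    RowKeyRange and(String op, String value) {
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        switch (op) {
            case ">=": raiseStart(v); break;
            case ">":  raiseStart(next(v)); break;  // skip v itself
            case "<":  lowerStop(v); break;
            case "<=": lowerStop(next(v)); break;   // include v itself
            case "=":  raiseStart(v); lowerStop(next(v)); break;
            default:
                throw new IllegalArgumentException("unsupported operator: " + op);
        }
        return this;
    }

    /** True when the bounds cannot match any row, e.g. key >= d and key <= b. */
    boolean isEmpty() {
        return startRow != null && stopRow != null && compare(startRow, stopRow) >= 0;
    }

    byte[] getStartRow() { return startRow; }
    byte[] getStopRow()  { return stopRow; }

    // Keep the largest lower bound seen so far.
    private void raiseStart(byte[] candidate) {
        if (startRow == null || compare(candidate, startRow) > 0) startRow = candidate;
    }

    // Keep the smallest upper bound seen so far.
    private void lowerStop(byte[] candidate) {
        if (stopRow == null || compare(candidate, stopRow) < 0) stopRow = candidate;
    }

    /** Smallest key strictly greater than v: v followed by a single zero byte. */
    static byte[] next(byte[] v) {
        return Arrays.copyOf(v, v.length + 1);
    }

    /** Lexicographic comparison treating bytes as unsigned. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }
}
```

With this sketch, new RowKeyRange().and(">=", "b").and("<", "c") yields start 
row "b" and stop row "c", while and(">=", "d").and("<=", "b") reports 
isEmpty(), in which case the caller could simply leave the scan bounds unset 
and let Hive's own filtering reject every row.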

Does this sound like the right track?

                
> support range scans and non-key columns in HBase filter pushdown
> ----------------------------------------------------------------
>
>                 Key: HIVE-1643
>                 URL: https://issues.apache.org/jira/browse/HIVE-1643
>             Project: Hive
>          Issue Type: Improvement
>          Components: HBase Handler
>    Affects Versions: 0.7.0
>            Reporter: John Sichi
>            Assignee: Vaibhav Aggarwal
>
> HIVE-1226 added support for WHERE rowkey=3.  We would like to support WHERE 
> rowkey BETWEEN 10 and 20, as well as predicates on non-rowkeys (plus 
> conjunctions etc).  Non-rowkey conditions can't be used to filter out entire 
> ranges, but they can be used to push the per-row filter processing as far 
> down as possible.
