[ https://issues.apache.org/jira/browse/HIVE-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115018#comment-13115018 ]
Sandy Pratt commented on HIVE-1643:
-----------------------------------

I've been working on this issue myself the last few days, and I was hoping we could compare notes. I'm developing against the Hive release we currently have deployed (hive-0.7.0-cdh3u0), so I'm not sure whether a patch would make sense. But basically, here's the gist:

HBaseStorageHandler.java:
- Strike the implementation of HiveStoragePredicateHandler, and implement HiveStorageHandler instead. We'll still be passed a copy of the filter expression if we wish to optimize, but Hive will still do the filtering on top of us (which is fine).

HiveHBaseTableInputFormat.java:
- Change the IndexPredicateAnalyzer configuration to include a few more operations that are relevant to range scans (e.g. >=, <=, <, >).
- Continue to set the start and stop row basically as before in convertFilter, while doing the right thing for range scans.

My main concern with this approach is that the IndexPredicateAnalyzer might not be doing what I think. For example, if I write odd where clauses like "key >= b and (key < c or true)", is it smart enough to discard that vacuous second part? Also, there are HBase-specific edge cases to think about (e.g. if we get "key >= d and key <= b", the range is empty, so we shouldn't set any startRow or stopRow).

Does this sound like the right track?

> support range scans and non-key columns in HBase filter pushdown
> ----------------------------------------------------------------
>
>                 Key: HIVE-1643
>                 URL: https://issues.apache.org/jira/browse/HIVE-1643
>             Project: Hive
>          Issue Type: Improvement
>          Components: HBase Handler
>    Affects Versions: 0.7.0
>            Reporter: John Sichi
>            Assignee: Vaibhav Aggarwal
>
> HIVE-1226 added support for WHERE rowkey=3. We would like to support
> WHERE rowkey BETWEEN 10 AND 20, as well as predicates on non-rowkeys
> (plus conjunctions etc). Non-rowkey conditions can't be used to filter
> out entire ranges, but they can be used to push the per-row filter
> processing as far down as possible.
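
To make the start/stop row handling from the comment concrete, here is a rough sketch of how a conjunction of row-key comparisons could be folded into a scan range. It is not the code from any patch on this issue: KeyRangeSketch, its Op enum, and the and() method are hypothetical names, keys are treated as UTF-8 strings for simplicity, and the sketch assumes the usual HBase convention that the start row is inclusive and the stop row is exclusive. It only covers AND-ed comparisons on the key; anything involving OR (such as the "or true" case above) would have to be simplified or rejected before reaching it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Hive or HBase: folds a conjunction of
// row-key comparisons into an HBase-style [startRow, stopRow) scan range.
final class KeyRangeSketch {
    enum Op { EQ, GE, GT, LE, LT }

    // null start = scan from the first row; null stop = scan to the last row.
    byte[] startRow;   // inclusive lower bound
    byte[] stopRow;    // exclusive upper bound
    boolean empty;     // true once the predicates contradict each other

    // Unsigned lexicographic comparison, matching HBase row-key ordering.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // Smallest key strictly greater than k: k followed by a zero byte.
    static byte[] successor(byte[] k) {
        return Arrays.copyOf(k, k.length + 1);
    }

    // AND one more comparison "key <op> value" into the range.
    KeyRangeSketch and(Op op, String value) {
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        byte[] lower = null;
        byte[] upper = null;
        switch (op) {
            case EQ: lower = v; upper = successor(v); break;
            case GE: lower = v; break;
            case GT: lower = successor(v); break;
            case LE: upper = successor(v); break;
            case LT: upper = v; break;
        }
        // Keep the tightest bounds seen so far.
        if (lower != null && (startRow == null || compare(lower, startRow) > 0)) {
            startRow = lower;
        }
        if (upper != null && (stopRow == null || compare(upper, stopRow) < 0)) {
            stopRow = upper;
        }
        // A start at or past the stop means no row can match.
        if (startRow != null && stopRow != null && compare(startRow, stopRow) >= 0) {
            empty = true;
        }
        return this;
    }
}
```

With this sketch, "key >= b and key < c" yields startRow "b" and stopRow "c", while "key >= d and key <= b" sets the empty flag, so the caller can skip setting startRow and stopRow on the scan and leave the filtering to Hive.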