Hello all,

On Tue, Mar 6, 2012 at 8:31 PM, Namit Jain (Created) (JIRA) <j...@apache.org> wrote:
> Add support for index joins in Hive
> -----------------------------------
>
> Key: HIVE-2845
> URL: https://issues.apache.org/jira/browse/HIVE-2845
> Project: Hive
> Issue Type: New Feature
> Reporter: Namit Jain
>
> Hive supports indexes, which are used for filters currently.
>
> It would be very useful to add support for index-based joins in Hive.
> If 2 tables A and B are being joined, and an index exists on the join key
> of A, B can be scanned (by the mappers), and for each row in B, a lookup
> for the corresponding row in A can be performed.

According to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins, the last table in a join is the one that is streamed. So, as I understand it, only that table (B in this case) could be the one that is scanned while the index on A is used for the lookups. Please correct me if I'm wrong.

> This can be very useful for some usecases.

One way to do this might be to rewrite the original query in an added stage of the physical optimizer. Unlike HiveSkewJoin, which also works in the physical optimizer, this rewrite would not produce a different MapReduce job.

If that approach is workable, what would the query rewriting process look like? Should it match on a "JOIN" rule, as HiveSkewJoin does, and then replace the second "TS" (table scan) operator? If so, how?

I am eager to implement this issue, and I was wondering if it could be assigned to me. I would appreciate any hints or clues.

Regards,
Mahsa
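
P.S. To make the lookup idea from the issue description concrete, here is a minimal sketch in plain Python. It is not Hive code: the names build_index and index_join, the sample tables, and the in-memory dictionary standing in for a Hive index on A are all assumptions made for illustration only.

```python
from collections import defaultdict


def build_index(rows, key):
    """Map each join-key value to the rows of A carrying it.

    Hypothetical stand-in for an index that already exists on A's join key.
    """
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def index_join(b_rows, a_index, key):
    """Scan B once; for each row of B, look up matching rows of A via the index."""
    for b_row in b_rows:
        # .get() avoids creating empty entries for keys that are missing in A
        for a_row in a_index.get(b_row[key], []):
            yield {**a_row, **b_row}


# Made-up sample data: A is the indexed table, B is the scanned (streamed) table.
table_a = [{"id": 1, "name": "x"}, {"id": 2, "name": "y"}]
table_b = [{"id": 2, "amount": 10}, {"id": 3, "amount": 5}, {"id": 2, "amount": 7}]

a_index = build_index(table_a, "id")
joined = list(index_join(table_b, a_index, "id"))
```

In this sketch only B is read in full; A is touched only through lookups on the join key, which is the access pattern the issue describes for the mappers.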