On Sun, Sep 5, 2010 at 12:27 AM, phil young <[email protected]> wrote:
> I'm interested in doing joins in Hive between HBase tables and between HBase
> and Hive tables.
>
> Can someone suggest an appropriate stack to do that? i.e.
> Is it possible to use HBase 0.89
> If I use HBase 0.20.6, do I still need to apply HBASE-2473
> Should I go with the trunk versions of any of these (e.g. Hive), or even
> CDH3 (which appears to not have the hive-hbase handler)?
>
> I'd appreciate input from anyone who has done this.
>
> Thanks
>

You would define two tables in Hive using the HBase storage handler. Here is the test case:

http://svn.apache.org/viewvc/hadoop/hive/trunk/hbase-handler/src/test/queries/hbase_joins.q?revision=926818&view=markup

From there you should be able to join HBase to HBase, or HBase to Hive.

I cannot say what version or patch level you need, but the code was committed recently, so a recent version should go smoothly. (I have never tried it, so I cannot be sure.)

Remember that the HBase storage handler does full table scans of your data, so if you plan on querying large amounts of data you are not going to get low-latency results. (You probably already know that.)
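
For illustration, here is a rough sketch of what the two table definitions and the joins might look like. It is not taken from the linked test case, and I have not run it: the table names (hbase_users, hbase_orders, hive_regions), the HBase table names, and the "cf" column family are all made up, and the ":key" column-mapping syntax is an assumption that may not match older builds of the handler.

```sql
-- Hypothetical HBase-backed table. CREATE TABLE (without EXTERNAL) asks Hive
-- to create the HBase table "users"; use CREATE EXTERNAL TABLE to map a table
-- that already exists in HBase.
CREATE TABLE hbase_users(key int, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:name")
TBLPROPERTIES ("hbase.table.name" = "users");

-- Second hypothetical HBase-backed table.
CREATE TABLE hbase_orders(key int, user_id int, amount double)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:user_id,cf:amount")
TBLPROPERTIES ("hbase.table.name" = "orders");

-- HBase to HBase join.
SELECT u.name, o.amount
FROM hbase_users u JOIN hbase_orders o ON (u.key = o.user_id);

-- HBase to native Hive join; hive_regions is assumed to be an ordinary
-- Hive table with columns (user_id int, region string).
SELECT u.name, r.region
FROM hbase_users u JOIN hive_regions r ON (u.key = r.user_id);
```

Each of these queries still reads the HBase tables through full scans, so they are suited to batch-style work rather than low-latency lookups.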
