[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029497#comment-14029497 ]
Nick Dimiduk commented on HIVE-6584:
------------------------------------

Thanks for the insightful comments, [~tenggyut].

bq. 1. HBaseStorageHandler.getInputFormatClass(): i am afraid that the returned inputformat will always be HiveHBaseTabelInputFormat (at least according to my test)

I was afraid of this in my initial design thinking, but my experiments proved otherwise. Can you elaborate on your tests? I'd like to reproduce this issue if I'm able.

bq. 2. in the method HBaseStorageHandler.preCreateTable, hive will check whether the HBase table exist or not, regardless the external table that hive gonna create is based on actual table or a snapshot.

I haven't yet looked at the use-case of consuming a snapshot for which there is no table in HBase. I planned to approach this kind of feature in follow-on work; the goal here is to get just the basics working.

bq. 3, 4 [snip]

These are both true.

bq. So I suggest adding a subclass of HBaseStorageHandler(and other necessary classes) ,say HBaseSnapshotStorageHandler, to deal with the hbase snapshot situation.

A goal of this patch is to be able to query snapshots created from online tables already registered with Hive using the HBaseStorageHandler. Implementing HBaseSnapshotStorageHandler requires a separate table registration for the snapshot. I think that's undesirable.

Regarding the "hbase snapshot situation", let's make it better on the HBase side. What do you recommend?

> Add HiveHBaseTableSnapshotInputFormat
> -------------------------------------
>
>                 Key: HIVE-6584
>                 URL: https://issues.apache.org/jira/browse/HIVE-6584
>             Project: Hive
>          Issue Type: Improvement
>          Components: HBase Handler
>            Reporter: Nick Dimiduk
>            Assignee: Nick Dimiduk
>             Fix For: 0.14.0
>
>         Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, HIVE-6584.3.patch
>
>
> HBASE-8369 provided mapreduce support for reading from HBase table snapshots.
> This allows a MR job to consume a stable, read-only view of an HBase table
> directly off of HDFS. Bypassing the online region server API provides a nice
> performance boost for the full scan. HBASE-10642 is backporting that feature
> to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's
> available, we should add an input format. A follow-on patch could work out
> how to integrate this functionality into the StorageHandler, similar to how
> HIVE-6473 integrates the HFileOutputFormat into existing table definitions.

--
This message was sent by Atlassian JIRA
(v6.2#6252)