[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029497#comment-14029497
 ] 

Nick Dimiduk commented on HIVE-6584:
------------------------------------

Thanks for the insightful comments, [~tenggyut].

bq. 1. HBaseStorageHandler.getInputFormatClass(): I am afraid that the returned 
inputformat will always be HiveHBaseTableInputFormat (at least according to my 
test)

I was afraid of this in my initial design thinking, but my experiments proved 
otherwise. Can you elaborate on your tests? I'd like to reproduce this issue if 
I'm able.
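
For reference, the selection behavior under discussion can be sketched roughly 
as below. This is a minimal, self-contained illustration and not the code in 
the attached patches: the two format classes are empty stand-ins, 
InputFormatChooser is a hypothetical helper, and the property name is an 
assumption used only to show the idea of picking the input format per job.

```java
import java.util.Map;

// Empty stand-ins for the real input format classes. The names mirror the
// ones discussed in this issue; the bodies are placeholders.
class HiveHBaseTableInputFormat {}
class HiveHBaseTableSnapshotInputFormat {}

// Hypothetical helper, not part of Hive.
class InputFormatChooser {
    // Assumed property name, for illustration only.
    static final String SNAPSHOT_NAME_KEY = "hive.hbase.snapshot.name";

    // Returns the snapshot input format when the job configuration names a
    // non-blank snapshot, and the online-table input format otherwise.
    static Class<?> getInputFormatClass(Map<String, String> jobConf) {
        String snapshot = jobConf.get(SNAPSHOT_NAME_KEY);
        if (snapshot != null && !snapshot.trim().isEmpty()) {
            return HiveHBaseTableSnapshotInputFormat.class;
        }
        return HiveHBaseTableInputFormat.class;
    }
}
```

If the returned class never changes in your test, one thing worth checking is 
whether the snapshot setting is actually visible in the configuration at the 
point where the method is called.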

bq. 2. In the method HBaseStorageHandler.preCreateTable, Hive will check 
whether the HBase table exists, regardless of whether the external table that 
Hive is going to create is based on an actual table or a snapshot.

I haven't yet looked at the use-case of consuming a snapshot for which there is 
no table in HBase. I planned to approach this kind of feature in follow-on 
work; the goal here is to get just the basics working.

bq. 3, 4 [snip]

These are both true.

bq. So I suggest adding a subclass of HBaseStorageHandler (and other necessary 
classes), say HBaseSnapshotStorageHandler, to deal with the hbase snapshot 
situation.

A goal of this patch is to be able to query snapshots created from online 
tables already registered with Hive using the HBaseStorageHandler. Implementing 
HBaseSnapshotStorageHandler requires a separate table registration for the 
snapshot. I think that's undesirable. Regarding the "hbase snapshot situation", 
let's make it better on the HBase side. What do you recommend?

> Add HiveHBaseTableSnapshotInputFormat
> -------------------------------------
>
>                 Key: HIVE-6584
>                 URL: https://issues.apache.org/jira/browse/HIVE-6584
>             Project: Hive
>          Issue Type: Improvement
>          Components: HBase Handler
>            Reporter: Nick Dimiduk
>            Assignee: Nick Dimiduk
>             Fix For: 0.14.0
>
>         Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
> HIVE-6584.3.patch
>
>
> HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
> This allows a MR job to consume a stable, read-only view of an HBase table 
> directly off of HDFS. Bypassing the online region server API provides a nice 
> performance boost for the full scan. HBASE-10642 is backporting that feature 
> to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
> available, we should add an input format. A follow-on patch could work out 
> how to integrate this functionality into the StorageHandler, similar to how 
> HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
