[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086306#comment-14086306 ]
Carter Shanklin commented on HIVE-6584:
---------------------------------------

I had to read the source code to get it to work, so I vote yes.

Using Hive over HBase snapshots requires two variables to be set:

hive.hbase.snapshot.name - The name of the HBase snapshot to be used when reading the HBase data.

hive.hbase.snapshot.restoredir - A temporary directory into which the HBase snapshot is restored when queried using hive.hbase.snapshot.name. A number of directories and small files will be created under this directory, proportional to the number of regions in the HBase table. The table data itself will not be copied under this directory, only metadata. After query execution is complete, this directory can be removed.

Example:

set hive.hbase.snapshot.name=snapshot_2014_08_03;
set hive.hbase.snapshot.restoredir=/tmp/restore;
select count(*) from hbase_table;

After the job is complete, /tmp/restore and its subdirectories can be deleted.

[~ndimiduk] talked about making hive.hbase.snapshot.restoredir an optional setting; he can comment on whether he implemented this.

> Add HiveHBaseTableSnapshotInputFormat
> -------------------------------------
>
>                 Key: HIVE-6584
>                 URL: https://issues.apache.org/jira/browse/HIVE-6584
>             Project: Hive
>          Issue Type: Improvement
>          Components: HBase Handler
>            Reporter: Nick Dimiduk
>            Assignee: Nick Dimiduk
>             Fix For: 0.14.0
>
>         Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch,
> HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch,
> HIVE-6584.13.patch, HIVE-6584.14.patch, HIVE-6584.2.patch, HIVE-6584.3.patch,
> HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch,
> HIVE-6584.8.patch, HIVE-6584.9.patch
>
>
> HBASE-8369 provided mapreduce support for reading from HBase table snapshots.
> This allows a MR job to consume a stable, read-only view of an HBase table
> directly off of HDFS. Bypassing the online region server API provides a nice
> performance boost for the full scan. HBASE-10642 is backporting that feature
> to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's
> available, we should add an input format. A follow-on patch could work out
> how to integrate this functionality into the StorageHandler, similar to how
> HIVE-6473 integrates the HFileOutputFormat into existing table definitions.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
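
The two settings described in the comment can be scripted so that the snapshot name, restore directory, and table are supplied together. The following is a minimal sketch, not part of the patch: the function name build_snapshot_query is hypothetical, and the commented-out cleanup step assumes the restore directory lives on HDFS, which the comment does not state.

```shell
#!/bin/sh
# Hypothetical helper: prints the HiveQL statements needed to run a
# count query against an HBase snapshot, using the two properties
# named in the comment.
#   $1 = snapshot name, $2 = restore directory, $3 = Hive table name
build_snapshot_query() {
    snapshot_name=$1
    restore_dir=$2
    table_name=$3
    printf 'set hive.hbase.snapshot.name=%s;\n' "$snapshot_name"
    printf 'set hive.hbase.snapshot.restoredir=%s;\n' "$restore_dir"
    printf 'select count(*) from %s;\n' "$table_name"
}

# Example invocation (needs hive and hdfs on the PATH, so it is left
# commented out here):
# hive -e "$(build_snapshot_query snapshot_2014_08_03 /tmp/restore hbase_table)"
# Assumption: the restore directory is on HDFS; remove it once the job is done.
# hdfs dfs -rm -r /tmp/restore
```

Keeping the statement generation separate from the hive call makes it easy to inspect the generated settings before running the job.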