[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066930#comment-14066930 ]
Nick Dimiduk commented on HIVE-6584:
------------------------------------

[~tenggyut]:

bq. 1. HBaseStorageHandler.getInputFormatClass(): I am afraid that the returned input format will always be HiveHBaseTableInputFormat (at least according to my test)

My patch has the logic necessary to perform the switch at runtime. It does indeed work with the latest patch.

bq. 2. In the method HBaseStorageHandler.preCreateTable, Hive will check whether the HBase table exists or not, regardless of whether the external table that Hive is going to create is based on an actual table or a snapshot.

I'm not sure about this. In any case, it's not related to this feature: HBaseStorageHandler has no means of creating or dropping table snapshots. If you're seeing an issue here with StorageHandler DDL operations, please file a separate JIRA.

bq. 3. The TableSnapshotRegionSplit used in TableSnapshotInputFormat is a direct subclass of InputSplit, not a subclass of TableSplit

Nor should it be. TableSnapshotRegionSplit tracks different information from TableSplit.

bq. 4. There is no public setScan method in TableSnapshotInputFormat.RecordReader; instead it translates a string into a Scan instance by using mapreduce.TableMapReduceUtil.convertStringToScan.

Indeed, there is a disparity between HBase's mapred and mapreduce implementations. I opened HBASE-11179 for some cleanup on the HBase side. The convertStringToScan details are HBase-private API as of 0.96. I opened HBASE-11163 to make the necessary scanner support available in the mapred API, but it has not yet been implemented.
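To illustrate the kind of runtime switch discussed in point 1, here is a minimal, self-contained sketch. It is not the code from the patch: the nested classes are hypothetical stand-ins for the real input formats, a plain Map stands in for the job configuration, and the property name {{hive.hbase.snapshot.name}} is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class InputFormatSwitchSketch {
    // Hypothetical stand-ins for the real Hive/HBase input format classes.
    public interface InputFormat {}
    public static class LiveTableInputFormat implements InputFormat {}
    public static class SnapshotInputFormat implements InputFormat {}

    // Assumed property name; check the handler's actual constants before relying on it.
    public static final String SNAPSHOT_NAME_KEY = "hive.hbase.snapshot.name";

    // Choose the input format per job: use the snapshot-backed format only
    // when a non-blank snapshot name has been configured.
    public static Class<? extends InputFormat> getInputFormatClass(Map<String, String> conf) {
        String snapshot = conf.get(SNAPSHOT_NAME_KEY);
        if (snapshot != null && !snapshot.trim().isEmpty()) {
            return SnapshotInputFormat.class;
        }
        return LiveTableInputFormat.class;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(getInputFormatClass(conf).getSimpleName());

        conf.put(SNAPSHOT_NAME_KEY, "my_snapshot");
        System.out.println(getInputFormatClass(conf).getSimpleName());
    }
}
```

Because the decision is taken from the configuration on each call, a test that never sets the snapshot property would always observe the live-table input format, which may explain the behaviour reported in point 1.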

> Add HiveHBaseTableSnapshotInputFormat
> -------------------------------------
>
>                 Key: HIVE-6584
>                 URL: https://issues.apache.org/jira/browse/HIVE-6584
>             Project: Hive
>          Issue Type: Improvement
>          Components: HBase Handler
>            Reporter: Nick Dimiduk
>            Assignee: Nick Dimiduk
>             Fix For: 0.14.0
>
>         Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch,
> HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch,
> HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch
>
>
> HBASE-8369 provided mapreduce support for reading from HBase table snapshots.
> This allows a MR job to consume a stable, read-only view of an HBase table
> directly off of HDFS. Bypassing the online region server API provides a nice
> performance boost for the full scan. HBASE-10642 is backporting that feature
> to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's
> available, we should add an input format. A follow-on patch could work out
> how to integrate this functionality into the StorageHandler, similar to how
> HIVE-6473 integrates the HFileOutputFormat into existing table definitions.