[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028753#comment-14028753 ]
Teng Yutong commented on HIVE-6584:
-----------------------------------

Hi Nick,

I have some concerns about these patches:

1. HBaseStorageHandler.getInputFormatClass(): I am afraid the returned input format will always be HiveHBaseTableInputFormat (at least according to my test).
2. In HBaseStorageHandler.preCreateTable, Hive checks whether the HBase table exists, regardless of whether the external table Hive is about to create is based on an actual table or on a snapshot.
3. The TableSnapshotRegionSplit used in TableSnapshotInputFormat is a direct subclass of InputSplit, not a subclass of TableSplit.
4. There is no public setScan method in TableSnapshotInputFormat.RecordReader; instead it translates a string into a Scan instance using mapreduce.TableMapReduceUtil.convertStringToScan.

So I suggest adding a subclass of HBaseStorageHandler (and other necessary classes), say HBaseSnapshotStorageHandler, to deal with the HBase snapshot case. In fact, I have already finished the necessary code changes and run some tests, and the tests show that my modification works. I will upload my patch soon.

> Add HiveHBaseTableSnapshotInputFormat
> -------------------------------------
>
>                 Key: HIVE-6584
>                 URL: https://issues.apache.org/jira/browse/HIVE-6584
>             Project: Hive
>          Issue Type: Improvement
>          Components: HBase Handler
>            Reporter: Nick Dimiduk
>            Assignee: Nick Dimiduk
>             Fix For: 0.14.0
>
>         Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch,
>                      HIVE-6584.3.patch
>
>
> HBASE-8369 provided mapreduce support for reading from HBase table snapshots.
> This allows a MR job to consume a stable, read-only view of an HBase table
> directly off of HDFS. Bypassing the online region server API provides a nice
> performance boost for the full scan. HBASE-10642 is backporting that feature
> to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's
> available, we should add an input format. A follow-on patch could work out
> how to integrate this functionality into the StorageHandler, similar to how
> HIVE-6473 integrates the HFileOutputFormat into existing table definitions.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
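
For illustration only, here is a minimal sketch of the subclassing approach suggested in the comment (points 1 and 2). It does not use the real Hive or HBase classes and is not the patch referred to above: TableDef, StorageHandler, SnapshotStorageHandler and the two input format classes are hypothetical stand-ins, and the check that the snapshot exists is an assumption about what a snapshot-aware handler might reasonably do.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a Hive table definition backed by HBase.
class TableDef {
    final String hbaseTableName; // live HBase table name
    final String snapshotName;   // null when the Hive table maps to a live table

    TableDef(String hbaseTableName, String snapshotName) {
        this.hbaseTableName = hbaseTableName;
        this.snapshotName = snapshotName;
    }
}

// Hypothetical marker classes standing in for the two input formats.
class LiveTableInputFormat {}
class SnapshotInputFormat {}

// Mimics the behaviour described in points 1 and 2: the input format is
// fixed, and table creation always requires the live HBase table to exist.
class StorageHandler {
    protected final Set<String> existingTables;

    StorageHandler(Set<String> existingTables) {
        this.existingTables = existingTables;
    }

    Class<?> getInputFormatClass() {
        return LiveTableInputFormat.class;
    }

    void preCreateTable(TableDef def) {
        if (!existingTables.contains(def.hbaseTableName)) {
            throw new IllegalStateException(
                "HBase table " + def.hbaseTableName + " does not exist");
        }
    }
}

// Snapshot-aware subclass: returns the snapshot input format and replaces
// the live-table check with a snapshot check (the latter is an assumption).
class SnapshotStorageHandler extends StorageHandler {
    private final Set<String> existingSnapshots;

    SnapshotStorageHandler(Set<String> existingTables, Set<String> existingSnapshots) {
        super(existingTables);
        this.existingSnapshots = existingSnapshots;
    }

    @Override
    Class<?> getInputFormatClass() {
        return SnapshotInputFormat.class;
    }

    @Override
    void preCreateTable(TableDef def) {
        if (def.snapshotName == null) {
            // Not a snapshot-backed table: fall back to the base behaviour.
            super.preCreateTable(def);
            return;
        }
        if (!existingSnapshots.contains(def.snapshotName)) {
            throw new IllegalStateException(
                "Snapshot " + def.snapshotName + " does not exist");
        }
    }
}
```

With this layout, a table definition that names a snapshot but no live HBase table is rejected by the base handler and accepted by the subclass, which is the situation point 2 describes.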