[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033591#comment-14033591 ]
zjkyly commented on HIVE-6584:
------------------------------

Teng YuTong and I are colleagues. We have a patch for HIVE-6584 and a patch for HBASE-11163, and we modified org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat (line 93) from:

static class TableSnapshotRegionSplit extends InputSplit implements Writable

to:

public static class TableSnapshotRegionSplit extends InputSplit implements Writable

With these changes, we can run a mapred job against a snapshot.

mapred (count(1)) result:

2014-06-17 16:29:34,540 Stage-1 map = 100%, reduce = 32%, Cumulative CPU 2467.57 sec
2014-06-17 16:29:35,578 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2468.35 sec
MapReduce Total cumulative CPU time: 41 minutes 8 seconds 350 msec
Ended Job = job_1402970116480_0015
MapReduce Jobs Launched:
Job 0: Map: 64 Reduce: 1 Cumulative CPU: 2468.35 sec HDFS Read: 18334 HDFS Write: 9 SUCCESS
Total MapReduce CPU Time Spent: 41 minutes 8 seconds 350 msec
OK
65497163
Time taken: 429.647 seconds, Fetched: 1 row(s)

hbase count result:

Current count: 65400000, row: user987684650651905350
65497163 row(s) in 1446.2310 seconds

=> 65497163

However, the HFiles contain multiple versions of the same record, and we could not solve this problem. As a workaround, we set the number of versions on the HBase table to 1 and run a major compaction before taking the snapshot.

> Add HiveHBaseTableSnapshotInputFormat
> -------------------------------------
>
> Key: HIVE-6584
> URL: https://issues.apache.org/jira/browse/HIVE-6584
> Project: Hive
> Issue Type: Improvement
> Components: HBase Handler
> Reporter: Nick Dimiduk
> Assignee: Nick Dimiduk
> Fix For: 0.14.0
>
> Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch,
> HIVE-6584.3.patch, HIVE-6584.4.patch
>
>
> HBASE-8369 provided mapreduce support for reading from HBase table snapshots.
> This allows an MR job to consume a stable, read-only view of an HBase table
> directly off of HDFS. Bypassing the online region server API provides a nice
> performance boost for the full scan.
> HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a
> {{mapred}} implementation. Once that's available, we should add an input
> format. A follow-on patch could work out how to integrate this functionality
> into the StorageHandler, similar to how HIVE-6473 integrates the
> HFileOutputFormat into existing table definitions.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
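The one-word change to {{TableSnapshotRegionSplit}} described in the comment matters because a nested class declared without {{public}} is package-private: code outside org.apache.hadoop.hbase.mapreduce (presumably Hive's HBase handler, in this case) cannot name the type. The sketch below is a minimal illustration using plain Java reflection on two hypothetical stand-in classes, not HBase's real ones; it only compares the access modifiers of the two declarations.

```java
import java.lang.reflect.Modifier;

// Hypothetical stand-ins that mirror the two declarations quoted in the comment.
// They are NOT HBase classes; only their access modifiers are of interest here.
public class SplitVisibilityDemo {

    // Like the original declaration: package-private nested class.
    static class PackagePrivateSplit {}

    // Like the patched declaration: public nested class.
    public static class PublicSplit {}

    // A nested class can be named from another package only if both it and
    // its enclosing class are public.
    static boolean accessibleFromOtherPackages(Class<?> nested) {
        Class<?> outer = nested.getEnclosingClass();
        return Modifier.isPublic(nested.getModifiers())
                && (outer == null || Modifier.isPublic(outer.getModifiers()));
    }

    public static void main(String[] args) {
        System.out.println("static class        -> "
                + accessibleFromOtherPackages(PackagePrivateSplit.class));
        System.out.println("public static class -> "
                + accessibleFromOtherPackages(PublicSplit.class));
    }
}
```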
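The two results in the comment can be cross-checked with a little arithmetic. This sketch hard-codes the figures from the mapred and hbase count logs (the constant names are hypothetical): the row counts agree, the shell count took about 3.4 times the wall-clock time of the Hive job, and 2468.35 s of cumulative CPU is the reported 41 minutes 8 seconds 350 msec. Note that the HBase shell {{count}} runs as a single client-side scan while the Hive job used 64 map tasks, so the ratio is not a like-for-like benchmark.

```java
import java.util.Locale;

// Sanity checks on the figures quoted in the comment. Values are copied from
// the two logs; the constant names are hypothetical.
public class SnapshotCountCheck {

    static final long HIVE_SNAPSHOT_ROWS = 65_497_163L;  // Hive count(1) over the snapshot
    static final long HBASE_SHELL_ROWS = 65_497_163L;    // hbase shell count
    static final double HIVE_WALL_SECONDS = 429.647;     // "Time taken"
    static final double HBASE_WALL_SECONDS = 1446.2310;  // "row(s) in ... seconds"
    static final double HIVE_CPU_SECONDS = 2468.35;      // "Cumulative CPU"

    static boolean countsMatch() {
        return HIVE_SNAPSHOT_ROWS == HBASE_SHELL_ROWS;
    }

    // How many times longer the shell count took than the Hive job (wall clock).
    static double wallClockRatio() {
        return HBASE_WALL_SECONDS / HIVE_WALL_SECONDS;
    }

    // Splits a duration in seconds into whole minutes plus remaining seconds.
    static String minutesAndSeconds(double seconds) {
        long minutes = (long) (seconds / 60);
        return String.format(Locale.ROOT, "%d min %.2f s", minutes, seconds - minutes * 60);
    }

    public static void main(String[] args) {
        System.out.println("counts match: " + countsMatch());
        System.out.printf(Locale.ROOT, "shell/Hive wall-clock ratio: %.2f%n", wallClockRatio());
        System.out.println("Hive cumulative CPU: " + minutesAndSeconds(HIVE_CPU_SECONDS));
    }
}
```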
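The version problem and the workaround in the comment can be modeled in a few lines. This is a conceptual sketch with a hypothetical {{StoredCell}} type, not HBase's Cell API. It assumes a reader that sees every version stored in the files, as the comment reports for the HFiles, and shows that keeping only the newest version of each row and column (roughly the on-disk state that a max-versions setting of 1 plus a major compaction produces) leaves exactly one value per cell.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual model only: StoredCell is a hypothetical stand-in, not HBase's Cell API.
public class VersionedCellsDemo {

    static final class StoredCell {
        final String row;
        final String column;
        final long timestamp;
        final String value;

        StoredCell(String row, String column, long timestamp, String value) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Hypothetical file contents: row1/cf:q was written twice, so two versions exist.
    static List<StoredCell> sampleFiles() {
        List<StoredCell> cells = new ArrayList<>();
        cells.add(new StoredCell("row1", "cf:q", 1L, "old"));
        cells.add(new StoredCell("row1", "cf:q", 2L, "new"));
        cells.add(new StoredCell("row2", "cf:q", 1L, "only"));
        return cells;
    }

    // Keeps only the newest version of each (row, column) pair, keyed as "row/column".
    static Map<String, StoredCell> latestVersions(List<StoredCell> cells) {
        Map<String, StoredCell> newest = new HashMap<>();
        for (StoredCell c : cells) {
            newest.merge(c.row + "/" + c.column, c,
                    (kept, incoming) -> incoming.timestamp > kept.timestamp ? incoming : kept);
        }
        return newest;
    }

    public static void main(String[] args) {
        List<StoredCell> raw = sampleFiles();
        Map<String, StoredCell> compacted = latestVersions(raw);
        System.out.println("cells seen without version filtering: " + raw.size());
        System.out.println("cells after keeping newest version:   " + compacted.size());
        System.out.println("row1/cf:q -> " + compacted.get("row1/cf:q").value);
    }
}
```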