[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

zjkyly (JIRA) Tue, 17 Jun 2014 02:22:28 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033591#comment-14033591
 ]


zjkyly commented on HIVE-6584:
------------------------------

Teng YuTong and I are colleagues. We have a patch for HIVE-6584 and a patch for 
HBASE-11163. We also modified
org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat (line 93)
from: static class TableSnapshotRegionSplit extends InputSplit implements Writable
to: public static class TableSnapshotRegionSplit extends InputSplit implements Writable
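
The effect of that one-word change can be sketched in plain Java. This is only an illustration: SnapshotFormatSketch, PackagePrivateSplit and PublicSplit are hypothetical stand-ins, not the real HBase classes, and the motivation (code in another package, such as a Hive input format, needing to name the split type) is an assumption rather than something stated above. A nested class declared without an access modifier is package-private; adding public makes it visible to other packages.

```java
import java.lang.reflect.Modifier;

// Hypothetical stand-in for TableSnapshotInputFormat; not the real HBase class.
class SnapshotFormatSketch {
    // Before the change: no access modifier, so the nested class is
    // package-private and cannot be named from code in another package.
    static class PackagePrivateSplit {}

    // After the change: public, so code in other packages can reference it.
    public static class PublicSplit {}

    // Reports whether a class was declared with the public modifier.
    static boolean isPublic(Class<?> c) {
        return Modifier.isPublic(c.getModifiers());
    }

    public static void main(String[] args) {
        System.out.println("PackagePrivateSplit public? "
                + isPublic(PackagePrivateSplit.class));
        System.out.println("PublicSplit public? "
                + isPublic(PublicSplit.class));
    }
}
```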

With these changes we can run mapred on a snapshot. The mapred count(1) result:

2014-06-17 16:29:34,540 Stage-1 map = 100%,  reduce = 32%, Cumulative CPU 2467.57 sec
2014-06-17 16:29:35,578 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2468.35 sec
MapReduce Total cumulative CPU time: 41 minutes 8 seconds 350 msec
Ended Job = job_1402970116480_0015
MapReduce Jobs Launched: 
Job 0: Map: 64  Reduce: 1   Cumulative CPU: 2468.35 sec   HDFS Read: 18334 HDFS Write: 9 SUCCESS
Total MapReduce CPU Time Spent: 41 minutes 8 seconds 350 msec
OK
65497163
Time taken: 429.647 seconds, Fetched: 1 row(s)

hbase count result:
Current count: 65400000, row: user987684650651905350
65497163 row(s) in 1446.2310 seconds
=> 65497163

However, the HFiles contain multiple versions of a record, and we could not 
solve this problem. As a workaround, we set the HBase table's number of 
versions to 1 and run a major compaction before taking the snapshot.
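
As a rough model of that workaround (not HBase code): assume the store files hold several timestamped versions of the same row, and that keeping one version plus a major compaction leaves only the newest one on disk. The class and method names below (VersionCompactionSketch, Cell, keepNewestPerRow) are hypothetical, and the sketch ignores column families, qualifiers and delete markers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only; these types are illustrative and not part of HBase.
class VersionCompactionSketch {
    // Simplified cell: one row key, one timestamp, one value.
    static final class Cell {
        final String row;
        final long timestamp;
        final String value;

        Cell(String row, long timestamp, String value) {
            this.row = row;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Keeps only the newest cell for each row key, sorted by row key.
    static List<Cell> keepNewestPerRow(List<Cell> cells) {
        Map<String, Cell> newest = new TreeMap<>();
        for (Cell c : cells) {
            Cell seen = newest.get(c.row);
            if (seen == null || c.timestamp > seen.timestamp) {
                newest.put(c.row, c);
            }
        }
        return new ArrayList<>(newest.values());
    }

    public static void main(String[] args) {
        List<Cell> raw = new ArrayList<>();
        raw.add(new Cell("user1", 100L, "a"));
        raw.add(new Cell("user1", 200L, "b")); // newer version of user1
        raw.add(new Cell("user2", 150L, "c"));

        for (Cell c : keepNewestPerRow(raw)) {
            System.out.println(c.row + " @" + c.timestamp + " = " + c.value);
        }
    }
}
```

In this model, three stored cells collapse to two rows, which is the state the table is brought into before the snapshot is taken.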

> Add HiveHBaseTableSnapshotInputFormat
> -------------------------------------
>
>                 Key: HIVE-6584
>                 URL: https://issues.apache.org/jira/browse/HIVE-6584
>             Project: Hive
>          Issue Type: Improvement
>          Components: HBase Handler
>            Reporter: Nick Dimiduk
>            Assignee: Nick Dimiduk
>             Fix For: 0.14.0
>
>         Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
> HIVE-6584.3.patch, HIVE-6584.4.patch
>
>
> HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
> This allows a MR job to consume a stable, read-only view of an HBase table 
> directly off of HDFS. Bypassing the online region server API provides a nice 
> performance boost for the full scan. HBASE-10642 is backporting that feature 
> to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
> available, we should add an input format. A follow-on patch could work out 
> how to integrate this functionality into the StorageHandler, similar to how 
> HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
