[jira] [Commented] (HIVE-2365) SQL support for bulk load into HBase

John Sichi (JIRA) Mon, 15 Aug 2011 18:44:54 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085477#comment-13085477
 ]


John Sichi commented on HIVE-2365:
----------------------------------

(Just realized I forgot to link the original doc where "as simple as this" is 
mentioned.)

https://cwiki.apache.org/confluence/display/Hive/HBaseBulkLoad

This issue covers INSERT of large amounts of data into HBase from Hive 
(not CREATE; I'll follow up on that separately in HIVE-2373).

The major challenges here are:

* automating the sampling needed to derive the range partitioning for the 
global sort
* extending Hive's INSERT syntax to express the whole operation
* chaining the sampling job to the actual load job and tying together the 
relevant bits, such as temporary file locations (we've had success doing 
something similar via reentrant SQL for index load/query statements)
* making the load use the HBase bulk load API, which was added after the 
original Hive work
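
To make the first challenge concrete, here is a minimal sketch of how sampled 
row keys could be turned into range-partition boundaries for a global sort. It 
is not Hive's or HBase's implementation; the function names 
(choose_split_points, partition_for) and the row key format are hypothetical 
and exist only for illustration.

```python
import bisect
import random


def choose_split_points(sample_keys, num_partitions):
    """Pick num_partitions - 1 boundary keys from a sample of row keys.

    Boundaries are taken at evenly spaced positions in the sorted sample,
    so each range should receive a roughly equal share of the full data
    if the sample is representative.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    ordered = sorted(sample_keys)
    if not ordered:
        return []
    return [
        ordered[i * len(ordered) // num_partitions]
        for i in range(1, num_partitions)
    ]


def partition_for(key, split_points):
    """Return the index of the range that key falls into.

    Keys below the first boundary go to partition 0; keys at or above
    the last boundary go to the final partition.
    """
    return bisect.bisect_right(split_points, key)


if __name__ == "__main__":
    # Hypothetical row keys; a real job would sample them from the source table.
    all_keys = [f"row{i:05d}" for i in range(1000)]
    sample = random.Random(0).sample(all_keys, 100)
    splits = choose_split_points(sample, 4)
    print("split points:", splits)
    print("partition of row00500:", partition_for("row00500", splits))
```

Because every key in partition i sorts before every key in partition i + 1, 
sorting each partition independently yields a globally sorted result, which is 
the property a bulk load relies on.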


> SQL support for bulk load into HBase
> ------------------------------------
>
>                 Key: HIVE-2365
>                 URL: https://issues.apache.org/jira/browse/HIVE-2365
>             Project: Hive
>          Issue Type: Improvement
>          Components: HBase Handler
>            Reporter: John Sichi
>
> Support the "as simple as this" SQL for bulk load from Hive into HBase.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
