[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support

Edward Capriolo (JIRA) Mon, 18 Nov 2013 20:36:48 -0800

    [ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826191#comment-13826191 ]


Edward Capriolo commented on HIVE-5317:
---------------------------------------

{quote}
Ed,
If you don't use the insert, update, and delete commands, they won't impact 
your use of Hive. On the other hand, there are a wide number of users who need 
ACID and updates.
{quote}

Why don't those users just use an ACID database?

{quote}
The dimension tables have primary keys and are typically bucketed and sorted on 
those keys.
{quote}

All the use cases defined seem to be exactly what Hive is not built for.
1) Hive does little or no optimization of a table when it is sorted.
2) Hive tables do not have primary keys.
3) Hive is not made to work with tables of only a few rows.

It seems like the idea is to turn Hive and the Hive metastore into a one-shot 
database for processes that can easily be done differently.

{quote}
Once a day a small set (up to 100k rows) of records need to be deleted for 
regulatory compliance.
{quote}
1. Sqoop export to an RDBMS.
2. Run the query on the RDBMS.
3. Write the result back to Hive.
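
The three steps above can be sketched roughly as follows. This is only an 
illustration under stated assumptions: sqlite3 stands in for the RDBMS, a 
Python list stands in for the exported Hive table, and the customer table, its 
columns, and the delete_via_rdbms helper are hypothetical names. No real Sqoop 
or Hive command is invoked here.

```python
import sqlite3


def delete_via_rdbms(hive_rows, predicate_sql, params=()):
    """Apply a delete by round-tripping rows through a relational database.

    hive_rows is a list of (id, name) tuples standing in for the exported
    Hive table (hypothetical schema). predicate_sql is a trusted WHERE
    clause with ? placeholders filled from params.
    """
    conn = sqlite3.connect(":memory:")  # stand-in for the real RDBMS
    try:
        # Step 1: export the table to the RDBMS (the role Sqoop would play).
        conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
        conn.executemany("INSERT INTO customer VALUES (?, ?)", hive_rows)

        # Step 2: run the delete on the RDBMS, inside a transaction.
        conn.execute(f"DELETE FROM customer WHERE {predicate_sql}", params)
        conn.commit()

        # Step 3: read the surviving rows back; in practice these would
        # overwrite the Hive table or partition.
        return conn.execute(
            "SELECT id, name FROM customer ORDER BY id"
        ).fetchall()
    finally:
        conn.close()


rows = [(1, "alice"), (2, "bob"), (3, "carol")]
remaining = delete_via_rdbms(rows, "id IN (?, ?)", (1, 3))
print(remaining)  # [(2, 'bob')]
```

The sketch rewrites the whole table for every delete, so the cost of the 
round trip grows with the table size and not with the number of deleted rows.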

I am not ready to vote -1, but I am struggling to understand why anyone would 
want to use Hive to solve the use cases described. This seems like a 
square-peg-in-a-round-hole solution. It feels like something that belongs 
outside of Hive.

It feels a lot like this:
http://db.cs.yale.edu/hadoopdb/hadoopdb.html

> Implement insert, update, and delete in Hive with full ACID support
> -------------------------------------------------------------------
>
>                 Key: HIVE-5317
>                 URL: https://issues.apache.org/jira/browse/HIVE-5317
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: InsertUpdatesinHive.pdf
>
>
> Many customers want to be able to insert, update and delete rows from Hive 
> tables with full ACID support. The use cases are varied, but the form of the 
> queries that should be supported are:
> * INSERT INTO tbl SELECT …
> * INSERT INTO tbl VALUES ...
> * UPDATE tbl SET … WHERE …
> * DELETE FROM tbl WHERE …
> * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN 
> ...
> * SET TRANSACTION LEVEL …
> * BEGIN/END TRANSACTION
> Use Cases
> * Once an hour, a set of inserts and updates (up to 500k rows) for various 
> dimension tables (eg. customer, inventory, stores) needs to be processed. The 
> dimension tables have primary keys and are typically bucketed and sorted on 
> those keys.
> * Once a day a small set (up to 100k rows) of records need to be deleted for 
> regulatory compliance.
> * Once an hour a log of transactions is exported from an RDBMS and the fact 
> tables need to be updated (up to 1m rows) to reflect the new data. The 
> transactions are a combination of inserts, updates, and deletes. The table is 
> partitioned and bucketed.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
