[ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826191#comment-13826191 ]
Edward Capriolo commented on HIVE-5317:
---------------------------------------

{quote}
Ed, if you don't use the insert, update, and delete commands, they won't impact your use of Hive. On the other hand, there are a large number of users who need ACID and updates.
{quote}

Why don't those users just use an ACID database?

{quote}
The dimension tables have primary keys and are typically bucketed and sorted on those keys.
{quote}

All the use cases defined seem to be exactly what Hive is not built for:
1) Hive does little, if any, optimization of a table when it is sorted.
2) Hive tables do not have primary keys.
3) Hive is not made to work with tables of only a few rows.

It seems like the idea is to turn Hive and the Hive metastore into a one-shot database for processes that can easily be done differently.

{quote}
Once a day a small set (up to 100k rows) of records need to be deleted for regulatory compliance.
{quote}

1. Sqoop export to an RDBMS.
2. Run the query on the RDBMS.
3. Write the result back to Hive.

I am not ready to vote -1, but I am struggling to understand why anyone would want to use Hive to solve the use cases described. This seems like a square-peg-in-a-round-hole solution. It feels like something that belongs outside of Hive. It feels a lot like this: http://db.cs.yale.edu/hadoopdb/hadoopdb.html

> Implement insert, update, and delete in Hive with full ACID support
> -------------------------------------------------------------------
>
>         Key: HIVE-5317
>         URL: https://issues.apache.org/jira/browse/HIVE-5317
>     Project: Hive
>  Issue Type: New Feature
>    Reporter: Owen O'Malley
>    Assignee: Owen O'Malley
> Attachments: InsertUpdatesinHive.pdf
>
> Many customers want to be able to insert, update and delete rows from Hive
> tables with full ACID support. The use cases are varied, but the forms of the
> queries that should be supported are:
> * INSERT INTO tbl SELECT …
> * INSERT INTO tbl VALUES ...
> * UPDATE tbl SET … WHERE …
> * DELETE FROM tbl WHERE …
> * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ...
> * SET TRANSACTION LEVEL …
> * BEGIN/END TRANSACTION
>
> Use Cases
> * Once an hour, a set of inserts and updates (up to 500k rows) for various
> dimension tables (e.g. customer, inventory, stores) needs to be processed. The
> dimension tables have primary keys and are typically bucketed and sorted on
> those keys.
> * Once a day a small set (up to 100k rows) of records need to be deleted for
> regulatory compliance.
> * Once an hour a log of transactions is exported from an RDBMS and the fact
> tables need to be updated (up to 1m rows) to reflect the new data. The
> transactions are a combination of inserts, updates, and deletes. The table is
> partitioned and bucketed.
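The third use case (an hourly log mixing inserts, updates, and deletes applied to a keyed table) can be modelled in a few lines to make the requested semantics concrete. This is a minimal in-memory sketch in Python, assuming a "table" is a dict from an integer primary key to a row dict; it is not how Hive or HIVE-5317 would implement anything, and the names `Change`, `apply_changes`, and `Table` are hypothetical. Each batch is treated as all-or-nothing: changes are staged on a copy, and the caller only swaps the result in if every change in the batch is valid.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Literal, Optional

# Hypothetical model: a "table" maps a primary key to a row (column -> value).
Row = Dict[str, object]
Table = Dict[int, Row]


@dataclass(frozen=True)
class Change:
    """One entry of a hypothetical change log exported from an RDBMS."""
    op: Literal["insert", "update", "delete"]
    key: int
    row: Optional[Row] = None  # columns to insert/update; unused for deletes


def apply_changes(table: Table, changes: Iterable[Change]) -> Table:
    """Apply a batch of changes all-or-nothing.

    Changes are staged on a copy of the table. If any change is invalid,
    ValueError is raised and the caller's table is left untouched; on
    success the caller swaps in the returned table.
    """
    staged: Table = {k: dict(v) for k, v in table.items()}  # copy each row
    for c in changes:
        if c.op == "insert":
            if c.key in staged or c.row is None:
                raise ValueError(f"cannot insert key {c.key}")
            staged[c.key] = dict(c.row)
        elif c.op == "update":
            if c.key not in staged or c.row is None:
                raise ValueError(f"cannot update key {c.key}")
            staged[c.key].update(c.row)  # overwrite only the given columns
        elif c.op == "delete":
            staged.pop(c.key, None)  # deleting a missing key is a no-op here
        else:
            raise ValueError(f"unknown op {c.op!r}")
    return staged


if __name__ == "__main__":
    dims: Table = {1: {"name": "alice", "city": "NYC"}, 2: {"name": "bob", "city": "SF"}}
    hourly_log = [
        Change("update", 1, {"city": "Boston"}),
        Change("insert", 3, {"name": "carol", "city": "LA"}),
        Change("delete", 2),
    ]
    dims = apply_changes(dims, hourly_log)
    print(dims)  # {1: {'name': 'alice', 'city': 'Boston'}, 3: {'name': 'carol', 'city': 'LA'}}

    # A batch containing an invalid update is rejected as a whole.
    try:
        apply_changes(dims, [Change("delete", 1), Change("update", 99, {"city": "X"})])
    except ValueError as err:
        print("batch rejected:", err)
    print(1 in dims)  # True: the delete of key 1 in the rejected batch was not applied
```

Copying the whole table per batch is only the simplest way to get all-or-nothing behaviour in a toy model; at the table sizes discussed in this issue it would not be practical, so the sketch illustrates the semantics only, not a storage design.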