[ https://issues.apache.org/jira/browse/HIVE-10924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720334#comment-14720334 ]
Lefty Leverenz edited comment on HIVE-10924 at 8/30/15 3:37 AM: ---------------------------------------------------------------- h3. Feature design notes Hive supports [multi-insert statement|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingdataintoHiveTablesfromqueries]. The idea is that you can execute a select statement and split the result stream into several to write to multiple targets. This matches very closely to what MERGE statement needs to do. When modeling MERGE as multi-insert, we'd split the stream into 2 stream, 1 for the insert part, 1 for update part but write both results to the same table. Section 14.12 of ISO/IEC 9075-2:2011(E) (SQL 2011) defines MERGE statement. Suppose we have tables {code:SQL} CREATE TABLE target(a int, b int, c int); CREATE TABLE source(x int, y int, z int); {code} Then an example that covers most possibilities might look like this {code:SQL} MERGE INTO target USING source ON b = y WHEN MATCHED AND c + 1 + z > 0 THEN UPDATE SET a = 1, c = z WHEN NOT MATCHED AND z IS NULL THEN INSERT(a,b) VALUES(z, 7) {code} \\ \\ And is interpreted as follows \\ \\ || Line || Statement Part || Notes || | 1 | {code:SQL} MERGE INTO target {code} | Specifies the table being modified | | 2 | {code:SQL} USING source {code} | specifies the source of the data which may be a table or expression such as SELECT … FROM … | | 3 | {code:SQL} ON b = y {code} | is interpreted like exactly like an ON clause of a JOIN between source and target. | | 4 | {code:SQL} WHEN MATCHED {code} | Applies if expr in ON is true | | 5 | {code:SQL} AND c + 1 + z > 0 {code} | Additional predicate to test before performing the action. | | 6 | {code:SQL} THEN UPDATE SET a = 1, c = z {code} | May be UPDATE or DELETE. The later deletes the row from target. SET clause is exactly like in regular UPDATE stmt. | | 7 | {code:SQL} WHEN NOT MATCHED {code} | Applies if expr in ON is false | | 8 | {code:SQL} AND z IS NULL {code} | Additional predicate to test before performing the action. | | 9 | {code:SQL} THEN INSERT(a,b) VALUES(z, 7){code} | Insert to perform on target. | \\ \\ Then the "equivalent" _multi-insert statement_ looks like this: \\ \\ || Statement Part || Refernce to previous table || | {code:SQL} FROM (SELECT * FROM target RIGHT OUTER JOIN SOURCE ON b = y) {code} | Lines 1 - 3 | | {code:SQL} INSERT INTO target(a,c) SELECT 1, z {code} | This represents the update part of merge; Line 6 | | {code:SQL} WHERE c + 1 + z > 0 {code} | Line 5 | | {code:SQL} AND b = y {code} | Only include ‘matched’ rows; Line 4 | | {code:SQL} INSERT INTO target(a,b) SELECT z, 7 {code} | This represents the ‘insert’ part of merge; Line 9 | | {code:SQL} WHERE z IS NULL {code} | Line 8 | | {code:SQL} AND a = null AND b = null AND c = null; {code} | Only include ‘not matched’ rows; Line 7 | h4. Some caveats # Current multi-insert doesn’t support writing to the same table more than once. Can we fix this? # This requires the same change as for multi-statement txn, that is to support multiple delta files per transaction. (HIVE-11030) # Requires annotating each insert (of multi-insert) with whether it’s doing update/delete or insert Since Hive can already compile an operator pipeline for such a _multi-insert statement_ (almost) support for MERGE doesn't require additional operators. Also, Update/Delete are actually compiled int Insert statements. was (Author: ekoifman): h3. Feature design notes Hive supports [multi-insert statement|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingdataintoHiveTablesfromqueries]. The idea is that you can execute a select statement and split the result stream into several to write to multiple targets. This matches very closely to what MERGE statement needs to do. When modeling MERGE as multi-insert, we'd split the stream into 2 stream, 1 for the insert part, 1 for update part but write both results to the same table. Section 14.12 of ISO/IEC 9075-2:2011(E) (SQL 2011) defines MERGE statement. Suppose we have tables {code:SQL} CREATE TABLE target(a int, b int, c int); CREATE TABLE source(x int, y int, z int); {code} Then an example that covers most possibilities might look like this {code:SQL} MERGE INTO target USING source ON b = y WHEN MATCHED AND c + 1 + z > 0 THEN THEN UPDATE SET a = 1, c = z WHEN NOT MATCHED AND z IS NULL THEN INSERT(a,b) VALUES(z, 7) {code} \\ \\ And is interpreted as follows \\ \\ || Line || Statement Part || Notes || | 1 | {code:SQL} MERGE INTO target {code} | Specifies the table being modified | | 2 | {code:SQL} USING source {code} | specifies the source of the data which may be a table or expression such as SELECT … FROM … | | 3 | {code:SQL} ON b = y {code} | is interpreted like exactly like an ON clause of a JOIN between source and target. | | 4 | {code:SQL} WHEN MATCHED {code} | Applies if expr in ON is true | | 5 | {code:SQL} AND c + 1 + z > 0 {code} | Additional predicate to test before performing the action. | | 6 | {code:SQL} THEN UPDATE SET a = 1, c = z {code} | May be UPDATE or DELETE. The later deletes the row from target. SET clause is exactly like in regular UPDATE stmt. | | 7 | {code:SQL} WHEN NOT MATCHED {code} | Applies if expr in ON is false | | 8 | {code:SQL} AND z IS NULL {code} | Additional predicate to test before performing the action. | | 9 | {code:SQL} THEN INSERT(a,b) VALUES(z, 7){code} | Insert to perform on target. | \\ \\ Then the "equivalent" _multi-insert statement_ looks like this: \\ \\ || Statement Part || Refernce to previous table || | {code:SQL} FROM (SELECT * FROM target RIGHT OUTER JOIN SOURCE ON b = y) {code} | Lines 1 - 3 | | {code:SQL} INSERT INTO target(a,c) SELECT 1, z {code} | This represents the update part of merge; Line 6 | | {code:SQL} WHERE c + 1 + z > 0 {code} | Line 5 | | {code:SQL} AND b = y {code} | Only include ‘matched’ rows; Line 4 | | {code:SQL} INSERT INTO target(a,b) SELECT z, 7 {code} | This represents the ‘insert’ part of merge; Line 9 | | {code:SQL} WHERE z IS NULL {code} | Line 8 | | {code:SQL} AND a = null AND b = null AND c = null; {code} | Only include ‘not matched’ rows; Line 7 | h4. Some caveats # Current multi-insert doesn’t support writing to the same table more than once. Can we fix this? # This requires the same change as for multi-statement txn, that is to support multiple delta files per transaction. (HIVE-11030) # Requires annotating each insert (of multi-insert) with whether it’s doing update/delete or insert Since Hive can already compile an operator pipeline for such a _multi-insert statement_ (almost) support for MERGE doesn't require additional operators. Also, Update/Delete are actually compiled int Insert statements. > add support for MERGE statement > ------------------------------- > > Key: HIVE-10924 > URL: https://issues.apache.org/jira/browse/HIVE-10924 > Project: Hive > Issue Type: New Feature > Components: Query Planning, Query Processor, Transactions > Affects Versions: 1.2.0 > Reporter: Eugene Koifman > Assignee: Eugene Koifman > > add support for > MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)