On Thu, 2008-04-17 at 07:31 +0100, Simon Riggs wrote: > > > * MERGE will perform a left outer join between source on left and target > > > on right. There must be no more than 1 row from table-ref for each row > > > in the table. Each row in the table can only be updated once during each > > > MERGE statement. Each non-matching row in the table-ref will result in > > > one INSERT into table. > > > > > > * WHEN clauses are tested in the order specified. If the AND condition > > > returns false then we skip onto the next WHEN clause. We stop once a > > > WHEN clause activates, so only *one* action is ever activated for each > > > row. > > > > > > * AND clauses need not form a complete set, i.e. it is possible that no > > > action will result. It is also possible that some WHEN clauses will > > > never activate because of the execution order; we would not try to > > > prevent this, just document it as a possible user error. > > > > > > > Just curious if any of these behaviors come from the spec? or maybe from > > other databases? they don't seem unreasonable in general though. > > First two come from spec following clarifications from other > implementations. The last point about the AND clauses not necessarily > covering 100% of cases follows from the other points.
No, it looks like I missed one line in the standard. I've spent all day analysing this and writing up test cases, so this next bit isn't really a reply to you Robert, just lots of additional detail on this particular area. In summary, the standard requires us to 1. If MATCHED to more than one source row we are supposed to throw an error "cardinality violation". However, if there is no target row, yet would have generated multiple rows had there been one, we do *not* throw an error. 2. If the source row doesn't match any target row AND there is no WHEN NOT MATCHING clause that applies, we are then supposed to implicitly perform an INSERT with DEFAULT VALUES. Neither of these seem particularly useful behaviours in *all* cases. For 1, it implies we would need to sort and de-duplicate the output so that we can throw an error in *all* cases. We would need to do this because its fairly hard to determine that the set formed from the union of all WHEN clauses does not cover the universal set. (i.e. its possible to write a statement that doesn't have a WHEN clause that applies). If we try to update a row that just got updated we need to throw an error, otherwise we may allow the Update Halloween problem. So we will effectively detect this error in most cases anyway. So for 1, I suggest we don't throw the explicit error as mentioned in the standard and allow the secondary update error to provide 99% of required errors. In the case where there was a matching row but no WHEN clause tat applies, we simply ignore that matching row. For 2, DB2 allows an additional statement ELSE IGNORE. I suggest we support the standard for (2), yet also provide an additional option WHEN [NOT] MATCHING THEN DO NOTHING which would simply drop the matching or non-matching row and continue with the next tuple. My first attempt at a MERGE test case, with simulated output (no, I haven't written it yet) is attached. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
-- -- MERGE -- CREATE TABLE target (id integer, balance integer); CREATE TABLE source (id integer, balance integer); INSERT INTO target VALUES (1, 10); INSERT INTO target VALUES (2, 20); INSERT INTO target VALUES (3, 30); SELECT * FROM target; id | balance ----+--------- 1 | 10 2 | 20 3 | 30 (3 rows) -- -- initial tests -- -- empty source means 0 rows touched MERGE into target t USING (select * from source) AS s ON t.id = s.id WHEN MATCHED THEN UPDATE SET balance = t.balance + s.balance ; -- insert some source rows to work from INSERT INTO source VALUES (2, 5); INSERT INTO source VALUES (3, 20); INSERT INTO source VALUES (4, 40); SELECT * FROM source; id | balance ----+--------- 2 | 5 3 | 20 4 | 40 (3 rows) -- do a simple equivalent of an UPDATE join BEGIN; MERGE into target t USING (select * from source) AS s ON t.id = s.id WHEN MATCHED THEN UPDATE SET balance = t.balance + s.balance ; SELECT * FROM target; id | balance ----+--------- 1 | 10 2 | 25 3 | 50 (3 rows) ROLLBACK; -- do a simple equivalent of an INSERT SELECT BEGIN; MERGE into target t USING (select * from source) AS s ON t.id = s.id WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.balance) ; SELECT * FROM target; id | balance ----+--------- 1 | 10 2 | 20 3 | 30 4 | 40 (4 rows) ROLLBACK; -- now the classic UPSERT BEGIN; MERGE into target t USING (select * from source) AS s ON t.id = s.id WHEN MATCHED THEN UPDATE SET balance = t.balance + s.balance WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.balance) ; SELECT * FROM target; id | balance ----+--------- 1 | 10 2 | 25 3 | 50 4 | 40 (4 rows) ROLLBACK; -- -- Non-standard functionality -- -- do a simple equivalent of a DELETE join BEGIN; MERGE into target t USING (select * from source) AS s ON t.id = s.id WHEN MATCHED THEN DELETE ; SELECT * FROM target; id | balance ----+--------- 1 | 10 (1 row) ROLLBACK; -- now the classic UPSERT, with a DELETE -- the Standard doesn't allow the DELETE clause for some reason, -- though other implementations do BEGIN; MERGE into target t USING (select * from source) AS s ON t.id = s.id WHEN MATCHED AND s.balance > 10 THEN UPDATE SET balance = t.balance + s.balance WHEN MATCHED THEN DELETE WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.balance) ; SELECT * FROM target; id | balance ----+--------- 1 | 10 3 | 50 4 | 40 (3 rows) ROLLBACK; -- Prepare the test data to generate multiple matching rows for a single target INSERT INTO source VALUES (3, 5); SELECT * FROM source ORDER BY id, balance; id | balance ----+--------- 2 | 5 3 | 5 3 | 20 4 | 40 (4 rows) -- we now have a duplicate key in source, so when we join to -- target we will generate 2 matching rows, not one -- In the following statement row id=3 will be both updated -- and deleted by this statement and so will cause a run-time error -- when the second change to that row is detected -- This next SQL statement -- fails according to standard -- fails in PostgreSQL implementation BEGIN; MERGE into target t USING (select * from source) AS s ON t.id = s.id WHEN MATCHED AND s.balance > 10 THEN UPDATE SET balance = t.balance + s.balance WHEN MATCHED THEN DELETE WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.balance) ; ERROR: multiple actions on single target row ROLLBACK; -- This next SQL statement -- fails according to standard -- suceeds in PostgreSQL implementation by simply ignoring the second -- matching row since it activates no WHEN clause BEGIN; MERGE into target t USING (select * from source) AS s ON t.id = s.id WHEN MATCHED AND s.balance > 10 THEN UPDATE SET balance = t.balance + s.balance WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.balance) ; ROLLBACK; -- Now lets prepare the test data to generate 2 non-matching rows DELETE FROM source WHERE id = 3 AND balance = 5; INSERT INTO source VALUES (4, 5); SELECT * FROM source; id | balance ----+--------- 2 | 5 3 | 20 4 | 5 4 | 40 (4 rows) -- This next SQL statement -- suceeds according to standard (yes, it is inconsistent) -- suceeds in PostgreSQL implementation, though could easily fail if -- there was an appropriate unique constraint BEGIN; MERGE into target t USING (select * from source) AS s ON t.id = s.id WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.balance) ; SELECT * FROM target; id | balance ----+--------- 1 | 10 2 | 20 3 | 30 4 | 5 4 | 40 (5 rows) ROLLBACK; -- This next SQL statement works, but since there is no WHEN clause that -- applies to non-matching rows, SQL standard requires us to generate -- rows with DEFAULT VALUES for all columns, which is why we support the -- syntax DO NOTHING (similar to the way Rules work) in addition BEGIN; MERGE into target t USING (select * from source) AS s ON t.id = s.id WHEN NOT MATCHED AND s.balance > 100 THEN INSERT VALUES (s.id, s.balance) ; SELECT * FROM target; id | balance ----+--------- 1 | 10 2 | 20 3 | 30 | | (5 rows) ROLLBACK; -- This next SQL statement suceeds, but does nothing since there are -- only non-matching rows that do not activate a WHEN clause, so we -- provide syntax to just ignore them, rather than allowing data quality -- problems BEGIN; MERGE into target t USING (select * from source) AS s ON t.id = s.id WHEN NOT MATCHED AND s.balance > 100 THEN INSERT VALUES (s.id, s.balance) WHEN NOT MATCHED DO NOTHING ; SELECT * FROM target; id | balance ----+--------- 1 | 10 2 | 20 3 | 30 (3 rows) ROLLBACK; -- -- Weirdness -- -- MERGE statement containing WHEN clauses that are never executable -- NOT an error under the standard MERGE into target t USING (select * from source) AS s ON t.id = s.id WHEN MATCHED AND s.balance > 0 THEN UPDATE SET balance = t.balance + s.balance WHEN MATCHED THEN DELETE WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.balance) WHEN NOT MATCHED THEN /* never executed because of order of WHEN clauses */ INSERT VALUES (s.id, s.balance + 10) WHEN MATCHED THEN /* never executed because of order of WHEN clauses */ UPDATE SET balance = t.balance + s.balance ;
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers