Re: [HACKERS] MERGE SQL Statement

Simon Riggs Thu, 17 Apr 2008 07:14:18 -0700

On Thu, 2008-04-17 at 07:31 +0100, Simon Riggs wrote:

> > > * MERGE will perform a left outer join between source on left and target
> > > on right. There must be no more than 1 row from table-ref for each row
> > > in the table. Each row in the table can only be updated once during each
> > > MERGE statement. Each non-matching row in the table-ref will result in
> > > one INSERT into table.
> > >
> > > * WHEN clauses are tested in the order specified. If the AND condition
> > > returns false then we skip onto the next WHEN clause. We stop once a
> > > WHEN clause activates, so only *one* action is ever activated for each
> > > row.
> > >
> > > * AND clauses need not form a complete set, i.e. it is possible that no
> > > action will result. It is also possible that some WHEN clauses will
> > > never activate because of the execution order; we would not try to
> > > prevent this, just document it as a possible user error.
> > >
> > 
> > Just curious if any of these behaviors come from the spec?  or maybe from 
> > other databases?  they don't seem unreasonable in general though.  
> 
> First two come from spec following clarifications from other
> implementations. The last point about the AND clauses not necessarily
> covering 100% of cases follows from the other points.


No, it looks like I missed one line in the standard. 

I've spent all day analysing this and writing up test cases, so this
next bit isn't really a reply to you Robert, just lots of additional
detail on this particular area.


In summary, the standard requires us to

1. If MATCHED to more than one source row we are supposed to throw an
error "cardinality violation". However, if there is no target row, yet
would have generated multiple rows had there been one, we do *not* throw
an error.

2. If the source row doesn't match any target row AND there is no WHEN
NOT MATCHING clause that applies, we are then supposed to implicitly
perform an INSERT with DEFAULT VALUES.

Neither of these seem particularly useful behaviours in *all* cases.

For 1, it implies we would need to sort and de-duplicate the output so
that we can throw an error in *all* cases. We would need to do this
because its fairly hard to determine that the set formed from the union
of all WHEN clauses does not cover the universal set. (i.e. its possible
to write a statement that doesn't have a WHEN clause that applies). 

If we try to update a row that just got updated we need to throw an
error, otherwise we may allow the Update Halloween problem. So we will
effectively detect this error in most cases anyway.

So for 1, I suggest we don't throw the explicit error as mentioned in
the standard and allow the secondary update error to provide 99% of
required errors. In the case where there was a matching row but no WHEN
clause tat applies, we simply ignore that matching row.

For 2, DB2 allows an additional statement ELSE IGNORE. I suggest we
support the standard for (2), yet also provide an additional option
        WHEN [NOT] MATCHING THEN 
                DO NOTHING

which would simply drop the matching or non-matching row and continue
with the next tuple.

My first attempt at a MERGE test case, with simulated output (no, I
haven't written it yet) is attached.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

--
-- MERGE
--
CREATE TABLE target (id integer, balance integer);
CREATE TABLE source (id integer, balance integer);
INSERT INTO target VALUES (1, 10);
INSERT INTO target VALUES (2, 20);
INSERT INTO target VALUES (3, 30);
SELECT * FROM target;
 id | balance 
----+---------
  1 |      10
  2 |      20
  3 |      30
(3 rows)

--
-- initial tests
--
-- empty source means 0 rows touched
MERGE into target t
USING (select * from source) AS s
ON t.id = s.id
WHEN MATCHED THEN
	UPDATE SET balance = t.balance + s.balance
;
-- insert some source rows to work from
INSERT INTO source VALUES (2, 5);
INSERT INTO source VALUES (3, 20);
INSERT INTO source VALUES (4, 40);
SELECT * FROM source;
 id | balance 
----+---------
  2 |       5
  3 |      20
  4 |      40
(3 rows)

-- do a simple equivalent of an UPDATE join
BEGIN;
MERGE into target t
USING (select * from source) AS s
ON t.id = s.id
WHEN MATCHED THEN
	UPDATE SET balance = t.balance + s.balance
;
SELECT * FROM target;
 id | balance 
----+---------
  1 |      10
  2 |      25
  3 |      50
(3 rows)

ROLLBACK;
-- do a simple equivalent of an INSERT SELECT
BEGIN;
MERGE into target t
USING (select * from source) AS s
ON t.id = s.id
WHEN NOT MATCHED THEN
	INSERT VALUES (s.id, s.balance)
;
SELECT * FROM target;
 id | balance 
----+---------
  1 |      10
  2 |      20
  3 |      30
  4 |      40
(4 rows)

ROLLBACK;
-- now the classic UPSERT
BEGIN;
MERGE into target t
USING (select * from source) AS s
ON t.id = s.id
WHEN MATCHED THEN
	UPDATE SET balance = t.balance + s.balance
WHEN NOT MATCHED THEN
	INSERT VALUES (s.id, s.balance)
;
SELECT * FROM target;
 id | balance 
----+---------
  1 |      10
  2 |      25
  3 |      50
  4 |      40
(4 rows)

ROLLBACK;
--
-- Non-standard functionality
-- 
-- do a simple equivalent of a DELETE join
BEGIN;
MERGE into target t
USING (select * from source) AS s
ON t.id = s.id
WHEN MATCHED THEN
	DELETE
;
SELECT * FROM target;
 id | balance 
----+---------
  1 |      10
(1 row)

ROLLBACK;
-- now the classic UPSERT, with a DELETE
-- the Standard doesn't allow the DELETE clause for some reason,
-- though other implementations do
BEGIN;
MERGE into target t
USING (select * from source) AS s
ON t.id = s.id
WHEN MATCHED AND s.balance > 10 THEN
	UPDATE SET balance = t.balance + s.balance
WHEN MATCHED THEN
	DELETE
WHEN NOT MATCHED THEN
	INSERT VALUES (s.id, s.balance)
;
SELECT * FROM target;
 id | balance 
----+---------
  1 |      10
  3 |      50
  4 |      40
(3 rows)

ROLLBACK;
-- Prepare the test data to generate multiple matching rows for a single target
INSERT INTO source VALUES (3, 5);
SELECT * FROM source ORDER BY id, balance;
 id | balance 
----+---------
  2 |       5
  3 |       5
  3 |      20
  4 |      40
(4 rows)

-- we now have a duplicate key in source, so when we join to
-- target we will generate 2 matching rows, not one
-- In the following statement row id=3 will be both updated
-- and deleted by this statement and so will cause a run-time error
-- when the second change to that row is detected
-- This next SQL statement
--  fails according to standard
--  fails in PostgreSQL implementation
BEGIN;
MERGE into target t
USING (select * from source) AS s
ON t.id = s.id
WHEN MATCHED AND s.balance > 10 THEN
	UPDATE SET balance = t.balance + s.balance
WHEN MATCHED THEN
	DELETE
WHEN NOT MATCHED THEN
	INSERT VALUES (s.id, s.balance)
;
ERROR:  multiple actions on single target row
 
ROLLBACK;

-- This next SQL statement
--  fails according to standard
--  suceeds in PostgreSQL implementation by simply ignoring the second
--  matching row since it activates no WHEN clause
BEGIN;
MERGE into target t
USING (select * from source) AS s
ON t.id = s.id
WHEN MATCHED AND s.balance > 10 THEN
	UPDATE SET balance = t.balance + s.balance
WHEN NOT MATCHED THEN
	INSERT VALUES (s.id, s.balance)
;
ROLLBACK;
-- Now lets prepare the test data to generate 2 non-matching rows
DELETE FROM source WHERE id = 3 AND balance = 5;
INSERT INTO source VALUES (4, 5);
SELECT * FROM source;
 id | balance 
----+---------
  2 |       5
  3 |      20
  4 |       5
  4 |      40
(4 rows)

-- This next SQL statement
--  suceeds according to standard (yes, it is inconsistent)
--  suceeds in PostgreSQL implementation, though could easily fail if
--  there was an appropriate unique constraint
BEGIN;
MERGE into target t
USING (select * from source) AS s
ON t.id = s.id
WHEN NOT MATCHED THEN
	INSERT VALUES (s.id, s.balance)
;
SELECT * FROM target;
 id | balance 
----+---------
  1 |      10
  2 |      20
  3 |      30
  4 |       5
  4 |      40
(5 rows)

ROLLBACK;
-- This next SQL statement works, but since there is no WHEN clause that
-- applies to non-matching rows, SQL standard requires us to generate
-- rows with DEFAULT VALUES for all columns, which is why we support the
-- syntax DO NOTHING (similar to the way Rules work) in addition
BEGIN;
MERGE into target t
USING (select * from source) AS s
ON t.id = s.id
WHEN NOT MATCHED AND s.balance > 100 THEN
	INSERT VALUES (s.id, s.balance)
;
SELECT * FROM target;
 id | balance 
----+---------
  1 |      10
  2 |      20
  3 |      30
    |
    |
(5 rows)

ROLLBACK;
-- This next SQL statement suceeds, but does nothing since there are
-- only non-matching rows that do not activate a WHEN clause, so we
-- provide syntax to just ignore them, rather than allowing data quality
-- problems
BEGIN;
MERGE into target t
USING (select * from source) AS s
ON t.id = s.id
WHEN NOT MATCHED AND s.balance > 100 THEN
	INSERT VALUES (s.id, s.balance)
WHEN NOT MATCHED
	DO NOTHING
;
SELECT * FROM target;
 id | balance 
----+---------
  1 |      10
  2 |      20
  3 |      30
(3 rows)

ROLLBACK;
--
-- Weirdness
--
-- MERGE statement containing WHEN clauses that are never executable
-- NOT an error under the standard
MERGE into target t
USING (select * from source) AS s
ON t.id = s.id
WHEN MATCHED AND s.balance > 0 THEN
	UPDATE SET balance = t.balance + s.balance
WHEN MATCHED THEN
	DELETE
WHEN NOT MATCHED THEN
	INSERT VALUES (s.id, s.balance)
WHEN NOT MATCHED THEN /* never executed because of order of WHEN clauses */
	INSERT VALUES (s.id, s.balance + 10)
WHEN MATCHED THEN /* never executed because of order of WHEN clauses */
	UPDATE SET balance = t.balance + s.balance
;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] MERGE SQL Statement

Reply via email to