Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Oh, good point.  I was thinking just about concurrent MERGEs.
> > However, it is more complicated than that.  By definition you cannot
> > see changes from other transactions while your statement is being
> > run (even if you increment CommandCounter), so to be atomic, you
> > would still see the row even though some other transaction had
> > deleted it.
>
> We would have to use the same semantics we use now for read-committed
> UPDATE, that is, look at the latest version of the row even though
> this would not normally be visible to the transaction's snapshot.
>
> In the case of a serializable transaction, no doubt we should fail if
> any concurrent change actually happens.
I have some pseudocode to explain what we want for this feature,
whatever syntax we choose:

	Start
		Check unique index
	Found
		lock row for update
		if zero rows, return to start
		if more than one row, fail
		update row
	Notfound
		create savepoint
		insert row into heap
		lock index page
		if conflicting index entry, abort savepoint, return to start
		add index entry
		unlock index page

(A C-level sketch of this loop appears at the end of this message.)

While the "notfound" case might look strange, we actually use this
exact method for inserts now; see ExecInsert() and _bt_doinsert().  In
particular, see this comment in the second function:

	/*
	 * If we're not allowing duplicates, make sure the key isn't already
	 * in the index.
	 *
	 * NOTE: obviously, _bt_check_unique can only detect keys that are
	 * already in the index; so it cannot defend against concurrent
	 * insertions of the same key.  We protect against that by means of
	 * holding a write lock on the target page.  Any other would-be
	 * inserter of the same key must acquire a write lock on the same
	 * target page, so only one would-be inserter can be making the
	 * check at one time.  Furthermore, once we are past the check we
	 * hold write locks continuously until we have performed our
	 * insertion, so no later inserter can fail to see our insertion.
	 * (This requires some care in _bt_insertonpg.)
	 *
	 * If we must wait for another xact, we release the lock while
	 * waiting, and then must start over completely.
	 */

Here is the unique-check error from _bt_check_unique():

	ereport(ERROR,
			(errcode(ERRCODE_UNIQUE_VIOLATION),
			 errmsg("duplicate key violates unique constraint \"%s\"",
					RelationGetRelationName(rel))));

I think the problem here is that it is going to longjmp() back to
postgres.c (and out of your code loop).  While we have savepoints, I
think they only work when issued by client applications, not from
inside our own code.  Ideally you would like to be able to say:

	savepoint();
	func();
	rollback_to_savepoint();

but you can't, so I think you are going to have to factor out that
unique-violation ereport() and return a failure code to the caller
(sketched below).  I suppose some boolean flag needs to be added to
_bt_doinsert(), but that is called via a function pointer for the index
type, so you are going to have to update the insert function signatures
for all access methods.  The good news is that only btree supports
unique indexes, according to the documentation ("Only B-tree currently
supports unique indexes"), so for the other access methods the extra
parameter is simply ignored.

Another issue is multiple unique indexes.  What if the first unique
index matches one row, but a different row matches the second unique
indexed column?  Should we fail because the unique checks do not
identify exactly one row?  Or suppose the _new_ value for the second
indexed column conflicts with the second unique index: the
MERGE/REPLACE should fail.  The UPDATE block will handle this on its
own, but the INSERT block will need to check for that and really error
out, rather than return to the caller, so the loop in
ExecInsertIndexTuples() has to restart on unique failure _only_ on the
first index check, not the subsequent ones (also sketched below).  One
simplification would be to allow MERGE/REPLACE only on a table that has
a single unique index.
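To make the control flow concrete, here is a minimal C-style sketch of
the pseudocode loop above.  This is only an illustration:
ExecMergeLookup(), ExecMergeUpdate(), and ExecMergeInsert() are
invented names standing in for whatever the real patch provides, not
functions that exist in the tree.

	/*
	 * Hypothetical driver for the pseudocode above; each helper maps
	 * to one branch of the pseudocode.
	 */
	static void
	ExecMergeOne(TupleTableSlot *slot, EState *estate)
	{
		for (;;)					/* "return to start" restarts here */
		{
			int		nmatches = ExecMergeLookup(slot, estate);

			if (nmatches > 1)		/* more than one row: fail */
				ereport(ERROR,
						(errcode(ERRCODE_CARDINALITY_VIOLATION),
						 errmsg("MERGE key matches more than one row")));

			if (nmatches == 1)
			{
				/* Found: lock row for update; it may vanish under us */
				if (ExecMergeUpdate(slot, estate))
					return;
				continue;			/* zero rows after locking: restart */
			}

			/* Notfound: insert; a concurrent duplicate forces restart */
			if (ExecMergeInsert(slot, estate))
				return;
		}
	}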
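For the factoring itself, something like this might work.  Take the
signatures with a grain of salt (they are from memory and only
approximate), and both the errorOK parameter and the _bt_key_exists()
helper are inventions standing in for the real _bt_check_unique()
plumbing:

	/*
	 * Sketch only: with errorOK set, a duplicate key makes the function
	 * clean up and return false instead of calling ereport(ERROR).
	 */
	bool
	_bt_doinsert(Relation rel, BTItem btitem,
				 bool index_is_unique, Relation heapRel,
				 bool errorOK)		/* NEW: soft-fail on duplicate key */
	{
		/* ... existing scankey construction and page descent ... */

		if (index_is_unique &&
			_bt_key_exists(rel, btitem, heapRel))	/* stand-in for the
													 * _bt_check_unique()
													 * logic */
		{
			if (errorOK)
			{
				_bt_relbuf(rel, buf);	/* release the locked page */
				return false;			/* caller restarts from "Start" */
			}
			ereport(ERROR,
					(errcode(ERRCODE_UNIQUE_VIOLATION),
					 errmsg("duplicate key violates unique constraint \"%s\"",
							RelationGetRelationName(rel))));
		}

		/* ... existing insertion into the write-locked page ... */
		return true;
	}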
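At the executor level, the ExecInsertIndexTuples() loop would then
treat only the first index (the one the MERGE searched on) as
retryable.  Again just a sketch, with the call arguments approximate
and the errorOK plumbing assumed from the previous sketch:

	/*
	 * Only a unique failure on the first index sends us back to
	 * "Start"; a conflict on any later unique index is a genuine error
	 * and must still ereport() as it does today.
	 */
	for (i = 0; i < numIndices; i++)
	{
		bool	errorOK = (i == 0);	/* retry only on the lookup index */

		if (!index_insert(relationDescs[i], datum, nulls,
						  &(heapTuple->t_self), heapRelation,
						  errorOK))	/* NEW flag, threaded through the
									 * access method's insert routine */
			return false;			/* only possible when i == 0:
									 * restart the MERGE from "Start" */
	}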
--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073