On Thu, Jan 16, 2014 at 6:31 PM, Peter Geoghegan <p...@heroku.com> wrote:
> I think we need to give this some more thought. I have not addressed
> the implications for MVCC snapshots here.
So I gave this some more thought, and this is what I came up with:

+ static bool
+ ExecLockHeapTupleForUpdateSpec(EState *estate,
+                                ResultRelInfo *relinfo,
+                                ItemPointer tid)
+ {
+     Relation    relation = relinfo->ri_RelationDesc;
+     HeapTupleData tuple;
+     HeapUpdateFailureData hufd;
+     HTSU_Result test;
+     Buffer      buffer;
+
+     Assert(ItemPointerIsValid(tid));
+
+     /* Lock tuple for update */
+     tuple.t_self = *tid;
+     test = heap_lock_tuple(relation, &tuple,
+                            estate->es_output_cid,
+                            LockTupleExclusive, false, /* wait */
+                            true, &buffer, &hufd);
+     ReleaseBuffer(buffer);
+
+     switch (test)
+     {
+         case HeapTupleInvisible:
+             /*
+              * Tuple may have originated from this transaction, in which
+              * case it's already locked.  However, to avoid having to
+              * consider the case where the user locked an instantaneously
+              * invisible row inserted in the same command, throw an error.
+              */
+             if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple.t_data)))
+                 ereport(ERROR,
+                         (errcode(ERRCODE_UNIQUE_VIOLATION),
+                          errmsg("could not lock instantaneously invisible tuple inserted in same transaction"),
+                          errhint("Ensure that no rows proposed for insertion in the same command have constrained values that duplicate each other.")));
+             if (IsolationUsesXactSnapshot())
+                 ereport(ERROR,
+                         (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+                          errmsg("could not serialize access due to concurrent update")));
+             /* Tuple became invisible due to concurrent update; try again */
+             return false;
+         case HeapTupleSelfUpdated:
+             /*

I'm just throwing an error when locking the tuple returns
HeapTupleInvisible and the xmin of the tuple is our xid. It's sufficient
to just check TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple.t_data)),
because there is no way that _bt_check_unique() could consider the tuple
dirty-visible and conclusively fit for a lock attempt if it came from our
xact while, at the same time, HeapTupleSatisfiesUpdate() indicated that
the same tuple was invisible, unless the tuple originated in the same
command. Checking against subxacts or ancestor xacts is at worst
redundant.

I am happy with this. ISTM that it'd be hard to argue that any reasonable
and well-informed person would ever thank us for trying harder here,
although it took me a while to reach that position. To understand what I
mean, consider what MySQL does when in a similar position. I didn't
actually check, but given that their docs don't consider this question, I
guess MySQL would go and update the tuple inserted by that same
"INSERT ... ON DUPLICATE KEY UPDATE" command. Most of the time the
conflicting tuples proposed for insertion by the user differ in *some*
way: that is, if the table were initially empty and you did a regular
insert, those same tuples would cause a unique constraint violation all
on their own, even without any of them being fully identical to another.
It seems obvious that the order in which each tuple is evaluated for
insert-or-update in MySQL is more or less undefined. So by allowing
this, they arguably let their users miss something they should not:
nothing useful is done with the datums originally inserted by the
command that are then updated over with something else within that same
command. MySQL users are not notified that this happened, and are
probably blissfully unaware that there has been a limited form of data
loss.
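To make the difference concrete, here is a hypothetical session. The
table and values are invented for illustration, and it assumes the WIP
INSERT ... ON DUPLICATE KEY LOCK FOR UPDATE syntax from this patch
series; the first error and hint are the ones raised by the new
HeapTupleInvisible check above:

    CREATE TABLE tab (id int PRIMARY KEY, val text);

    -- Two rows proposed for insertion in the same command share a
    -- constrained value.  The second conflicts with the instantaneously
    -- invisible row that the command itself just inserted, so the new
    -- check complains rather than silently picking a winner:
    INSERT INTO tab VALUES (1, 'foo'), (1, 'bar')
        ON DUPLICATE KEY LOCK FOR UPDATE;
    ERROR:  could not lock instantaneously invisible tuple inserted in same transaction
    HINT:  Ensure that no rows proposed for insertion in the same command have constrained values that duplicate each other.

    -- For comparison, a plain INSERT of the same rows into the empty
    -- table fails all on its own:
    INSERT INTO tab VALUES (1, 'foo'), (1, 'bar');
    ERROR:  duplicate key value violates unique constraint "tab_pkey"

Either way, the user hears about the problem up front, rather than
silently losing one of the proposed rows.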
So it's The Right Thing to say to Postgres users: "if you inserted these
rows into the table when it was empty, there'd *still* definitely be a
unique constraint violation, and you need to sort that out before asking
Postgres to handle conflicts with concurrent sessions and existing data,
where rows that come from earlier commands in your xact count as
existing data". The only problem I can see with that is that, for
practical reasons, we cannot complain consistently: we say nothing when
the same command locks *some other* xact's tuple two or more times,
rather than inserting the same constrained values two or more times. But
at least when that happens the user can definitely update two or more
times (i.e. the row that we "locked twice" is visible). Naturally, we
can't catch every error a DML author may make.

--
Peter Geoghegan