On Sat, Jan 18, 2014 at 6:17 PM, Peter Geoghegan <p...@heroku.com> wrote:
> MySQL users are not notified that this happened, and are probably
> blissfully unaware that there has been a limited form of data loss. So
> it's The Right Thing to say to Postgres users: "if you inserted these
> rows into the table when it was empty, there'd *still* definitely be a
> unique constraint violation, and you need to sort that out before
> asking Postgres to handle conflicts with concurrent sessions and
> existing data, where rows that come from earlier commands in your xact
> count as existing data".
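To make that concrete, consider something like the following (the table
and values are made up for illustration):

  CREATE TABLE tab (key int PRIMARY KEY, val text);

  -- Both rows propose key = 1.  Against an empty table, a plain INSERT
  -- raises a unique constraint violation:
  INSERT INTO tab (key, val) VALUES (1, 'foo'), (1, 'bar');

  -- MySQL's INSERT ... ON DUPLICATE KEY UPDATE instead quietly turns
  -- the second row into an update of the first, so 'foo' is silently
  -- overwritten by 'bar':
  INSERT INTO tab (key, val) VALUES (1, 'foo'), (1, 'bar')
    ON DUPLICATE KEY UPDATE val = VALUES(val);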
I Googled and found evidence indicating that a number of popular
proprietary systems' SQL MERGE implementations do much the same thing.
You may get an "attempt to UPDATE the same row twice" error on both SQL
Server and Oracle. I wouldn't like to speculate on whether the standard
requires this of MERGE, but requiring it seems very sensible.

> The only problem I can see with that is that
> we cannot complain consistently for practical reasons, as when we lock
> *some other* xact's tuple rather than inserting in the same command
> two or more times.

Actually, maybe it would be practical to complain, with at least
*almost* total accuracy, that the same UPSERT command attempted to lock
a row twice, and not just for the particularly problematic case where
tuple visibility is not assured.

Personally, I favor just making "case HeapTupleSelfUpdated:" within the
patch's ExecLockHeapTupleForUpdateSpec() function complain when
"hufd.cmax == estate->es_output_cid" (currently there is a separate
complaint, but only when those two variables are unequal). That's
probably almost perfect in practice. If we wanted perfection, which
would be to always complain when the same row was locked twice by the
same UPSERT command, it would be a matter of having heap_lock_tuple
indicate to the patch's ExecLockHeapTupleForUpdateSpec() caller that
the row was already locked, so that it could complain in a special way
for the locked-not-updated case. But that is hard, because there is no
way for it to know whether the current *command* locked the tuple, and
that's the only case that we are justified in raising an error for.

But now that I think about it some more, maybe always complaining when
we lock but have not yet updated is not just not worth the trouble, but
is in fact bogus. It's not obvious what precise behavior is correct
here. I was worried about someone updating something twice, but maybe
it's fully sufficient to do what I've already proposed, while in
addition documenting that you cannot on-duplicate-key-lock a tuple that
has already been inserted or updated within the same command. It will
be very rare for anyone to trip up over that in practice (e.g. by
locking twice and then spuriously updating the same row twice or more
in a later command). Users learn not to try this kind of thing by
having it break immediately; the fact that it doesn't break with 100%
reliability is good enough (plus, given how things would be documented,
it doesn't *really* fail to break when it should).

--
Peter Geoghegan