On Mon, Oct 25, 2010 at 4:10 PM, Greg Stark <gsst...@mit.edu> wrote:
> On Mon, Oct 25, 2010 at 12:40 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> Now, as Greg says, that might be what some people want, but it's
>> certainly monumentally unserializable.
>
> To be clear when I said it's what people want what I meant was that in
> the common cases it's doing exactly what people want. As opposed to
> getting closer to what people want in general but not quite hitting
> the mark in the common cases.
>
> Just as an example I think it's important that in the simplest case,
> upsert of a single record, it be 100% guaranteed to do the naive
> upsert. If two users are doing the merge of a single key at the same
> time one of them had better insert and one of them had better update
> or else users are going to be monumentally surprised.
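For concreteness, the single-record upsert Greg describes would look something
like this in standard-SQL MERGE syntax (the table, columns, and values here are
made up purely for illustration):

    -- Hypothetical table: counters(key int primary key, val int)
    MERGE INTO counters c
    USING (VALUES (42, 1)) AS v(key, val)
        ON c.key = v.key
    WHEN MATCHED THEN
        UPDATE SET val = c.val + v.val
    WHEN NOT MATCHED THEN
        INSERT (key, val) VALUES (v.key, v.val);

The expectation being discussed is that if two sessions run this for the same
key at the same time, exactly one of them takes the INSERT arm and the other
takes the UPDATE arm.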
Hmm, so let's think about that case. The first merge comes along and finds no
match, so it fires the NOT MATCHED rule, which inserts a tuple. The second
merge comes along and finds no match, so it also fires the NOT MATCHED rule and
tries to insert a tuple. But upon consulting the PRIMARY KEY index, it finds
that an in-doubt tuple exists, so it goes to sleep waiting for the first
transaction to commit or abort. If the first transaction commits, it then
decides that the jig is up and fails.

We could (maybe) fix this by doing something similar to what EPQ does for
updates: when the first transaction commits, instead of redoing the insert, we
back up and recheck whether the new tuple would have matched the join clause.
If so, we instead fire the MATCHED action on the updated tuple; if not, we fire
NOT MATCHED anyway. I'm not sure how hard that would be, or whether it would
introduce other nasty anomalies in more complex cases.

Alternatively, we could introduce an UPSERT or REPLACE statement intended to
handle exactly this case and leave MERGE for more complex situations. It's
pretty easy to imagine what the coding of that should look like: if we
encounter an in-doubt tuple, we wait on its xmin. If that transaction aborts,
we insert. If it commits and we're in READ COMMITTED mode, we update the tuple;
but if we're in REPEATABLE READ or SERIALIZABLE mode, we abort with a
serialization error. That's a lot simpler to understand and reason about than
MERGE in its full generality.

I think it's pretty much hopeless to expect MERGE to work in complex concurrent
scenarios without creating serialization anomalies, or at least rollbacks. I
think that's baked into the nature of what the statement does. To simulate
MERGE, you need to read from the target table and then do writes that depend on
what you read. If you do that with the commands that are available today,
you're going to get serialization anomalies and/or rollbacks under concurrency
(the usual retry-loop workaround is sketched below, after the signature). The
mere fact of that logic being inside the database rather than outside isn't
going to make that go away.

Now sometimes, as with exclusion constraints, you can play games with dirty
snapshots to get the semantics you want, but whether that's possible in a
particular case depends on the details of the operation being performed, and
here I think it can't be done. Some operations are *fundamentally*
unserializable. A very simple example is a sequence that is guaranteed not to
have gaps (a feature we've occasionally been asked to provide). If N processes
request a sequence number simultaneously, you have to hand out a value to the
first guy and wait and see whether he commits or aborts before deciding what
number to give the second guy. That sucks, so usually we just design our
applications not to require that sequences be gap-free. Similarly here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
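For reference, here is a sketch of simulating the single-key merge with the
commands available today, modeled on the familiar PL/pgSQL retry-loop pattern
(function, table, and column names are hypothetical). Note that even this loop
only behaves nicely in READ COMMITTED; in REPEATABLE READ or SERIALIZABLE a
concurrent committer can still force one caller to fail with a serialization
error rather than quietly turning its insert into an update:

    -- Hypothetical table: counters(key int primary key, val int)
    CREATE OR REPLACE FUNCTION upsert_counter(k int, delta int) RETURNS void AS $$
    BEGIN
        LOOP
            -- Try the update first; FOUND tells us whether a row was changed.
            UPDATE counters SET val = val + delta WHERE key = k;
            IF FOUND THEN
                RETURN;
            END IF;
            -- No row yet, so try to insert; a concurrent insert of the same
            -- key will make this fail with a unique-key violation.
            BEGIN
                INSERT INTO counters(key, val) VALUES (k, delta);
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                NULL;  -- someone else inserted first; loop and retry the UPDATE
            END;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;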