I'm working on UPSERT again. I think that in order to make useful progress, I'll have to find a better way of overcoming the visibility issues (i.e. the problem of what to do about a still-in-progress-to-our-snapshot row being locked at READ COMMITTED isolation level [1][2]).
I've made some tentative additions to my earlier patch to change the syntax: postgres=# explain analyze with c as ( insert into ints(key, val) values (1, 'a'), (2, 'b') on conflict select * for update) update ints set val = c.val from c conflicts; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- Update on ints (cost=0.03..62.34 rows=2 width=104) (actual time=0.095..0.095 rows=0 loops=1) CTE c -> Insert on ints ints_1 (cost=0.00..0.03 rows=2 width=36) (actual time=0.039..0.049 rows=2 loops=1) -> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=36) (actual time=0.001..0.002 rows=2 loops=1) -> Nested Loop (cost=0.00..62.31 rows=2 width=104) (actual time=0.063..0.080 rows=2 loops=1) Join Filter: (ints.ctid = c.ctid) Rows Removed by Join Filter: 6 -> CTE Scan on c (cost=0.00..0.04 rows=2 width=100) (actual time=0.044..0.055 rows=2 loops=1) -> Materialize (cost=0.00..28.45 rows=1230 width=10) (actual time=0.005..0.006 rows=4 loops=2) -> Seq Scan on ints (cost=0.00..22.30 rows=1230 width=10) (actual time=0.009..0.009 rows=4 loops=1) Planning time: 0.132 ms Execution time: 0.159 ms (12 rows) This works, as far as it goes. Parse analysis adds the ctid join to the top-level UPDATE based on the fact that a magical UPDATE statement (magical due to having a CONFLICTS clause) references a CTE in its FromList (this is mandated by CONFLICTS during parse analysis), a CTE containing an INSERT ON CONFLICT, and itself projecting a magical set of rejected rows sufficient to get a locked c.ctid implicitly. (FWIW, DELETEs may also accept a CONFLICTS clause and be used like this). The optimizer does not support joins involving ctid, which is why there is a seqscan plan - even setting "enable_seqscan = off" does not alter the plan right now. Apart from the obvious fact that we're doing an unnecessary Seq Scan on the target table, which is unacceptable on performance grounds (since we do in fact already know exactly what list of tids to pass to the top-level ModifyTable node to UPDATE), there is another problem: Since I cannot use a tid scan, I cannot play tricks with visibility within tidscans for the benefit of solving the basic problem of doing the right thing for READ COMMITTED. Clearly I need to add a new "MVCC violation" in the spirit of EvalPlanQual() to avoid this problem (since READ COMMITTED serialization failures are unacceptable to users), but I'd like this new MVCC violation to be as selective as possible. Let me take a step back, though. Presently UPDATE WHERE CURRENT OF does something that isn't a million miles from what I have in mind to address the problem. That feature exists for "sensitive cursors", where we lock rows (since FOR UPDATE was specified with the DECLARE CURSOR statement) as they're fetched from the cursor. The upshot is that in general, rows can change "out from under us" since they're not yet locked, including in ways that might result in a new row version not satisfying the original SELECT FOR UPDATE query predicate - clearly users don't want to have to deal with that when they subsequently go to UPDATE WHERE CURRENT OF. Special handling is required. Purely because UPDATE WHERE CURRENT OF was specified, as part of this special handling that update must use a tid scan. There is a little bit of special machinery within the optimizer (within cost_tidscan()) to force this for the isCurrentOf case. The tid scan code within the executor also has special UPDATE WHERE CURRENT OF handling. TidListCreate() specially handles the case by asking execCurrentOf() to look up the cursor's portal, and to get a tid from its QueryDesc for the current cursor position. TidNext() specially fetches the most recent version visible to our estate's Snapshot when we UPDATE WHERE CURRENT OF, without considering the original SELECT FOR UPDATE predicate (and whether or not it would in any sense continue to be satisfied). Each tid is then fetched from the heap and returned. I think I ought to be able to do something similar with this new CONFLICTS clause, by similarly marshaling some list of tids ultimately originating from the ModifyTable node that locked a potentially-not-yet-visible row (or maybe the optimizer recognizes the internal special case and creates the tid scan node "visibility credulous"). I'd then return heap tuples by using a dirty snapshot. Since this can only happen for the UPDATE's tid scan, and there must be a tid scan, it should be correct (on its own fairly reasonable terms, for READ COMMITTED). After all, at this juncture for the purposes of UPSERT it will already certainly be the case that the tuple is locked. I would like to: * Get feedback on the hazards of taking this direction, and indeed the advisability of the whole approach in general. For example, I think we'd always want to use a DirtySnapshot, and never try to just get the latest version visible to our esate snapshot, but I might be wrong about that. Since my magical UPDATE could have additional boolean conditions in its predicate (that is, additions to the implicit ctid join, in contrast to the existing magical WHERE CURRENT OF variant of UPDATE where that's always the only thing in the predicate), the "morality" of *which* tid should get to the UPDATE should certainly be considered. Of particular concern is the fact that having Postgres do one or the other could make the difference between an UPDATE qual being satisfied or not being satisfied. Maybe this is an argument against being able to tack on additional boolean conditions in the following fashion (which I'd prefer to support without also supporting more that one join (the magical implicit one), even though as I mentioned WHERE CURRENT OF does not support boolean conditions): with c as ( insert into ints(key, val) values (1, 'a'), (2, 'b') on conflict select * for update) update ints set val = c.val from c conflicts where c.key != 1; * Get some direction on what it would take to make joins involving ctid work, since all of this hinges upon that infrastructure becoming available. This appears to be a Simple Matter of Programming (at least for someone that happens to already have a good understanding of the optimizer), and is anticipated by this comment within tidpath.c: * There is currently no special support for joins involving CTID; in * particular nothing corresponding to best_inner_indexscan(). Since it's * not very useful to store TIDs of one table in another table, there * doesn't seem to be enough use-case to justify adding a lot of code * for that. [1] http://www.postgresql.org/message-id/AANLkTineR-rDFWENeddLg=grkt+epmhk2j9x0yqpi...@mail.gmail.com [2] http://www.pgcon.org/2014/schedule/attachments/327_upsert_weird.pdf , slide 25 -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers