David Rowley <david.row...@2ndquadrant.com> writes:
> On Thu, 8 Nov 2018 at 13:14, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> There are several things we might do to fix this:
>>
>> 1. Drop the "operation != CMD_DELETE" condition from the above-quoted bit
>> in ExecInitModifyTable.  We might be forced into that someday anyway if
>> we want to support non-heap-style tables, since most other peoples'
>> versions of indexes do want to know about deletions.
>>
>> 2. Drop the target-table optimization from nodeIndexscan.c and friends,
>> and just always open the scan index with AccessShareLock.  (BTW, it's
>> a bit inconsistent that these nodes don't release their index locks
>> at the end; ExecCloseIndices does.)
>>
>> 3. Teach nodeIndexscan.c and friends about the operation != CMD_DELETE
>> exception, so that they get the lock for themselves in that case.  This
>> seems pretty ugly/fragile, but it's about the only option that won't end
>> in adding index lock-acquisition overhead in some cases.  (But given the
>> planner's behavior, it's not clear to me how often that really matters.)
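
For anyone without the source handy, the two code sites at issue look
roughly like this -- paraphrased from memory out of nodeModifyTable.c
and nodeIndexscan.c, so check the real thing for the exact form:

    /* nodeModifyTable.c, ExecInitModifyTable: open (and thereby lock)
     * the target relation's indexes, but only for INSERT/UPDATE */
    if (resultRelInfo->ri_RelationDesc->rd_rel->relhasindex &&
        operation != CMD_DELETE &&
        resultRelInfo->ri_IndexRelationDescs == NULL)
        ExecOpenIndices(resultRelInfo,
                        node->onConflictAction != ONCONFLICT_NONE);

    /* nodeIndexscan.c, ExecInitIndexScan: skip taking our own lock if
     * we believe somebody already write-locked the index for us */
    relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
    indexstate->iss_RelationDesc =
        index_open(node->indexid,
                   relistarget ? NoLock : AccessShareLock);

ExecOpenIndices locks each index with RowExclusiveLock, so the question
in all three options is which node is responsible for taking that lock
versus AccessShareLock versus nothing at all.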
> Since I missed this and only discovered this was a problem when
> someone else reported it today, and since I already did my own
> analysis separately in [1], then my vote is for #2.

Thinking more about this, the problem I noted previously about two of
these solutions not working if the index scan node is not physically
underneath the ModifyTable node actually applies to all three :-(.
It's a slightly different issue for #2, namely that what we risk is
first taking AccessShareLock and then upgrading to RowExclusiveLock.
Since there are places (not many) that take ShareLock on indexes, this
would pose a deadlock risk.

Now, after more thought, I believe that it's probably impossible to have
a plan tree in which ExecRelationIsTargetRelation would return true at
an indexscan node that's not underneath the ModifyTable node.  What *is*
possible is that we have such an indexscan on a different RTE for the
same table.  I constructed this admittedly artificial example in the
regression database:

# explain with x1 as (select * from tenk1 t1 where unique1 = 42),
  x2 as (update tenk1 t2 set two = 2 where unique2 = 11 returning *)
  select * from x1,x2 where x1.ten = x2.ten;
                                           QUERY PLAN
----------------------------------------------------------------------------------------------
 Nested Loop  (cost=16.61..16.66 rows=1 width=488)
   Join Filter: (x1.ten = x2.ten)
   CTE x1
     ->  Index Scan using tenk1_unique1 on tenk1 t1  (cost=0.29..8.30 rows=1 width=244)
           Index Cond: (unique1 = 42)
   CTE x2
     ->  Update on tenk1 t2  (cost=0.29..8.30 rows=1 width=250)
           ->  Index Scan using tenk1_unique2 on tenk1 t2  (cost=0.29..8.30 rows=1 width=250)
                 Index Cond: (unique2 = 11)
   ->  CTE Scan on x1  (cost=0.00..0.02 rows=1 width=244)
   ->  CTE Scan on x2  (cost=0.00..0.02 rows=1 width=244)
(11 rows)

Because the CTEs will be initialized in order, this presents a case
where the lock-upgrade hazard exists today: the x1 indexscan will open
tenk1_unique1 with AccessShareLock and then x2's ModifyTable node will
upgrade that to RowExclusiveLock.  None of the proposed fixes improve
this.

I'm beginning to think that postponing target-index locking to
ExecInitModifyTable was a damfool idea and we should undo it.

> For partitioned
> table updates, there may be many result relations which can cause
> ExecRelationIsTargetRelation() to become very slow, in such a case any
> additional redundant lock would be cheap by comparison.

Yeah, it'd be nice to get rid of the need for that.

			regards, tom lane
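
PS: for context on that last point, ExecRelationIsTargetRelation() is
just a linear search over the result-relation array, roughly this
(again paraphrasing, this time execUtils.c, from memory):

    bool
    ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
    {
        ResultRelInfo *resultRelInfos = estate->es_result_relations;
        int            i;

        /* scan every result relation looking for a matching RT index */
        for (i = 0; i < estate->es_num_result_relations; i++)
        {
            if (resultRelInfos[i].ri_RangeTableIndex == scanrelid)
                return true;
        }
        return false;
    }

With a few thousand partitions as result relations, and a scan node per
partition each making this check, that's quadratic work at executor
startup just to decide whether to take a lock we mostly already hold.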