Nathan Bossart <nathandboss...@gmail.com> writes:
> On Thu, May 18, 2023 at 11:22:54AM -0400, Tom Lane wrote:
>> Ugh. Bisecting says it broke at
>> commit 86dc90056dfdbd9d1b891718d2e5614e3e432f35
>> which was absolutely not supposed to be breaking any concurrent-execution
>> guarantees. I wonder what we got wrong.
> With the reproduction steps listed upthread, I see that XMAX for both
> tuples is set to the deleting transaction, but the one in inh_child_2 has
> two additional infomask flags: HEAP_XMAX_EXCL_LOCK and HEAP_XMAX_LOCK_ONLY.
> If I add a third table (i.e., inh_child_3), XMAX for all three tuples is
> set to the deleting transaction, and only the one in inh_child_3 has the
> lock bits set. Also, in the three-table case, the DELETE statement reports
> "DELETE 2".

Yeah. I see the problem: when starting up an EPQ recheck, we stuff the
tuple-to-test into the epqstate->relsubs_slot[] entry for the relation it
came from, but we do nothing to the EPQ state for the other target
relations, which allows the EPQ plan to fetch rows from those relations as
usual. If it finds a (non-updated) row passing the qual, kaboom! We decide
the EPQ check passed.

What we need to do, I think, is set epqstate->relsubs_done[] for all
target relations except the one we are stuffing a tuple into (a rough
sketch of that idea appears at the end of this message). While
nodeModifyTable can certainly be made to do that, things are complicated
by the fact that currently ExecScanReScan thinks it ought to clear all
the relsubs_done flags, which would break things again. I wonder if we
can simply delete that code. Dropping the FDW/Custom-specific code there
is a bit scary, but on the whole that looks like code that got
cargo-culted in rather than anything we actually need.

The reason this wasn't a bug before 86dc90056 is that any given plan tree
could have only one target relation, so there was not anything else to
suppress.

			regards, tom lane
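
To make the relsubs_done[] idea concrete, here is a minimal sketch of
what nodeModifyTable might do when starting up an EPQ recheck. This is
only an illustration of the proposal described above, not the committed
fix: the helper name EPQSuppressOtherTargets is invented, and
resultRelations is assumed to be the ModifyTable node's list of
result-relation RT indexes; EPQState's relsubs_done[] (indexed by
scanrelid - 1) and ResultRelInfo's ri_RangeTableIndex are existing
executor fields.

#include "postgres.h"

#include "executor/executor.h"
#include "nodes/execnodes.h"
#include "nodes/pg_list.h"

/*
 * Hypothetical helper: before running the EPQ plan for one result
 * relation, mark every other result relation's EPQ subplan as already
 * done, so the recheck cannot fetch pre-existing rows from those
 * relations and wrongly conclude the qual still passes.
 */
static void
EPQSuppressOtherTargets(EPQState *epqstate,
                        ResultRelInfo *relinfo,
                        List *resultRelations)
{
    ListCell   *lc;

    foreach(lc, resultRelations)
    {
        Index       rti = (Index) lfirst_int(lc);

        /* leave the relation whose tuple we stuffed into relsubs_slot[] alone ... */
        if (rti == relinfo->ri_RangeTableIndex)
            continue;

        /* ... and tell the EPQ plan the others have nothing more to return */
        epqstate->relsubs_done[rti - 1] = true;
    }
}

As the message notes, a scheme like this only holds up if ExecScanReScan
stops clearing all the relsubs_done flags; otherwise the suppression
would be wiped out by the first rescan inside the EPQ plan.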