I wrote:
> I think that this is a band-aid that's just masking an error in the
> rowmarking logic: it's not doing the right thing for appendrels
> made from UNION ALL subqueries.  I've not wrapped my head around
> exactly where it's going off the rails, but I feel like this ought
> to get fixed somewhere else.  Please hold off pushing your patch
> for now.
So after studying this for awhile, I see that the planner is emitting
a PlanRowMark that presumes that the UNION ALL subquery will be
scanned as though it's a base relation; but since we've converted it
to an appendrel, the executor just ignores that rowmark, and the
wrong things happen.  I think the really right fix would be to teach
the executor to honor such PlanRowMarks, by getting nodeAppend.c and
nodeMergeAppend.c to perform EPQ row substitution.

I wrote a draft patch for that, attached, and it almost works but not
quite.  The trouble is that we're jamming the contents of the row
identity Var created for the rowmark into the output of the Append or
MergeAppend, and it doesn't necessarily match exactly.  In the test
case you created, the planner produces targetlists like

   Output: src_1.val, src_1.id, ROW(src_1.id, src_1.val)

and as you can see the order of the columns doesn't match.

I can see three ways we might attack that:

1. Persuade the planner to build output tlists that always match the
row identity Var.  This seems undesirable, because the planner might
have intentionally elided columns that won't be read by the upper
parts of the plan.

2. Change generation of the ROW() expression so that it lists only
the values we're going to output, in the order we're going to output
them.  This would amount to saying that for UNION cases the
"identity" of a row need only consider columns used by the plan,
which feels a little odd but I can't think of a reason why it
wouldn't work.  I'm not sure how messy this'd be to implement though,
as the set of columns to be emitted isn't fully determined until much
later than where we currently expand the row identity Vars into
RowExprs.

3. Fix the executor to remap what it gets out of the ROW() into the
order of the subquery tlists.  This is probably do-able but I'm not
certain; it may be that the executor hasn't enough info.  We might
need to teach the planner to produce a mapping projection and attach
it to the Append node, which carries some ABI risk (but in the past
we've gotten away with adding new fields to the ends of plan nodes in
the back branches).  Another objection is that adding cycles to
execution rather than planning might be a poor tradeoff --- although
if we only do the work when EPQ is invoked, maybe it'd be the best
way.

It might be that any of these things is too messy to be considered
for back-patching, and we ought to apply what you did in the back
branches.  I'd like a better fix in HEAD though.

			regards, tom lane
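[To make the remapping step in option 3 concrete, here is a toy,
standalone C sketch.  It is invented for illustration and is not
PostgreSQL code: the arrays and the name attno_map exist only in this
example, and a real implementation would match Vars rather than
column-name strings, at plan time or executor startup.]

#include <stdio.h>
#include <string.h>

#define NCOLS 2

int
main(void)
{
	/* column order inside the ROW() identity value: (id, val) */
	const char *identity_cols[NCOLS] = {"id", "val"};
	int			identity_vals[NCOLS] = {42, 7};

	/* column order the subquery targetlist emits: (val, id) */
	const char *output_cols[NCOLS] = {"val", "id"};

	int			attno_map[NCOLS];
	int			output_vals[NCOLS];

	/* build the output-position -> identity-position map once */
	for (int out = 0; out < NCOLS; out++)
		for (int in = 0; in < NCOLS; in++)
			if (strcmp(output_cols[out], identity_cols[in]) == 0)
				attno_map[out] = in;

	/* apply the map when substituting the EPQ test tuple */
	for (int out = 0; out < NCOLS; out++)
		output_vals[out] = identity_vals[attno_map[out]];

	for (int out = 0; out < NCOLS; out++)
		printf("%s = %d\n", output_cols[out], output_vals[out]);

	return 0;
}

[Run, this prints "val = 7" then "id = 42".  The open question noted
above is whether the executor alone has enough information to build
such a map, which is why a planner-supplied mapping projection comes
up as the fallback within option 3.]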
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index c7059e7528..113217e607 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -288,8 +288,65 @@ static TupleTableSlot *
 ExecAppend(PlanState *pstate)
 {
 	AppendState *node = castNode(AppendState, pstate);
+	EState	   *estate = node->ps.state;
 	TupleTableSlot *result;
 
+	if (estate->es_epq_active != NULL)
+	{
+		/*
+		 * We are inside an EvalPlanQual recheck.  If there is a relevant
+		 * rowmark for the append relation, return the test tuple if one is
+		 * available.
+		 */
+		EPQState   *epqstate = estate->es_epq_active;
+		int			scanrelid;
+
+		if (bms_get_singleton_member(castNode(Append, node->ps.plan)->apprelids,
+									 &scanrelid))
+		{
+			if (epqstate->relsubs_done[scanrelid - 1])
+			{
+				/*
+				 * Return empty slot, as either there is no EPQ tuple for this
+				 * rel or we already returned it.
+				 */
+				TupleTableSlot *slot = node->ps.ps_ResultTupleSlot;
+
+				return ExecClearTuple(slot);
+			}
+			else if (epqstate->relsubs_slot[scanrelid - 1] != NULL)
+			{
+				/*
+				 * Return replacement tuple provided by the EPQ caller.
+				 */
+				TupleTableSlot *slot = epqstate->relsubs_slot[scanrelid - 1];
+
+				Assert(epqstate->relsubs_rowmark[scanrelid - 1] == NULL);
+
+				/* Mark to remember that we shouldn't return it again */
+				epqstate->relsubs_done[scanrelid - 1] = true;
+
+				return slot;
+			}
+			else if (epqstate->relsubs_rowmark[scanrelid - 1] != NULL)
+			{
+				/*
+				 * Fetch and return replacement tuple using a non-locking
+				 * rowmark.
+				 */
+				TupleTableSlot *slot = node->ps.ps_ResultTupleSlot;
+
+				/* Mark to remember that we shouldn't return more */
+				epqstate->relsubs_done[scanrelid - 1] = true;
+
+				if (!EvalPlanQualFetchRowMark(epqstate, scanrelid, slot))
+					return NULL;
+
+				return slot;
+			}
+		}
+	}
+
 	/*
 	 * If this is the first call after Init or ReScan, we need to do the
 	 * initialization work.
@@ -405,6 +462,7 @@ ExecEndAppend(AppendState *node)
 void
 ExecReScanAppend(AppendState *node)
 {
+	EState	   *estate = node->ps.state;
 	int			nasyncplans = node->as_nasyncplans;
 	int			i;
 
@@ -443,6 +501,23 @@ ExecReScanAppend(AppendState *node)
 			ExecReScan(subnode);
 	}
 
+	/*
+	 * Rescan EvalPlanQual tuple(s) if we're inside an EvalPlanQual recheck.
+	 * But don't lose the "blocked" status of blocked target relations.
+	 */
+	if (estate->es_epq_active != NULL)
+	{
+		EPQState   *epqstate = estate->es_epq_active;
+		int			scanrelid;
+
+		if (bms_get_singleton_member(castNode(Append, node->ps.plan)->apprelids,
+									 &scanrelid))
+		{
+			epqstate->relsubs_done[scanrelid - 1] =
+				epqstate->relsubs_blocked[scanrelid - 1];
+		}
+	}
+
 	/* Reset async state */
 	if (nasyncplans > 0)
 	{
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 0817868452..f2c386b123 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -200,11 +200,68 @@ static TupleTableSlot *
 ExecMergeAppend(PlanState *pstate)
 {
 	MergeAppendState *node = castNode(MergeAppendState, pstate);
+	EState	   *estate = node->ps.state;
 	TupleTableSlot *result;
 	SlotNumber	i;
 
 	CHECK_FOR_INTERRUPTS();
 
+	if (estate->es_epq_active != NULL)
+	{
+		/*
+		 * We are inside an EvalPlanQual recheck.  If there is a relevant
+		 * rowmark for the append relation, return the test tuple if one is
+		 * available.
+		 */
+		EPQState   *epqstate = estate->es_epq_active;
+		int			scanrelid;
+
+		if (bms_get_singleton_member(castNode(MergeAppend, node->ps.plan)->apprelids,
+									 &scanrelid))
+		{
+			if (epqstate->relsubs_done[scanrelid - 1])
+			{
+				/*
+				 * Return empty slot, as either there is no EPQ tuple for this
+				 * rel or we already returned it.
+				 */
+				TupleTableSlot *slot = node->ps.ps_ResultTupleSlot;
+
+				return ExecClearTuple(slot);
+			}
+			else if (epqstate->relsubs_slot[scanrelid - 1] != NULL)
+			{
+				/*
+				 * Return replacement tuple provided by the EPQ caller.
+				 */
+				TupleTableSlot *slot = epqstate->relsubs_slot[scanrelid - 1];
+
+				Assert(epqstate->relsubs_rowmark[scanrelid - 1] == NULL);
+
+				/* Mark to remember that we shouldn't return it again */
+				epqstate->relsubs_done[scanrelid - 1] = true;
+
+				return slot;
+			}
+			else if (epqstate->relsubs_rowmark[scanrelid - 1] != NULL)
+			{
+				/*
+				 * Fetch and return replacement tuple using a non-locking
+				 * rowmark.
+				 */
+				TupleTableSlot *slot = node->ps.ps_ResultTupleSlot;
+
+				/* Mark to remember that we shouldn't return more */
+				epqstate->relsubs_done[scanrelid - 1] = true;
+
+				if (!EvalPlanQualFetchRowMark(epqstate, scanrelid, slot))
+					return NULL;
+
+				return slot;
+			}
+		}
+	}
+
 	if (!node->ms_initialized)
 	{
 		/* Nothing to do if all subplans were pruned */
@@ -339,6 +396,7 @@ ExecEndMergeAppend(MergeAppendState *node)
 void
 ExecReScanMergeAppend(MergeAppendState *node)
 {
+	EState	   *estate = node->ps.state;
 	int			i;
 
 	/*
@@ -372,6 +430,24 @@ ExecReScanMergeAppend(MergeAppendState *node)
 		if (subnode->chgParam == NULL)
 			ExecReScan(subnode);
 	}
+
+	/*
+	 * Rescan EvalPlanQual tuple(s) if we're inside an EvalPlanQual recheck.
+	 * But don't lose the "blocked" status of blocked target relations.
+	 */
+	if (estate->es_epq_active != NULL)
+	{
+		EPQState   *epqstate = estate->es_epq_active;
+		int			scanrelid;
+
+		if (bms_get_singleton_member(castNode(MergeAppend, node->ps.plan)->apprelids,
+									 &scanrelid))
+		{
+			epqstate->relsubs_done[scanrelid - 1] =
+				epqstate->relsubs_blocked[scanrelid - 1];
+		}
+	}
+
 	binaryheap_reset(node->ms_heap);
 	node->ms_initialized = false;
 }
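[A possible follow-on cleanup, not part of the draft patch: the EPQ
blocks added to ExecAppend and ExecMergeAppend are identical except
for the node type passed to castNode(), so they could be folded into
one shared helper.  The sketch below is invented for illustration;
the name ExecEPQSubstituteAppend does not exist in the PostgreSQL
tree, though every call it makes appears in the patch above.]

/*
 * Hypothetical shared helper for the EPQ substitution done in ExecAppend
 * and ExecMergeAppend.  Returns true if we are inside an EvalPlanQual
 * recheck and handled the call, storing the slot to return (possibly
 * NULL) in *result; returns false if normal scanning should proceed.
 */
static bool
ExecEPQSubstituteAppend(PlanState *ps, Bitmapset *apprelids,
						TupleTableSlot **result)
{
	EState	   *estate = ps->state;
	EPQState   *epqstate = estate->es_epq_active;
	int			scanrelid;

	if (epqstate == NULL)
		return false;			/* not inside an EvalPlanQual recheck */

	/* Substitution only applies if the appendrel is a single RT entry */
	if (!bms_get_singleton_member(apprelids, &scanrelid))
		return false;

	if (epqstate->relsubs_done[scanrelid - 1])
	{
		/* No EPQ tuple for this rel, or we already returned it */
		*result = ExecClearTuple(ps->ps_ResultTupleSlot);
	}
	else if (epqstate->relsubs_slot[scanrelid - 1] != NULL)
	{
		/* Return replacement tuple provided by the EPQ caller */
		Assert(epqstate->relsubs_rowmark[scanrelid - 1] == NULL);
		epqstate->relsubs_done[scanrelid - 1] = true;
		*result = epqstate->relsubs_slot[scanrelid - 1];
	}
	else if (epqstate->relsubs_rowmark[scanrelid - 1] != NULL)
	{
		/* Fetch replacement tuple using a non-locking rowmark */
		epqstate->relsubs_done[scanrelid - 1] = true;
		if (EvalPlanQualFetchRowMark(epqstate, scanrelid,
									 ps->ps_ResultTupleSlot))
			*result = ps->ps_ResultTupleSlot;
		else
			*result = NULL;
	}
	else
		return false;			/* no rowmark info; scan normally */

	return true;
}

[ExecAppend's addition would then shrink to roughly

	if (ExecEPQSubstituteAppend(pstate,
								castNode(Append, node->ps.plan)->apprelids,
								&result))
		return result;

with the MergeAppend case differing only in the castNode() type.]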