I wrote:
> I think that this is a band-aid that's just masking an error in the
> rowmarking logic: it's not doing the right thing for appendrels
> made from UNION ALL subqueries.  I've not wrapped my head around
> exactly where it's going off the rails, but I feel like this ought
> to get fixed somewhere else.  Please hold off pushing your patch
> for now.
So after studying this for awhile, I see that the planner is emitting
a PlanRowMark that presumes that the UNION ALL subquery will be
scanned as though it's a base relation; but since we've converted it
to an appendrel, the executor just ignores that rowmark, and the
wrong things happen.  I think the really right fix would be to teach
the executor to honor such PlanRowMarks, by getting nodeAppend.c and
nodeMergeAppend.c to perform EPQ row substitution.

I wrote a draft patch for that, attached, and it almost works but not
quite.  The trouble is that we're jamming the contents of the row
identity Var created for the rowmark into the output of the Append or
MergeAppend, and it doesn't necessarily match exactly.  In the test
case you created, the planner produces targetlists like

   Output: src_1.val, src_1.id, ROW(src_1.id, src_1.val)

and as you can see the order of the columns doesn't match.

I can see three ways we might attack that:

1. Persuade the planner to build output tlists that always match the
row identity Var.  This seems undesirable, because the planner might
have intentionally elided columns that won't be read by the upper
parts of the plan.

2. Change generation of the ROW() expression so that it lists only
the values we're going to output, in the order we're going to output
them.  This would amount to saying that for UNION cases the
"identity" of a row need only consider columns used by the plan,
which feels a little odd but I can't think of a reason why it
wouldn't work.  I'm not sure how messy this'd be to implement though,
as the set of columns to be emitted isn't fully determined until much
later than where we currently expand the row identity Vars into
RowExprs.

3. Fix the executor to remap what it gets out of the ROW() into the
order of the subquery tlists.  This is probably do-able but I'm not
certain; it may be that the executor hasn't enough info.  We might
need to teach the planner to produce a mapping projection and attach
it to the Append node, which carries some ABI risk (but in the past
we've gotten away with adding new fields to the ends of plan nodes in
the back branches).  Another objection is that adding cycles to
execution rather than planning might be a poor tradeoff --- although
if we only do the work when EPQ is invoked, maybe it'd be the best
way.

It might be that any of these things is too messy to be considered
for back-patching, and we ought to apply what you did in the back
branches.  I'd like a better fix in HEAD though.

			regards, tom lane
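[To make the remapping step in option 3 concrete, here is a toy,
standalone C sketch.  It is invented for illustration and is not
PostgreSQL code: the arrays and the name attno_map exist only in this
example, and a real implementation would match Vars rather than
column-name strings, at plan time or executor startup.]

#include <stdio.h>
#include <string.h>

#define NCOLS 2

int
main(void)
{
	/* column order inside the ROW() identity value: (id, val) */
	const char *identity_cols[NCOLS] = {"id", "val"};
	int			identity_vals[NCOLS] = {42, 7};

	/* column order the subquery targetlist emits: (val, id) */
	const char *output_cols[NCOLS] = {"val", "id"};

	int			attno_map[NCOLS];
	int			output_vals[NCOLS];

	/* build the output-position -> identity-position map once */
	for (int out = 0; out < NCOLS; out++)
		for (int in = 0; in < NCOLS; in++)
			if (strcmp(output_cols[out], identity_cols[in]) == 0)
				attno_map[out] = in;

	/* apply the map when substituting the EPQ test tuple */
	for (int out = 0; out < NCOLS; out++)
		output_vals[out] = identity_vals[attno_map[out]];

	for (int out = 0; out < NCOLS; out++)
		printf("%s = %d\n", output_cols[out], output_vals[out]);

	return 0;
}

[Run, this prints "val = 7" then "id = 42".  The open question noted
above is whether the executor alone has enough information to build
such a map, which is why a planner-supplied mapping projection comes
up as the fallback within option 3.]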
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index c7059e7528..113217e607 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -288,8 +288,65 @@ static TupleTableSlot *
 ExecAppend(PlanState *pstate)
 {
 	AppendState *node = castNode(AppendState, pstate);
+	EState	   *estate = node->ps.state;
 	TupleTableSlot *result;
 
+	if (estate->es_epq_active != NULL)
+	{
+		/*
+		 * We are inside an EvalPlanQual recheck.  If there is a relevant
+		 * rowmark for the append relation, return the test tuple if one is
+		 * available.
+		 */
+		EPQState   *epqstate = estate->es_epq_active;
+		int			scanrelid;
+
+		if (bms_get_singleton_member(castNode(Append, node->ps.plan)->apprelids,
+									 &scanrelid))
+		{
+			if (epqstate->relsubs_done[scanrelid - 1])
+			{
+				/*
+				 * Return empty slot, as either there is no EPQ tuple for this
+				 * rel or we already returned it.
+				 */
+				TupleTableSlot *slot = node->ps.ps_ResultTupleSlot;
+
+				return ExecClearTuple(slot);
+			}
+			else if (epqstate->relsubs_slot[scanrelid - 1] != NULL)
+			{
+				/*
+				 * Return replacement tuple provided by the EPQ caller.
+				 */
+				TupleTableSlot *slot = epqstate->relsubs_slot[scanrelid - 1];
+
+				Assert(epqstate->relsubs_rowmark[scanrelid - 1] == NULL);
+
+				/* Mark to remember that we shouldn't return it again */
+				epqstate->relsubs_done[scanrelid - 1] = true;
+
+				return slot;
+			}
+			else if (epqstate->relsubs_rowmark[scanrelid - 1] != NULL)
+			{
+				/*
+				 * Fetch and return replacement tuple using a non-locking
+				 * rowmark.
+				 */
+				TupleTableSlot *slot = node->ps.ps_ResultTupleSlot;
+
+				/* Mark to remember that we shouldn't return more */
+				epqstate->relsubs_done[scanrelid - 1] = true;
+
+				if (!EvalPlanQualFetchRowMark(epqstate, scanrelid, slot))
+					return NULL;
+
+				return slot;
+			}
+		}
+	}
+
 	/*
 	 * If this is the first call after Init or ReScan, we need to do the
 	 * initialization work.
@@ -405,6 +462,7 @@ ExecEndAppend(AppendState *node)
 void
 ExecReScanAppend(AppendState *node)
 {
+	EState	   *estate = node->ps.state;
 	int			nasyncplans = node->as_nasyncplans;
 	int			i;
 
@@ -443,6 +501,23 @@ ExecReScanAppend(AppendState *node)
 			ExecReScan(subnode);
 	}
 
+	/*
+	 * Rescan EvalPlanQual tuple(s) if we're inside an EvalPlanQual recheck.
+	 * But don't lose the "blocked" status of blocked target relations.
+	 */
+	if (estate->es_epq_active != NULL)
+	{
+		EPQState   *epqstate = estate->es_epq_active;
+		int			scanrelid;
+
+		if (bms_get_singleton_member(castNode(Append, node->ps.plan)->apprelids,
+									 &scanrelid))
+		{
+			epqstate->relsubs_done[scanrelid - 1] =
+				epqstate->relsubs_blocked[scanrelid - 1];
+		}
+	}
+
 	/* Reset async state */
 	if (nasyncplans > 0)
 	{
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 0817868452..f2c386b123 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -200,11 +200,68 @@ static TupleTableSlot *
 ExecMergeAppend(PlanState *pstate)
 {
 	MergeAppendState *node = castNode(MergeAppendState, pstate);
+	EState	   *estate = node->ps.state;
 	TupleTableSlot *result;
 	SlotNumber	i;
 
 	CHECK_FOR_INTERRUPTS();
 
+	if (estate->es_epq_active != NULL)
+	{
+		/*
+		 * We are inside an EvalPlanQual recheck.  If there is a relevant
+		 * rowmark for the append relation, return the test tuple if one is
+		 * available.
+		 */
+		EPQState   *epqstate = estate->es_epq_active;
+		int			scanrelid;
+
+		if (bms_get_singleton_member(castNode(MergeAppend, node->ps.plan)->apprelids,
+									 &scanrelid))
+		{
+			if (epqstate->relsubs_done[scanrelid - 1])
+			{
+				/*
+				 * Return empty slot, as either there is no EPQ tuple for this
+				 * rel or we already returned it.
+				 */
+				TupleTableSlot *slot = node->ps.ps_ResultTupleSlot;
+
+				return ExecClearTuple(slot);
+			}
+			else if (epqstate->relsubs_slot[scanrelid - 1] != NULL)
+			{
+				/*
+				 * Return replacement tuple provided by the EPQ caller.
+				 */
+				TupleTableSlot *slot = epqstate->relsubs_slot[scanrelid - 1];
+
+				Assert(epqstate->relsubs_rowmark[scanrelid - 1] == NULL);
+
+				/* Mark to remember that we shouldn't return it again */
+				epqstate->relsubs_done[scanrelid - 1] = true;
+
+				return slot;
+			}
+			else if (epqstate->relsubs_rowmark[scanrelid - 1] != NULL)
+			{
+				/*
+				 * Fetch and return replacement tuple using a non-locking
+				 * rowmark.
+				 */
+				TupleTableSlot *slot = node->ps.ps_ResultTupleSlot;
+
+				/* Mark to remember that we shouldn't return more */
+				epqstate->relsubs_done[scanrelid - 1] = true;
+
+				if (!EvalPlanQualFetchRowMark(epqstate, scanrelid, slot))
+					return NULL;
+
+				return slot;
+			}
+		}
+	}
+
 	if (!node->ms_initialized)
 	{
 		/* Nothing to do if all subplans were pruned */
@@ -339,6 +396,7 @@ ExecEndMergeAppend(MergeAppendState *node)
 void
 ExecReScanMergeAppend(MergeAppendState *node)
 {
+	EState	   *estate = node->ps.state;
 	int			i;
 
 	/*
@@ -372,6 +430,24 @@ ExecReScanMergeAppend(MergeAppendState *node)
 		if (subnode->chgParam == NULL)
 			ExecReScan(subnode);
 	}
+
+	/*
+	 * Rescan EvalPlanQual tuple(s) if we're inside an EvalPlanQual recheck.
+	 * But don't lose the "blocked" status of blocked target relations.
+	 */
+	if (estate->es_epq_active != NULL)
+	{
+		EPQState   *epqstate = estate->es_epq_active;
+		int			scanrelid;
+
+		if (bms_get_singleton_member(castNode(MergeAppend, node->ps.plan)->apprelids,
+									 &scanrelid))
+		{
+			epqstate->relsubs_done[scanrelid - 1] =
+				epqstate->relsubs_blocked[scanrelid - 1];
+		}
+	}
+
 	binaryheap_reset(node->ms_heap);
 	node->ms_initialized = false;
 }
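[A possible follow-on cleanup, not part of the draft patch: the EPQ
blocks added to ExecAppend and ExecMergeAppend are identical except
for the node type passed to castNode(), so they could be folded into
one shared helper.  The sketch below is invented for illustration;
the name ExecEPQSubstituteAppend does not exist in the PostgreSQL
tree, though every call it makes appears in the patch above.]

/*
 * Hypothetical shared helper for the EPQ substitution done in ExecAppend
 * and ExecMergeAppend.  Returns true if we are inside an EvalPlanQual
 * recheck and handled the call, storing the slot to return (possibly
 * NULL) in *result; returns false if normal scanning should proceed.
 */
static bool
ExecEPQSubstituteAppend(PlanState *ps, Bitmapset *apprelids,
						TupleTableSlot **result)
{
	EState	   *estate = ps->state;
	EPQState   *epqstate = estate->es_epq_active;
	int			scanrelid;

	if (epqstate == NULL)
		return false;			/* not inside an EvalPlanQual recheck */

	/* Substitution only applies if the appendrel is a single RT entry */
	if (!bms_get_singleton_member(apprelids, &scanrelid))
		return false;

	if (epqstate->relsubs_done[scanrelid - 1])
	{
		/* No EPQ tuple for this rel, or we already returned it */
		*result = ExecClearTuple(ps->ps_ResultTupleSlot);
	}
	else if (epqstate->relsubs_slot[scanrelid - 1] != NULL)
	{
		/* Return replacement tuple provided by the EPQ caller */
		Assert(epqstate->relsubs_rowmark[scanrelid - 1] == NULL);
		epqstate->relsubs_done[scanrelid - 1] = true;
		*result = epqstate->relsubs_slot[scanrelid - 1];
	}
	else if (epqstate->relsubs_rowmark[scanrelid - 1] != NULL)
	{
		/* Fetch replacement tuple using a non-locking rowmark */
		epqstate->relsubs_done[scanrelid - 1] = true;
		if (EvalPlanQualFetchRowMark(epqstate, scanrelid,
									 ps->ps_ResultTupleSlot))
			*result = ps->ps_ResultTupleSlot;
		else
			*result = NULL;
	}
	else
		return false;			/* no rowmark info; scan normally */

	return true;
}

[ExecAppend's addition would then shrink to roughly

	if (ExecEPQSubstituteAppend(pstate,
								castNode(Append, node->ps.plan)->apprelids,
								&result))
		return result;

with the MergeAppend case differing only in the castNode() type.]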