Re: Incremental sorts and EXEC_FLAG_REWIND

James Coleman Fri, 24 Apr 2020 13:35:44 -0700

On Sun, Apr 19, 2020 at 12:14 PM James Coleman <jtc...@gmail.com> wrote:
>
> On Wed, Apr 15, 2020 at 2:04 PM James Coleman <jtc...@gmail.com> wrote:
> >
> > On Wed, Apr 15, 2020 at 11:02 AM James Coleman <jtc...@gmail.com> wrote:
> > >
> > > On Tue, Apr 14, 2020 at 2:53 AM Michael Paquier <mich...@paquier.xyz> 
> > > wrote:
> > > >
> > > > Hi,
> > > >
> > > > When initializing an incremental sort node, we have the following as
> > > > of ExecInitIncrementalSort():
> > > >     /*
> > > >      * Incremental sort can't be used with either EXEC_FLAG_REWIND,
> > > >      * EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK, because we only one of 
> > > > many sort
> > > >      * batches in the current sort state.
> > > >      */
> > > >      Assert((eflags & (EXEC_FLAG_BACKWARD |
> > > >                        EXEC_FLAG_MARK)) == 0);
> > > > While I don't quite follow why EXEC_FLAG_REWIND should be allowed here
> > > > to begin with (because incremental sorts don't support rescans without
> > > > parameter changes, right?), the comment and the assertion are telling
> > > > a different story.
> > >
> > > I remember changing this assertion in response to an issue I'd found
> > > which led to rewriting the rescan implementation, but I must have
> > > missed updating the comment.
> >
> > All right, here are the most relevant messages:
> >
> > [1]: Here I'd said:
> > ----------
> > While working on finding a test case to show rescan isn't implemented
> > properly yet, I came across a bug. At the top of
> > ExecInitIncrementalSort, we assert that eflags does not contain
> > EXEC_FLAG_REWIND. But the following query (with merge and hash joins
> > disabled) breaks that assertion:
> >
> > select * from t join (select * from t order by a, b) s on s.a = t.a
> > where t.a in (1,2);
> >
> > The comments about this flag in src/include/executor/executor.h say:
> >
> > * REWIND indicates that the plan node should try to efficiently support
> > * rescans without parameter changes. (Nodes must support ExecReScan calls
> > * in any case, but if this flag was not given, they are at liberty to do it
> > * through complete recalculation. Note that a parameter change forces a
> > * full recalculation in any case.)
> >
> > Now we know that except in rare cases (as just discussed recently up
> > thread) we can't implement rescan efficiently.
> >
> > So is this a planner bug (i.e., should we try not to generate
> > incremental sort plans that require efficient rewind)? Or can we just
> > remove that part of the assertion and know that we'll implement the
> > rescan, albeit inefficiently? We already explicitly declare that we
> > don't support backwards scanning, but I don't see a way to declare the
> > same for rewind.
> > ----------
> >
> > So it seems to me that we can't disallow REWIND, and we have to
> > support rescan, but, we could try to mitigate the effects (without a
> > param change) with a materialize node, as noted below.
> >
> > [2]: Here, in response to my questioning above if this was a planner
> > bug, I'd said:
> > ----------
> > Other nodes seem to get a materialization node placed above them to
> > support this case "better". Is that something we should be doing?
> > ----------
> >
> > I never got any reply on this point; if we _did_ introduce a
> > materialize node here, then it would mean we could start disallowing
> > REWIND again. See the email for full details of a specific plan that I
> > encountered that reproduced this.
> >
> > Thoughts?
> >
> > > In the meantime, your question is primarily about making sure the
> > > code/comments/etc. are consistent and not a behavioral problem or
> > > failure you've seen in testing?
> >
> > Still want to confirm this is the case.
> >
> > James
> >
> > [1]: 
> > https://www.postgresql.org/message-id/CAAaqYe9%2Bap2SbU_E2WaC4F9ZMF4oa%3DpJZ1NBwaKDMP6GFUA77g%40mail.gmail.com
> > [2]: 
> > https://www.postgresql.org/message-id/CAAaqYe-sOp2o%3DL7nvGZDJ6GsL9%3Db_ztrGE1rhyi%2BF82p3my2bQ%40mail.gmail.com
>
> Looking at this more, I think this is definitely suspect. The current
> code shields lower nodes from EXEC_FLAG_BACKWARD and EXEC_FLAG_MARK --
> the former is definitely fine because we declare that we don't support
> backwards scans. The latter seems like the same reasoning would apply,
> but unfortunately we didn't add it to ExecSupportsMarkRestore, so I've
> attached a patch to do that.
>
> The EXEC_FLAG_REWIND situation though I'm still not clear on -- given
> the comments/docs seem to suggest it's a hint for efficiency rather
> than something that has to work or be declared as not implemented, so
> it seems like one of the following should be the outcome:
>
> 1. "Disallow" it by only generating materialize nodes above the
> incremental sort node if REWIND will be required. I'm not sure if this
> would mean that incremental sort just wouldn't be useful in that case?
> 2. Keep the existing implementation where we basically ignore REWIND
> and use our more inefficient implementation. In this case, I believe
> we need to stop shielding child nodes from REWIND, though, since we we
> aren't actually storing the full result set and will instead be
> re-executing the child nodes.
>
> I've attached a patch to take course (2), since it's the easiest to
> implement. But I'd still like feedback on what we should do here,
> because I don't feel like I actually know what the semantics expected
> of the executor/planner are on this point. If we do go with this
> approach, someone should verify my comments additions about
> materialize nodes is correct.


I also happened to noticed that in rescan we are always setting
node->bounded = false. I was under the impression that
ExecSetTupleBound would be called *after* ExecReScanIncrementalSort,
but looking at both ExecSetTupleBound and ExecReScanSort, but it seems
that the inverse is true. Therefore if we set this to false each time,
then we lose any possibility of using the bounded optimization for all
rescans.

I've added a tiny patch (minus one line) to the earlier patch series
to fix that.


James

From f5ec159206886f7e1fbf91e4371aa6d745ff98ec Mon Sep 17 00:00:00 2001
From: James Coleman <jtc...@gmail.com>
Date: Fri, 24 Apr 2020 16:28:16 -0400
Subject: [PATCH v2 3/3] Don't reset node->bounded

The `bounded` property is set ExecSetTupleBound before rescan is called,
so we shouldn't reset this to false every time we rescan, else we just
lose the bounded optimization on for all rescans.
---
 src/backend/executor/nodeIncrementalSort.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 2d2095964f..144dd39dd0 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1149,7 +1149,6 @@ ExecReScanIncrementalSort(IncrementalSortState *node)
 	if (node->transfer_tuple != NULL)
 		ExecClearTuple(node->transfer_tuple);
 
-	node->bounded = false;
 	node->outerNodeDone = false;
 	node->n_fullsort_remaining = 0;
 	node->bound_Done = 0;
-- 
2.17.1

From 94e2c40913831de7870a46d90175db0388308691 Mon Sep 17 00:00:00 2001
From: James Coleman <jtc...@gmail.com>
Date: Sun, 19 Apr 2020 12:13:09 -0400
Subject: [PATCH v2 1/3] Disable mark/restore for incremental sort

---
 src/backend/executor/execAmi.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index e2154ba86a..25760572d6 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -421,6 +421,14 @@ ExecSupportsMarkRestore(Path *pathnode)
 		case T_Sort:
 			return true;
 
+		case T_IncrementalSort:
+
+			/*
+			 * Unlike full sort, incremental sort keeps only a single group of
+			 * tuples in memory, so it can't mark and restore..
+			 */
+			return false;
+
 		case T_CustomScan:
 			{
 				CustomPath *customPath = castNode(CustomPath, pathnode);
-- 
2.17.1

From 76a05452cc3533854eac43cfd1fd21c7d03a12fc Mon Sep 17 00:00:00 2001
From: James Coleman <jtc...@gmail.com>
Date: Sun, 19 Apr 2020 12:13:33 -0400
Subject: [PATCH v2 2/3] Fixup EXEC_FLAG_REWIND for incremental sort

---
 src/backend/executor/nodeIncrementalSort.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 39ba11cdf7..2d2095964f 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -986,12 +986,15 @@ ExecInitIncrementalSort(IncrementalSort *node, EState *estate, int eflags)
 	SO_printf("ExecInitIncrementalSort: initializing sort node\n");
 
 	/*
-	 * Incremental sort can't be used with either EXEC_FLAG_REWIND,
-	 * EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK, because we only one of many sort
-	 * batches in the current sort state.
+	 * Incremental sort can't be used with EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK,
+	 * because the current sort state contains only one sort batch rather than
+	 * the full result set.
+	 *
+	 * Note: we can't efficiently support EXEC_FLAG_REWIND either, but we don't
+	 * disallow it and instead re-execute if that flag is passed. For more
+	 * details see comments in ExecReScanIncrementalSort.
 	 */
-	Assert((eflags & (EXEC_FLAG_BACKWARD |
-					  EXEC_FLAG_MARK)) == 0);
+	Assert((eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)) == 0);
 
 	/* Initialize state structure. */
 	incrsortstate = makeNode(IncrementalSortState);
@@ -1044,7 +1047,7 @@ ExecInitIncrementalSort(IncrementalSort *node, EState *estate, int eflags)
 	 * We shield the child node from the need to support REWIND, BACKWARD, or
 	 * MARK/RESTORE.
 	 */
-	eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);
+	eflags &= ~(EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);
 
 	outerPlanState(incrsortstate) = ExecInitNode(outerPlan(node), estate, eflags);
 
@@ -1126,7 +1129,11 @@ ExecReScanIncrementalSort(IncrementalSortState *node)
 	 * store all tuples at once for the full sort.
 	 *
 	 * So even if EXEC_FLAG_REWIND is set we just reset all of our state and
-	 * reexecute the sort along with the child node below us.
+	 * re-execute the sort along with the child node below us.
+	 *
+	 * Alternatively we could have the planner add a materialize node above us,
+	 * but that would likely decrease the value of incremental sort since we'd
+	 * end up fetching the whole result set.
 	 *
 	 * In theory if we've only filled the full sort with one batch (and haven't
 	 * reset it for a new batch yet) then we could efficiently rewind, but
-- 
2.17.1

Re: Incremental sorts and EXEC_FLAG_REWIND

Reply via email to