Re: [PATCH] Incremental sort (was: PoC: Partial sort)

James Coleman Sun, 19 Apr 2020 06:49:13 -0700

On Sat, Apr 18, 2020 at 10:36 PM Justin Pryzby <[email protected]> wrote:
>
> On Tue, Apr 07, 2020 at 10:53:05AM -0500, Justin Pryzby wrote:
> > On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote:
> > > > And, should it use two spaces before "Sort Method", "Memory" and 
> > > > "Pre-sorted
> > ...
> > > I read through that subthread, and the ending seemed to be Peter
> > > wanting things to be unified. Was there a conclusion beyond that?
> >
> > This discussion is ongoing.  I think let's wait until that's settled before
> > addressing this more complex and even newer case.  We can add "explain, two
> > spaces and equals vs colon" to the "Open items" list if need be - I hope the
> > discussion will not delay the release.
>
> The change proposed on the WAL thread is minimal, and makes new explain(WAL)
> output consistent with the that of explain(BUFFERS).
>
> That uses a different format from "Sort", which is what incremental sort 
> should
> follow.  (Hashjoin also uses the Sort's format of two-spaces and colons rather
> than equals).


I think it's not great that buffers/sort are different, but I agree
that we should follow sort.

> So the attached 0001 makes explain output for incremental sort more consistent
> with sort:
>
>      - Two spaces;
>      - colons rather than equals;
>      - Don't use semicolon, which isn't in use anywhere else;
>
> I tested with this:
> template1=# DROP TABLE t; CREATE TABLE t(i int, j int); INSERT INTO t SELECT 
> a-(a%100), a%1000 FROM generate_series(1,99999)a; CREATE INDEX ON t(i); 
> VACUUM VERBOSE ANALYZE t;
> template1=# explain analyze SELECT * FROM t a ORDER BY i,j;
> ...
>    Full-sort Groups: 1000  Sort Method: quicksort  Average Memory: 28kB  Peak 
> Memory: 28kB  Pre-sorted Groups: 1000  Sort Method: quicksort  Average 
> Memory: 30kB  Peak Memory: 30kB
>
> On Tue, Apr 07, 2020 at 05:34:15PM +0200, Tomas Vondra wrote:
> > On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote:
> > > On Tue, Apr 7, 2020 at 12:25 AM Justin Pryzby <[email protected]> 
> > > wrote:
> > > > Should "Pre-sorted Groups:" be on a separate line ?
> > > > | Full-sort Groups: 1 Sort Method: quicksort Memory: avg=28kB peak=28kB 
> > > > Pre-sorted Groups: 1 Sort Method: quicksort Memory: avg=30kB peak=30kB
> > >
> > > I'd originally had that, but Tomas wanted it to be more compact. It's
> > > easy to adjust though if the consensus changes on that.
> >
> > I'm OK with changing the format if there's a consensus. The current
> > format seemed better to me, but I'm not particularly attached to it.
>
> I still think Pre-sorted groups should be on a separate line, as in 0002.
> In addition to looking better (to me), and being easier to read, another 
> reason
> is that there are essentially key=>values here, but the keys are repeated 
> (Sort
> Method, etc).

I collapsed this into 0001 because I think that if we're going to do
away with the key=value style then we effectively to have to do this
to avoid the repeated values being confusing (with key=value it kinda
worked, because that made it seem like the avg/peak were clearly a
subset of the Sort Groups info).

I also cleaned up the newline patch a bit in the process (we already
have a way to indent with a flag so don't need to do it directly).

> I also suggested to rename: s/Presorted/Pre-sorted/, but I didn't do that 
> here.

I went ahead and did that too; we already use "Full-sort", so the
proposed change makes both parallel.

> Michael already patched most of the comment typos, the remainder I'm including
> here as a "nearby patch"..

Modified slightly.

James

From becd60ba348558fa241db5cc2100a84b757cbdc5 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <[email protected]>
Date: Mon, 6 Apr 2020 17:37:31 -0500
Subject: [PATCH v2 2/2] comment typos: Incremental Sort

commit d2d8a229bc58a2014dce1c7a4fcdb6c5ab9fb8da
Author: Tomas Vondra <[email protected]>

Previously reported here:
https://www.postgresql.org/message-id/20200407042521.GH2228%40telsasoft.com
---
 src/backend/commands/explain.c             |  4 ++--
 src/backend/executor/nodeIncrementalSort.c | 10 ++++++----
 src/backend/utils/sort/tuplesort.c         |  8 ++++----
 3 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5f91c569a0..86c10458f4 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -2865,7 +2865,7 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
 }
 
 /*
- * If it's EXPLAIN ANALYZE, show tuplesort stats for a incremental sort node
+ * If it's EXPLAIN ANALYZE, show tuplesort stats for an incremental sort node
  */
 static void
 show_incremental_sort_info(IncrementalSortState *incrsortstate,
@@ -2913,7 +2913,7 @@ show_incremental_sort_info(IncrementalSortState *incrsortstate,
 			&incrsortstate->shared_info->sinfo[n];
 
 			/*
-			 * If a worker hasn't process any sort groups at all, then exclude
+			 * If a worker hasn't processed any sort groups at all, then exclude
 			 * it from output since it either didn't launch or didn't
 			 * contribute anything meaningful.
 			 */
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 39ba11cdf7..05c60ec3e0 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -987,8 +987,8 @@ ExecInitIncrementalSort(IncrementalSort *node, EState *estate, int eflags)
 
 	/*
 	 * Incremental sort can't be used with either EXEC_FLAG_REWIND,
-	 * EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK, because we only one of many sort
-	 * batches in the current sort state.
+	 * EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK, because the current sort state
+	 * contains only one sort batch rather than the full result set.
 	 */
 	Assert((eflags & (EXEC_FLAG_BACKWARD |
 					  EXEC_FLAG_MARK)) == 0);
@@ -1153,8 +1153,10 @@ ExecReScanIncrementalSort(IncrementalSortState *node)
 	/*
 	 * If we've set up either of the sort states yet, we need to reset them.
 	 * We could end them and null out the pointers, but there's no reason to
-	 * repay the setup cost, and because guard setting up pivot comparator
-	 * state similarly, doing so might actually cause a leak.
+	 * repay the setup cost, and because ExecIncrementalSort guards
+	 * presorted column functions by checking to see if the full sort state
+	 * has been initialized yet, setting the sort states to null here might
+	 * actually cause a leak.
 	 */
 	if (node->fullsort_state != NULL)
 	{
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index de38c6c7e0..d59e3d5a8d 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -1428,11 +1428,11 @@ tuplesort_updatemax(Tuplesortstate *state)
 	}
 
 	/*
-	 * Sort evicts data to the disk when it didn't manage to fit those data to
-	 * the main memory.  This is why we assume space used on the disk to be
+	 * Sort evicts data to the disk when it wasn't able to fit that data into
+	 * main memory.  This is why we assume space used on the disk to be
 	 * more important for tracking resource usage than space used in memory.
-	 * Note that amount of space occupied by some tuple set on the disk might
-	 * be less than amount of space occupied by the same tuple set in the
+	 * Note that the amount of space occupied by some tupleset on the disk might
+	 * be less than amount of space occupied by the same tupleset in
 	 * memory due to more compact representation.
 	 */
 	if ((isSpaceDisk && !state->isMaxSpaceDisk) ||
-- 
2.17.1

From 8e80be2b345c3940f76ffbd5e3c201a7ae855784 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <[email protected]>
Date: Wed, 15 Apr 2020 08:45:21 -0500
Subject: [PATCH v2 1/2] Fix explain output for incr sort:

 - Two spaces
 - colons rather than equals
 - Don't use semicolon
 - Put Pre-sorted groups on a separate line
 - Rename Presorted to Pre-sorted (to match Full-sort)
---
 src/backend/commands/explain.c                | 22 ++++++++-----------
 .../regress/expected/incremental_sort.out     | 21 +++++++++---------
 src/test/regress/sql/incremental_sort.sql     |  4 ++--
 3 files changed, 22 insertions(+), 25 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7ae6131676..5f91c569a0 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -2778,7 +2778,7 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
 	{
 		if (indent)
 			appendStringInfoSpaces(es->str, es->indent * 2);
-		appendStringInfo(es->str, "%s Groups: " INT64_FORMAT " Sort Method", groupLabel,
+		appendStringInfo(es->str, "%s Groups: " INT64_FORMAT "  Sort Method", groupLabel,
 						 groupInfo->groupCount);
 		/* plural/singular based on methodNames size */
 		if (list_length(methodNames) > 1)
@@ -2798,24 +2798,20 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
 			const char *spaceTypeName;
 
 			spaceTypeName = tuplesort_space_type_name(SORT_SPACE_TYPE_MEMORY);
-			appendStringInfo(es->str, " %s: avg=%ldkB peak=%ldkB",
+			appendStringInfo(es->str, "  Average %s: %ldkB  Peak %s: %ldkB",
 							 spaceTypeName, avgSpace,
-							 groupInfo->maxMemorySpaceUsed);
+							 spaceTypeName, groupInfo->maxMemorySpaceUsed);
 		}
 
 		if (groupInfo->maxDiskSpaceUsed > 0)
 		{
 			long		avgSpace = groupInfo->totalDiskSpaceUsed / groupInfo->groupCount;
-
 			const char *spaceTypeName;
 
 			spaceTypeName = tuplesort_space_type_name(SORT_SPACE_TYPE_DISK);
-			/* Add a semicolon separator only if memory stats were printed. */
-			if (groupInfo->maxMemorySpaceUsed > 0)
-				appendStringInfo(es->str, ";");
-			appendStringInfo(es->str, " %s: avg=%ldkB peak=%ldkB",
+			appendStringInfo(es->str, "  Average %s: %ldkB  Peak %s: %ldkB",
 							 spaceTypeName, avgSpace,
-							 groupInfo->maxDiskSpaceUsed);
+							 spaceTypeName, groupInfo->maxDiskSpaceUsed);
 		}
 	}
 	else
@@ -2899,8 +2895,8 @@ show_incremental_sort_info(IncrementalSortState *incrsortstate,
 		if (prefixsortGroupInfo->groupCount > 0)
 		{
 			if (es->format == EXPLAIN_FORMAT_TEXT)
-				appendStringInfo(es->str, " ");
-			show_incremental_sort_group_info(prefixsortGroupInfo, "Presorted", false, es);
+				appendStringInfo(es->str, "\n");
+			show_incremental_sort_group_info(prefixsortGroupInfo, "Pre-sorted", true, es);
 		}
 		if (es->format == EXPLAIN_FORMAT_TEXT)
 			appendStringInfo(es->str, "\n");
@@ -2943,8 +2939,8 @@ show_incremental_sort_info(IncrementalSortState *incrsortstate,
 			if (prefixsortGroupInfo->groupCount > 0)
 			{
 				if (es->format == EXPLAIN_FORMAT_TEXT)
-					appendStringInfo(es->str, " ");
-				show_incremental_sort_group_info(prefixsortGroupInfo, "Presorted", false, es);
+					appendStringInfo(es->str, "\n");
+				show_incremental_sort_group_info(prefixsortGroupInfo, "Pre-sorted", true, es);
 			}
 			if (es->format == EXPLAIN_FORMAT_TEXT)
 				appendStringInfo(es->str, "\n");
diff --git a/src/test/regress/expected/incremental_sort.out b/src/test/regress/expected/incremental_sort.out
index 238d89a206..2b40a26d82 100644
--- a/src/test/regress/expected/incremental_sort.out
+++ b/src/test/regress/expected/incremental_sort.out
@@ -106,7 +106,7 @@ declare
   space_key text;
 begin
   for node in select * from jsonb_array_elements(explain_analyze_inc_sort_nodes(query)) t loop
-    for group_key in select unnest(array['Full-sort Groups', 'Presorted Groups']::text[]) t loop
+    for group_key in select unnest(array['Full-sort Groups', 'Pre-sorted Groups']::text[]) t loop
       for space_key in select unnest(array['Sort Space Memory', 'Sort Space Disk']::text[]) t loop
         node := jsonb_set(node, array[group_key, space_key, 'Average Sort Space Used'], '"NN"', false);
         node := jsonb_set(node, array[group_key, space_key, 'Peak Sort Space Used'], '"NN"', false);
@@ -128,7 +128,7 @@ declare
   space_key text;
 begin
   for node in select * from jsonb_array_elements(explain_analyze_inc_sort_nodes(query)) t loop
-    for group_key in select unnest(array['Full-sort Groups', 'Presorted Groups']::text[]) t loop
+    for group_key in select unnest(array['Full-sort Groups', 'Pre-sorted Groups']::text[]) t loop
       group_stats := node->group_key;
       for space_key in select unnest(array['Sort Space Memory', 'Sort Space Disk']::text[]) t loop
         if (group_stats->space_key->'Peak Sort Space Used')::bigint < (group_stats->space_key->'Peak Sort Space Used')::bigint then
@@ -533,13 +533,13 @@ select * from (select * from t order by a) s order by a, b limit 55;
 
 -- Test EXPLAIN ANALYZE with only a fullsort group.
 select explain_analyze_without_memory('select * from (select * from t order by a) s order by a, b limit 55');
-                                 explain_analyze_without_memory                                 
-------------------------------------------------------------------------------------------------
+                                        explain_analyze_without_memory                                         
+---------------------------------------------------------------------------------------------------------------
  Limit (actual rows=55 loops=1)
    ->  Incremental Sort (actual rows=55 loops=1)
          Sort Key: t.a, t.b
          Presorted Key: t.a
-         Full-sort Groups: 2 Sort Methods: top-N heapsort, quicksort Memory: avg=NNkB peak=NNkB
+         Full-sort Groups: 2  Sort Methods: top-N heapsort, quicksort  Average Memory: NNkB  Peak Memory: NNkB
          ->  Sort (actual rows=101 loops=1)
                Sort Key: t.a
                Sort Method: quicksort  Memory: NNkB
@@ -708,18 +708,19 @@ select * from t left join (select * from (select * from t order by a) v order by
 rollback;
 -- Test EXPLAIN ANALYZE with both fullsort and presorted groups.
 select explain_analyze_without_memory('select * from (select * from t order by a) s order by a, b limit 70');
-                                                                    explain_analyze_without_memory                                                                    
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                         explain_analyze_without_memory                                         
+----------------------------------------------------------------------------------------------------------------
  Limit (actual rows=70 loops=1)
    ->  Incremental Sort (actual rows=70 loops=1)
          Sort Key: t.a, t.b
          Presorted Key: t.a
-         Full-sort Groups: 1 Sort Method: quicksort Memory: avg=NNkB peak=NNkB Presorted Groups: 5 Sort Methods: top-N heapsort, quicksort Memory: avg=NNkB peak=NNkB
+         Full-sort Groups: 1  Sort Method: quicksort  Average Memory: NNkB  Peak Memory: NNkB
+         Pre-sorted Groups: 5  Sort Methods: top-N heapsort, quicksort  Average Memory: NNkB  Peak Memory: NNkB
          ->  Sort (actual rows=1000 loops=1)
                Sort Key: t.a
                Sort Method: quicksort  Memory: NNkB
                ->  Seq Scan on t (actual rows=1000 loops=1)
-(9 rows)
+(10 rows)
 
 select jsonb_pretty(explain_analyze_inc_sort_nodes_without_memory('select * from (select * from t order by a) s order by a, b limit 70'));
                   jsonb_pretty                   
@@ -747,7 +748,7 @@ select jsonb_pretty(explain_analyze_inc_sort_nodes_without_memory('select * from
                  "Average Sort Space Used": "NN"+
              }                                  +
          },                                     +
-         "Presorted Groups": {                  +
+         "Pre-sorted Groups": {                 +
              "Group Count": 5,                  +
              "Sort Methods Used": [             +
                  "top-N heapsort",              +
diff --git a/src/test/regress/sql/incremental_sort.sql b/src/test/regress/sql/incremental_sort.sql
index 2241cc9c02..6f70b36a81 100644
--- a/src/test/regress/sql/incremental_sort.sql
+++ b/src/test/regress/sql/incremental_sort.sql
@@ -82,7 +82,7 @@ declare
   space_key text;
 begin
   for node in select * from jsonb_array_elements(explain_analyze_inc_sort_nodes(query)) t loop
-    for group_key in select unnest(array['Full-sort Groups', 'Presorted Groups']::text[]) t loop
+    for group_key in select unnest(array['Full-sort Groups', 'Pre-sorted Groups']::text[]) t loop
       for space_key in select unnest(array['Sort Space Memory', 'Sort Space Disk']::text[]) t loop
         node := jsonb_set(node, array[group_key, space_key, 'Average Sort Space Used'], '"NN"', false);
         node := jsonb_set(node, array[group_key, space_key, 'Peak Sort Space Used'], '"NN"', false);
@@ -105,7 +105,7 @@ declare
   space_key text;
 begin
   for node in select * from jsonb_array_elements(explain_analyze_inc_sort_nodes(query)) t loop
-    for group_key in select unnest(array['Full-sort Groups', 'Presorted Groups']::text[]) t loop
+    for group_key in select unnest(array['Full-sort Groups', 'Pre-sorted Groups']::text[]) t loop
       group_stats := node->group_key;
       for space_key in select unnest(array['Sort Space Memory', 'Sort Space Disk']::text[]) t loop
         if (group_stats->space_key->'Peak Sort Space Used')::bigint < (group_stats->space_key->'Peak Sort Space Used')::bigint then
-- 
2.17.1

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

Reply via email to