Over on [1] Pavel reports that PG16's performance is slower than PG14's for his ORDER BY aggregate query. This seems to be mainly down to how the costing works for Incremental Sort. Now, ever since 1349d279, the planner will make a plan that's ordered by the GROUP BY clause and add on the path key requirements for any ORDER BY aggregates. I had in mind that if someone already had tuned their schema to have a btree index on the GROUP BY clause to allow Group Aggregate to have presorted rows from the index that the planner would likely just take that index and add an Incremental Sort on top of that to obtain the rows in the order required for the aggregate functions.
The main reason for Pavel's reported regression is due to a cost "pessimism factor" that was added to incremental sorts where we multiply by 1.5 to reduce the chances of an Incremental Sort plan due to uncertainties around the number of tuples per presorted group. We assume each group will have the same number of tuples. If there's some skew then there will be a larger N factor in the qsort complexity of O(N * log2(N)). Over on [1], I'm proposing that we remove that pessimism factor. If we keep teaching the planner new tricks but cost them pessimistically then we're not taking full advantage of said new tricks. If you disagree with that, then best to raise it on [1]. On [1], in addition to removing the * 1.5 factor, I also propose that we add a new GUC named "enable_presorted_aggregate", which, if turned off will make the planner not request that the plan is also ordered by the aggregate function's ORDER BY / DISTINCT clause. The reason I think that this is required is that even with removing the pessimism factor from incremental sort, it's possible the planner will choose to perform a full sort rather than an incremental sort on top of some existing index which provides presorted input for only the query's GROUP BY clause. e.g. CREATE TABLE ab (a INT, b INT); CREATE INDEX ON ab(a); EXPLAIN SELECT a,array_agg(b ORDER BY b) FROM ab GROUP BY a; Previous to 1349d279, the planner, assuming there's a good amount of rows in the table, would be very likely to use the index for the GROUP BY then the executor would be left to do a sort on "b" within nodeAgg.c. The equivalent of that post-1349d279 is Index Scan -> Incremental Sort -> Group Aggregate (with presorted input). However, the planner could choose to: Seq Scan -> Sort (a,b) -> Group Aggregate (with presorted input). So this leaves an opportunity for the planner to choose a worse plan. Normally we add some enable_* GUC to leave an escape hatch when we add some new feature like this. I likely should have done that when I added 1349d279, but I didn't and I want to now. I mainly just shifted this discussion out of [1] as we normally like to debate GUC names and I feel that the discussion over on the other thread is buried a little too deep to be visible to most people. Can anyone think of a better name? Or does anyone see error with my ambition? David [1] https://www.postgresql.org/message-id/9f61ddbf-2989-1536-b31e-6459370a6...@postgrespro.ru
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 8e4145979d..9eedab652d 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -5311,6 +5311,29 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class=" </listitem> </varlistentry> + <varlistentry id="guc-enable-presorted-aggregate" xreflabel="enable_presorted_aggregate"> + <term><varname>enable_presorted_aggregate</varname> (<type>boolean</type>) + <indexterm> + <primary><varname>enable_presorted_aggregate</varname> configuration parameter</primary> + </indexterm> + </term> + <listitem> + <para> + Controls if the query planner will produce a plan which will provide + rows which are presorted in the order required for the query's + <literal>ORDER BY</literal> / <literal>DISTINCT</literal> aggregate + functions. When disabled, the query planner will produce a plan which + will always require the executor to perform a sort before performing + aggregation of each aggregate function containing an + <literal>ORDER BY</literal> or <literal>DISTINCT</literal> clause. + When enabled, the planner will try to produce a more efficient plan + which provides input to the aggregate functions which is presorted in + the order they require for aggregation. The default value is + <literal>on</literal>. + </para> + </listitem> + </varlistentry> + <varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan"> <term><varname>enable_seqscan</varname> (<type>boolean</type>) <indexterm> diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index 0f0bcfb7e5..89d3c4352c 100644 --- a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsize.c @@ -151,6 +151,7 @@ bool enable_partitionwise_aggregate = false; bool enable_parallel_append = true; bool enable_parallel_hash = true; bool enable_partition_pruning = true; +bool enable_presorted_aggregate = true; bool enable_async_append = true; typedef struct diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index dfda251d95..e21e72eb87 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -3191,7 +3191,8 @@ make_pathkeys_for_groupagg(PlannerInfo *root, List *groupClause, List *tlist, * sets. All handling specific to ordered aggregates must be done by the * executor in that case. */ - if (root->numOrderedAggs == 0 || root->parse->groupingSets != NIL) + if (root->numOrderedAggs == 0 || root->parse->groupingSets != NIL || + !enable_presorted_aggregate) return grouppathkeys; /* diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c index 1bf14eec66..436afe1d21 100644 --- a/src/backend/utils/misc/guc_tables.c +++ b/src/backend/utils/misc/guc_tables.c @@ -971,6 +971,21 @@ struct config_bool ConfigureNamesBool[] = true, NULL, NULL, NULL }, + { + {"enable_presorted_aggregate", PGC_USERSET, QUERY_TUNING_METHOD, + gettext_noop("Enables the planner's ability to produce plans which " + "provide presorted input for ORDER BY / DISTINCT aggregate " + "functions."), + gettext_noop("Allows the query planner to build plans which provide " + "presorted input for aggregate functions with an ORDER BY / " + "DISTINCT clause. When disabled, implicit sorts are always " + "performed during execution."), + GUC_EXPLAIN + }, + &enable_presorted_aggregate, + true, + NULL, NULL, NULL + }, { {"enable_async_append", PGC_USERSET, QUERY_TUNING_METHOD, gettext_noop("Enables the planner's use of async append plans."), diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index 043864597f..5afdeb04de 100644 --- a/src/backend/utils/misc/postgresql.conf.sample +++ b/src/backend/utils/misc/postgresql.conf.sample @@ -384,6 +384,7 @@ #enable_partition_pruning = on #enable_partitionwise_join = off #enable_partitionwise_aggregate = off +#enable_presorted_aggregate = on #enable_seqscan = on #enable_sort = on #enable_tidscan = on diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h index 204e94b6d1..b6cc2c9cd6 100644 --- a/src/include/optimizer/cost.h +++ b/src/include/optimizer/cost.h @@ -68,6 +68,7 @@ extern PGDLLIMPORT bool enable_partitionwise_aggregate; extern PGDLLIMPORT bool enable_parallel_append; extern PGDLLIMPORT bool enable_parallel_hash; extern PGDLLIMPORT bool enable_partition_pruning; +extern PGDLLIMPORT bool enable_presorted_aggregate; extern PGDLLIMPORT bool enable_async_append; extern PGDLLIMPORT int constraint_exclusion; diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out index fc2bd40be2..309c2dc865 100644 --- a/src/test/regress/expected/aggregates.out +++ b/src/test/regress/expected/aggregates.out @@ -1471,6 +1471,17 @@ group by ten; -> Seq Scan on tenk1 (5 rows) +-- Ensure no ordering is requested when enable_presorted_aggregate is off +set enable_presorted_aggregate to off; +explain (costs off) +select sum(two order by two) from tenk1; + QUERY PLAN +------------------------- + Aggregate + -> Seq Scan on tenk1 +(2 rows) + +reset enable_presorted_aggregate; -- -- Test combinations of DISTINCT and/or ORDER BY -- diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out index 579b861d84..001c6e7eb9 100644 --- a/src/test/regress/expected/sysviews.out +++ b/src/test/regress/expected/sysviews.out @@ -128,10 +128,11 @@ select name, setting from pg_settings where name like 'enable%'; enable_partition_pruning | on enable_partitionwise_aggregate | off enable_partitionwise_join | off + enable_presorted_aggregate | on enable_seqscan | on enable_sort | on enable_tidscan | on -(20 rows) +(21 rows) -- Test that the pg_timezone_names and pg_timezone_abbrevs views are -- more-or-less working. We can't test their contents in any great detail diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql index a4c00ff7a9..15f6be8e15 100644 --- a/src/test/regress/sql/aggregates.sql +++ b/src/test/regress/sql/aggregates.sql @@ -546,6 +546,12 @@ select from tenk1 group by ten; +-- Ensure no ordering is requested when enable_presorted_aggregate is off +set enable_presorted_aggregate to off; +explain (costs off) +select sum(two order by two) from tenk1; +reset enable_presorted_aggregate; + -- -- Test combinations of DISTINCT and/or ORDER BY --