On 15/10/2024 12:15, David Rowley wrote:
On Tue, 15 Oct 2024 at 17:48, Andrei Lepikhov <lepi...@gmail.com> wrote:
I think maybe what is worth working on is seeing if you can better
estimate the number of tiebreak comparisons by checking the number of
distinct values for each sort key.  That might require what we
discussed on another thread about having estimate_num_groups() better
use the n_distinct estimate from the EquivalenceMember with the least
distinct values.  It'll be another question if all that can be done
and this all still perform well enough for cost_sort(). You may have
to write it first before we can figure that out.  It may be possible
to buy back some of the overheads with some additional caching...
Perhaps that could be done within the EquivalenceClass.
It seems that your idea with an equivalence class works.
In the attached patchset, I show all the ideas (I initially wanted to discuss them step by step). The first patch is about choosing an adequate expression from the list of EC members. The second one is the current patch, which considers the number of sort columns. The third patch is an elaboration of the second: it uses only trustworthy statistics, namely the ndistinct value of the first sorted column. The last patch is a demo of how I plan to use the previous three patches, adding one more strategy to improve the order of columns in the GROUP BY clause.

--
regards, Andrei Lepikhov
From b07075d68181df37e5d2d2d8f0435ac4c44ef80b Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepi...@gmail.com>
Date: Tue, 22 Oct 2024 09:02:02 +0400
Subject: [PATCH 1/4] Stabilise incremental sort cost calculation.

Carefully identify a column/expression that can represent the path key in the
cost calculation of a specific sort operator. Columns may have different
numbers of distinct values, so the order of columns in a sort operation may
affect the number of comparison function calls.
Sorting has only pathkeys as input for cost estimation. Instead of blindly
choosing the first equivalence class member, this patch attempts to find the
expression with the smallest ndistinct value.

TODO: Filtering out EC members external to this sort operator is not a big
deal. But in that case it would be necessary to pass the underlying relids to
the cost calculation routine, which would change the API. So, here we stay as
simple as possible.

Add the number of distinct values, em_ndistinct, to EquivalenceMember.
It may additionally be used later in group number estimations.
---
 src/backend/optimizer/path/costsize.c         | 72 +++++++++++++++++--
 src/backend/optimizer/path/equivclass.c       |  1 +
 src/include/nodes/pathnodes.h                 |  2 +
 .../regress/expected/incremental_sort.out     | 51 +++++++++++++
 src/test/regress/sql/incremental_sort.sql     | 31 ++++++++
 5 files changed, 152 insertions(+), 5 deletions(-)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 79991b1980..325acdacbf 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -192,6 +192,8 @@ static int32 get_expr_width(PlannerInfo *root, const Node *expr);
 static double relation_byte_size(double tuples, int width);
 static double page_size(double tuples, int width);
 static double get_parallel_divisor(Path *path);
+static EquivalenceMember *identify_sort_ecmember(PlannerInfo *root,
+                                                 EquivalenceClass *ec);
 
 
 /*
@@ -2016,22 +2018,21 @@ cost_incremental_sort(Path *path,
         */
        foreach(l, pathkeys)
        {
-               PathKey    *key = (PathKey *) lfirst(l);
-               EquivalenceMember *member = (EquivalenceMember *)
-                       linitial(key->pk_eclass->ec_members);
+               PathKey                    *key = (PathKey *) lfirst(l);
+               EquivalenceMember  *em = identify_sort_ecmember(root, key->pk_eclass);
 
                /*
                 * Check if the expression contains Var with "varno 0" so that we
                 * don't call estimate_num_groups in that case.
                 */
-               if (bms_is_member(0, pull_varnos(root, (Node *) member->em_expr)))
+               if (bms_is_member(0, pull_varnos(root, (Node *) em->em_expr)))
                {
                        unknown_varno = true;
                        break;
                }
 
                /* expression not containing any Vars with "varno 0" */
-               presortedExprs = lappend(presortedExprs, member->em_expr);
+               presortedExprs = lappend(presortedExprs, em->em_expr);
 
                if (foreach_current_index(l) + 1 >= presorted_keys)
                        break;
@@ -6491,3 +6492,64 @@ compute_gather_rows(Path *path)
 
        return clamp_row_est(path->rows * get_parallel_divisor(path));
 }
+
+/*
+ * Find a suitable member of the equivalence class.
+ * Walk the list of EC members and pick the one with the minimum number of
+ * distinct values. Cache the estimated ndistinct in the em_ndistinct field
+ * of each member.
+ */
+static EquivalenceMember *
+identify_sort_ecmember(PlannerInfo *root, EquivalenceClass *ec)
+{
+       EquivalenceMember  *candidate = linitial(ec->ec_members);
+
+       if (root == NULL)
+               /* Fast path */
+               return candidate;
+
+       foreach_node(EquivalenceMember, em, ec->ec_members)
+       {
+               VariableStatData        vardata;
+
+               if (em->em_ndistinct == 0.)
+                       /* Nothing helpful */
+                       continue;
+
+               if (em->em_is_child || em->em_is_const || bms_is_empty(em->em_relids) ||
+                       bms_is_member(0, em->em_relids))
+               {
+                       em->em_ndistinct = 0.;
+                       continue;
+               }
+
+               if (em->em_ndistinct < 0.0)
+               {
+                       bool    isdefault = true;
+                       double  ndist;
+
+                       /* Let's check candidate's ndistinct value */
+                       examine_variable(root, (Node *) em->em_expr, 0, &vardata);
+                       if (HeapTupleIsValid(vardata.statsTuple))
+                               ndist = get_variable_numdistinct(&vardata, &isdefault);
+                       ReleaseVariableStats(vardata);
+
+                       if (isdefault)
+                       {
+                               em->em_ndistinct = 0.;
+                               continue;
+                       }
+
+                       em->em_ndistinct = ndist;
+               }
+
+               Assert(em->em_ndistinct > 0.);
+
+               if (candidate->em_ndistinct == 0. ||
+                       em->em_ndistinct < candidate->em_ndistinct)
+                       candidate = em;
+       }
+
+       Assert(candidate != NULL);
+       return candidate;
+}
\ No newline at end of file
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 51d806326e..2894919b24 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -526,6 +526,7 @@ add_eq_member(EquivalenceClass *ec, Expr *expr, Relids relids,
        em->em_datatype = datatype;
        em->em_jdomain = jdomain;
        em->em_parent = parent;
+       em->em_ndistinct = -1.0;
 
        if (bms_is_empty(relids))
        {
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 14ccfc1ac1..3cb6a8a44f 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1439,6 +1439,8 @@ typedef struct EquivalenceMember
        JoinDomain *em_jdomain;         /* join domain containing the source clause */
        /* if em_is_child is true, this links to corresponding EM for top parent */
        struct EquivalenceMember *em_parent pg_node_attr(read_write_ignore);
+
+       double          em_ndistinct;   /* cached ndistinct estimate: 0 means
+                                        * default or unknown; -1 means not
+                                        * computed yet */
 } EquivalenceMember;
 
 /*
diff --git a/src/test/regress/expected/incremental_sort.out b/src/test/regress/expected/incremental_sort.out
index 5fd54a10b1..77ac213910 100644
--- a/src/test/regress/expected/incremental_sort.out
+++ b/src/test/regress/expected/incremental_sort.out
@@ -1698,3 +1698,54 @@ explain (costs off) select a, b, a <-> point(5, 5) dist from point_table order b
                Order By: (a <-> '(5,5)'::point)
 (6 rows)
 
+-- Check that commuting the sides of an expression doesn't influence
+-- cost estimation
+CREATE TABLE sort_ndist_t1 (x numeric, y numeric);
+CREATE TABLE sort_ndist_t2 (x numeric, y numeric);
+INSERT INTO sort_ndist_t1 (x,y)
+  SELECT gs%10, gs%10 FROM generate_series(1,1E4) AS gs;
+INSERT INTO sort_ndist_t2 (x,y)
+  SELECT gs, gs FROM generate_series(1,1E4) AS gs;
+CREATE INDEX t1_idx ON sort_ndist_t1 (x);
+CREATE INDEX t2_idx ON sort_ndist_t2 (x);
+VACUUM ANALYZE sort_ndist_t1, sort_ndist_t2;
+SET enable_hashjoin = 'off';
+-- With lots of duplicates after the join it is more effective to use a plain
+-- Sort instead of an Incremental Sort: with a small number of groups we do
+-- the same work as Sort but pay an extra penalty.
+EXPLAIN (COSTS OFF)
+SELECT t1.x, t1.y FROM sort_ndist_t1 t1, sort_ndist_t2 t2
+WHERE t1.x=t2.x
+ORDER BY t1.x,t1.y;
+                             QUERY PLAN                             
+--------------------------------------------------------------------
+ Sort
+   Sort Key: t1.x, t1.y
+   ->  Nested Loop
+         ->  Seq Scan on sort_ndist_t1 t1
+         ->  Memoize
+               Cache Key: t1.x
+               Cache Mode: logical
+               ->  Index Only Scan using t2_idx on sort_ndist_t2 t2
+                     Index Cond: (x = t1.x)
+(9 rows)
+
+EXPLAIN (COSTS OFF) -- the plan must be the same as above
+SELECT t1.x, t1.y FROM sort_ndist_t1 t1, sort_ndist_t2 t2
+WHERE t2.x=t1.x
+ORDER BY t1.x,t1.y;
+                             QUERY PLAN                             
+--------------------------------------------------------------------
+ Sort
+   Sort Key: t1.x, t1.y
+   ->  Nested Loop
+         ->  Seq Scan on sort_ndist_t1 t1
+         ->  Memoize
+               Cache Key: t1.x
+               Cache Mode: logical
+               ->  Index Only Scan using t2_idx on sort_ndist_t2 t2
+                     Index Cond: (x = t1.x)
+(9 rows)
+
+RESET enable_hashjoin;
+DROP TABLE sort_ndist_t1, sort_ndist_t2;
diff --git a/src/test/regress/sql/incremental_sort.sql b/src/test/regress/sql/incremental_sort.sql
index ab471bdfff..1334371e88 100644
--- a/src/test/regress/sql/incremental_sort.sql
+++ b/src/test/regress/sql/incremental_sort.sql
@@ -292,3 +292,34 @@ create index point_table_a_idx on point_table using gist(a);
 -- Ensure we get an incremental sort plan for both of the following queries
 explain (costs off) select a, b, a <-> point(5, 5) dist from point_table order by dist, b limit 1;
 explain (costs off) select a, b, a <-> point(5, 5) dist from point_table order by dist, b desc limit 1;
+
+-- Check that commuting the sides of an expression doesn't influence
+-- cost estimation
+CREATE TABLE sort_ndist_t1 (x numeric, y numeric);
+CREATE TABLE sort_ndist_t2 (x numeric, y numeric);
+
+INSERT INTO sort_ndist_t1 (x,y)
+  SELECT gs%10, gs%10 FROM generate_series(1,1E4) AS gs;
+INSERT INTO sort_ndist_t2 (x,y)
+  SELECT gs, gs FROM generate_series(1,1E4) AS gs;
+CREATE INDEX t1_idx ON sort_ndist_t1 (x);
+CREATE INDEX t2_idx ON sort_ndist_t2 (x);
+VACUUM ANALYZE sort_ndist_t1, sort_ndist_t2;
+
+SET enable_hashjoin = 'off';
+
+-- With lots of duplicates after the join it is more effective to use a plain
+-- Sort instead of an Incremental Sort: with a small number of groups we do
+-- the same work as Sort but pay an extra penalty.
+EXPLAIN (COSTS OFF)
+SELECT t1.x, t1.y FROM sort_ndist_t1 t1, sort_ndist_t2 t2
+WHERE t1.x=t2.x
+ORDER BY t1.x,t1.y;
+
+EXPLAIN (COSTS OFF) -- the plan must be the same as above
+SELECT t1.x, t1.y FROM sort_ndist_t1 t1, sort_ndist_t2 t2
+WHERE t2.x=t1.x
+ORDER BY t1.x,t1.y;
+
+RESET enable_hashjoin;
+DROP TABLE sort_ndist_t1, sort_ndist_t2;
-- 
2.47.0
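As an editorial aside, the member-selection logic added by patch 0001 can be sketched outside the planner: walk the candidates, mark those without usable statistics (the 0 sentinel), cache computed estimates (replacing the -1 "not computed yet" sentinel), and keep the minimum. A minimal standalone sketch, assuming a toy `ECMember` struct and a `stat_value` field standing in for the `examine_variable()`/`get_variable_numdistinct()` lookup, neither of which is part of the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for EquivalenceMember; ndistinct uses the same sentinels
 * as the patch: -1 = not computed yet, 0 = default/unknown. */
typedef struct ECMember
{
    double ndistinct;   /* cached estimate */
    double stat_value;  /* pretend statistics lookup result (<= 0: no stats) */
} ECMember;

/* Analogue of identify_sort_ecmember(): pick the member with the smallest
 * known ndistinct, caching estimates along the way. */
static ECMember *
pick_min_ndistinct(ECMember *members, int n)
{
    ECMember *candidate = &members[0];

    for (int i = 0; i < n; i++)
    {
        ECMember *em = &members[i];

        if (em->ndistinct == 0.0)
            continue;               /* already known to be unhelpful */

        if (em->ndistinct < 0.0)
        {
            /* stand-in for the statistics lookup; cache the result */
            if (em->stat_value <= 0.0)
            {
                em->ndistinct = 0.0;    /* no stats: mark and skip */
                continue;
            }
            em->ndistinct = em->stat_value;
        }

        if (candidate->ndistinct == 0.0 || em->ndistinct < candidate->ndistinct)
            candidate = em;
    }
    return candidate;
}
```

The caching mirrors why the patch stores em_ndistinct in the EquivalenceMember: repeated cost_sort() calls over the same EC pay for the statistics lookup only once.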

From eaae44d6e9dda4a295433af4ec6ae18e181046f7 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepi...@gmail.com>
Date: Mon, 21 Oct 2024 15:21:34 +0400
Subject: [PATCH 2/4] Consider the number of columns in the cost-sort model.

The sort node calls the comparison operator for each pair of attributes for
each pair of tuples. However, the current cost model uses a
'2.0*cpu_operator_cost' multiplier, which performs a kind of averaging.
This technique can lead to incorrect estimations when sorting on three, four,
or more columns, which is quite common in practice.
Moreover, further elaboration of the optimiser imposes stricter requirements
on the balance of sortings, as seen with IncrementalSort, MergeAppend, and
MergeJoin.
In this patch, the multiplier is a linear function of the number of columns.
The constant term 1.0 smooths the fact that the dependence on the number of
columns is weaker than linear.
This is an extreme formula: the number of comparisons really depends on the
number of distinct values in each column. As a TODO, we can naturally
elaborate this model in a next step, involving distinct statistics to make
estimations more precise.
---
 .../postgres_fdw/expected/postgres_fdw.out    | 15 +++--
 src/backend/optimizer/path/costsize.c         | 22 +++++--
 src/test/regress/expected/aggregates.out      | 13 ++--
 src/test/regress/expected/join.out            | 20 +++---
 src/test/regress/expected/partition_join.out  |  8 ++-
 src/test/regress/expected/union.out           | 66 +++++++++----------
 6 files changed, 78 insertions(+), 66 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 39b2b317e8..89ff865992 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9984,13 +9984,16 @@ SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER J
 -- left outer join + nullable clause
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
-                                                                                     QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- Foreign Scan
+                                                                                    QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
    Output: t1.a, fprt2.b, fprt2.c
-   Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2)
-   Remote SQL: SELECT r5.a, r6.b, r6.c FROM (public.fprt1_p1 r5 LEFT JOIN public.fprt2_p1 r6 ON (((r5.a = r6.b)) AND ((r5.b = r6.a)) AND ((r6.a < 10)))) WHERE ((r5.a < 10)) ORDER BY r5.a ASC NULLS LAST, r6.b ASC NULLS LAST, r6.c ASC NULLS LAST
-(4 rows)
+   Sort Key: t1.a, fprt2.b, fprt2.c
+   ->  Foreign Scan
+         Output: t1.a, fprt2.b, fprt2.c
+         Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2)
+         Remote SQL: SELECT r5.a, r6.b, r6.c FROM (public.fprt1_p1 r5 LEFT JOIN public.fprt2_p1 r6 ON (((r5.a = r6.b)) AND ((r5.b = r6.a)) AND ((r6.a < 10)))) WHERE ((r5.a < 10))
+(7 rows)
 
 SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
  a | b |  c   
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 325acdacbf..8d50905ba8 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -483,6 +483,8 @@ cost_gather_merge(GatherMergePath *path, PlannerInfo *root,
        Cost            comparison_cost;
        double          N;
        double          logN;
+       int                     npathkeys = list_length(((Path *) path)->pathkeys);
+       double          cmpfrac = (npathkeys == 0) ? 2.0 : npathkeys + 1.0;
 
        /* Mark the path with the correct row estimate */
        if (rows)
@@ -505,7 +507,7 @@ cost_gather_merge(GatherMergePath *path, PlannerInfo *root,
        logN = LOG2(N);
 
        /* Assumed cost per tuple comparison */
-       comparison_cost = 2.0 * cpu_operator_cost;
+       comparison_cost = cmpfrac * cpu_operator_cost;
 
        /* Heap creation cost */
        startup_cost += comparison_cost * N * logN;
@@ -1863,7 +1865,7 @@ cost_recursive_union(Path *runion, Path *nrterm, Path *rterm)
  */
 static void
 cost_tuplesort(Cost *startup_cost, Cost *run_cost,
-                          double tuples, int width,
+                          double tuples, int width, double cmpfrac,
                           Cost comparison_cost, int sort_mem,
                           double limit_tuples)
 {
@@ -1880,7 +1882,7 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost,
                tuples = 2.0;
 
        /* Include the default cost-per-comparison */
-       comparison_cost += 2.0 * cpu_operator_cost;
+       comparison_cost += cmpfrac * cpu_operator_cost;
 
        /* Do we have a useful LIMIT? */
        if (limit_tuples > 0 && limit_tuples < tuples)
@@ -2051,7 +2053,9 @@ cost_incremental_sort(Path *path,
         * are equal.
         */
        cost_tuplesort(&group_startup_cost, &group_run_cost,
-                                  group_tuples, width, comparison_cost, sort_mem,
+                                  group_tuples, width,
+                                  list_length(pathkeys) + 1.0,
+                                  comparison_cost, sort_mem,
                                   limit_tuples);
 
        /*
@@ -2075,7 +2079,7 @@ cost_incremental_sort(Path *path,
         * detect the sort groups. This is roughly equal to one extra copy and
         * comparison per tuple.
         */
-       run_cost += (cpu_tuple_cost + comparison_cost) * input_tuples;
+       run_cost += (cpu_tuple_cost + (presorted_keys + 1) * comparison_cost) * input_tuples;
 
        /*
         * Additionally, we charge double cpu_tuple_cost for each input group to
@@ -2109,9 +2113,11 @@ cost_sort(Path *path, PlannerInfo *root,
 {
        Cost            startup_cost;
        Cost            run_cost;
+       double          cmpfrac =
+                                               (pathkeys == NIL) ? 2.0 : list_length(pathkeys) + 1.0;
 
        cost_tuplesort(&startup_cost, &run_cost,
-                                  tuples, width,
+                                  tuples, width, cmpfrac,
                                   comparison_cost, sort_mem,
                                   limit_tuples);
 
@@ -2391,6 +2397,8 @@ cost_merge_append(Path *path, PlannerInfo *root,
        Cost            comparison_cost;
        double          N;
        double          logN;
+       double          cmpfrac =
+                                               (pathkeys == NIL) ? 2.0 : list_length(pathkeys) + 1.0;
 
        /*
         * Avoid log(0)...
@@ -2399,7 +2407,7 @@ cost_merge_append(Path *path, PlannerInfo *root,
        logN = LOG2(N);
 
        /* Assumed cost per tuple comparison */
-       comparison_cost = 2.0 * cpu_operator_cost;
+       comparison_cost = cmpfrac * cpu_operator_cost;
 
        /* Heap creation cost */
        startup_cost += comparison_cost * N * logN;
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index a5596ab210..45e08457df 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -3001,17 +3001,18 @@ ANALYZE agg_sort_order;
 EXPLAIN (COSTS OFF)
 SELECT array_agg(c1 ORDER BY c2),c2
 FROM agg_sort_order WHERE c2 < 100 GROUP BY c1 ORDER BY 2;
-                                 QUERY PLAN                                 
-----------------------------------------------------------------------------
+                                QUERY PLAN                                
+--------------------------------------------------------------------------
  Sort
    Sort Key: c2
    ->  GroupAggregate
          Group Key: c1
-         ->  Sort
+         ->  Incremental Sort
                Sort Key: c1, c2
-               ->  Index Scan using agg_sort_order_c2_idx on agg_sort_order
-                     Index Cond: (c2 < 100)
-(8 rows)
+               Presorted Key: c1
+               ->  Index Scan using agg_sort_order_pkey on agg_sort_order
+                     Filter: (c2 < 100)
+(9 rows)
 
 DROP TABLE agg_sort_order CASCADE;
 DROP TABLE btg;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 53f70d72ed..c729b68ad3 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5726,18 +5726,20 @@ select d.* from d left join (select * from b group by b.id, b.c_id) s
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
   on d.a = s.id;
-              QUERY PLAN              
---------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Unique
-         ->  Sort
-               Sort Key: b.id, b.c_id
-               ->  Seq Scan on b
+                 QUERY PLAN                  
+---------------------------------------------
+ Merge Left Join
+   Merge Cond: (d.a = s.id)
    ->  Sort
          Sort Key: d.a
          ->  Seq Scan on d
-(9 rows)
+   ->  Sort
+         Sort Key: s.id
+         ->  Subquery Scan on s
+               ->  HashAggregate
+                     Group Key: b.id, b.c_id
+                     ->  Seq Scan on b
+(11 rows)
 
 -- join removal is not possible here
 explain (costs off)
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 53591a4f2d..6b0303ce52 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -1235,9 +1235,11 @@ EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
                                  QUERY PLAN                                 
 ----------------------------------------------------------------------------
- Sort
+ Incremental Sort
    Sort Key: t1.a, t2.b, ((t3.a + t3.b))
-   ->  Append
+   Presorted Key: t1.a
+   ->  Merge Append
+         Sort Key: t1.a
          ->  Merge Left Join
                Merge Cond: (t1_1.a = t2_1.b)
                ->  Sort
@@ -1286,7 +1288,7 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
                ->  Sort
                      Sort Key: t2_3.b
                      ->  Seq Scan on prt2_p3 t2_3
-(51 rows)
+(53 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
   a  |  c   |  b  |  c   | ?column? | c 
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 0fd0e1c38b..b08cb7f0a8 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -1224,18 +1224,17 @@ SELECT * FROM
    SELECT 2 AS t, 4 AS x) ss
 WHERE x < 4
 ORDER BY x;
-                    QUERY PLAN                    
---------------------------------------------------
+                 QUERY PLAN                 
+--------------------------------------------
  Sort
    Sort Key: (2)
-   ->  Unique
-         ->  Sort
-               Sort Key: (1), (2)
-               ->  Append
-                     ->  Result
-                     ->  Result
-                           One-Time Filter: false
-(9 rows)
+   ->  HashAggregate
+         Group Key: (1), (2)
+         ->  Append
+               ->  Result
+               ->  Result
+                     One-Time Filter: false
+(8 rows)
 
 SELECT * FROM
   (SELECT 1 AS t, 2 AS x
@@ -1289,19 +1288,18 @@ SELECT * FROM
    SELECT 2 AS t, 4 AS x) ss
 WHERE x > 3
 ORDER BY x;
-                                     QUERY PLAN
-------------------------------------------------------------------------------------
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
  Sort
    Sort Key: ss.x
    ->  Subquery Scan on ss
          Filter: (ss.x > 3)
-         ->  Unique
-               ->  Sort
-                     Sort Key: (1), (((random() * '3'::double precision))::integer)
-                     ->  Append
-                           ->  Result
-                           ->  Result
-(10 rows)
+         ->  HashAggregate
+               Group Key: (1), (((random() * '3'::double precision))::integer)
+               ->  Append
+                     ->  Result
+                     ->  Result
+(9 rows)
 
 SELECT * FROM
   (SELECT 1 AS t, (random()*3)::int AS x
@@ -1322,24 +1320,22 @@ select distinct q1 from
    union all
    select distinct * from int8_tbl i82) ss
 where q2 = q2;
-                        QUERY PLAN                        
-----------------------------------------------------------
- Unique
-   ->  Merge Append
-         Sort Key: "*SELECT* 1".q1
+                     QUERY PLAN                     
+----------------------------------------------------
+ HashAggregate
+   Group Key: "*SELECT* 1".q1
+   ->  Append
          ->  Subquery Scan on "*SELECT* 1"
-               ->  Unique
-                     ->  Sort
-                           Sort Key: i81.q1, i81.q2
-                           ->  Seq Scan on int8_tbl i81
-                                 Filter: (q2 IS NOT NULL)
+               ->  HashAggregate
+                     Group Key: i81.q1, i81.q2
+                     ->  Seq Scan on int8_tbl i81
+                           Filter: (q2 IS NOT NULL)
          ->  Subquery Scan on "*SELECT* 2"
-               ->  Unique
-                     ->  Sort
-                           Sort Key: i82.q1, i82.q2
-                           ->  Seq Scan on int8_tbl i82
-                                 Filter: (q2 IS NOT NULL)
-(15 rows)
+               ->  HashAggregate
+                     Group Key: i82.q1, i82.q2
+                     ->  Seq Scan on int8_tbl i82
+                           Filter: (q2 IS NOT NULL)
+(13 rows)
 
 select distinct q1 from
   (select distinct * from int8_tbl i81
-- 
2.47.0
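The cost-model change in patch 0002 boils down to replacing the fixed per-comparison multiplier 2.0 with a linear function of the number of sort columns. A small standalone sketch of just that arithmetic (the function names `cmpfrac_for` and `comparison_cost` are illustrative, not the patch's API; 0.0025 is PostgreSQL's default cpu_operator_cost):

```c
#include <assert.h>

#define CPU_OPERATOR_COST 0.0025    /* PostgreSQL's default cpu_operator_cost */

/* Per-tuple-pair comparison multiplier from the patch: 2.0 when the pathkey
 * list is empty/unknown, otherwise npathkeys + 1.0 (the constant 1.0 smooths
 * the sub-linear dependence on the number of columns). */
static double
cmpfrac_for(int npathkeys)
{
    return (npathkeys == 0) ? 2.0 : npathkeys + 1.0;
}

/* Assumed cost per tuple comparison, as charged in cost_tuplesort(),
 * cost_gather_merge(), and cost_merge_append() after the patch. */
static double
comparison_cost(int npathkeys)
{
    return cmpfrac_for(npathkeys) * CPU_OPERATOR_COST;
}
```

Note that for a single sort column the multiplier stays at 2.0, so single-key sorts are costed exactly as before the patch; only multi-column sorts get more expensive.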

From cfb5973ac86aa32d12bedf472256ad357eaf5fec Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepi...@gmail.com>
Date: Tue, 22 Oct 2024 16:54:49 +0400
Subject: [PATCH 3/4] Consider ndistinct on the first column in cost_sort().

---
 src/backend/optimizer/path/costsize.c        | 46 ++++++++++++++++++--
 src/test/regress/expected/aggregates.out     | 24 +++++-----
 src/test/regress/expected/partition_join.out | 18 ++++----
 3 files changed, 63 insertions(+), 25 deletions(-)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 8d50905ba8..8935185a5e 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -194,6 +194,8 @@ static double page_size(double tuples, int width);
 static double get_parallel_divisor(Path *path);
 static EquivalenceMember *identify_sort_ecmember(PlannerInfo *root,
                                                  EquivalenceClass *ec);
+static double sort_comparisons_factor(PlannerInfo *root, List *pathkeys,
+                                      double ntuples);
 
 
 /*
@@ -2113,8 +2115,7 @@ cost_sort(Path *path, PlannerInfo *root,
 {
        Cost            startup_cost;
        Cost            run_cost;
-       double          cmpfrac =
-                                               (pathkeys == NIL) ? 2.0 : list_length(pathkeys) + 1.0;
+       double          cmpfrac = sort_comparisons_factor(root, pathkeys, tuples);
 
        cost_tuplesort(&startup_cost, &run_cost,
                                   tuples, width, cmpfrac,
@@ -6560,4 +6561,43 @@ identify_sort_ecmember(PlannerInfo *root, EquivalenceClass *ec)
 
        Assert(candidate != NULL);
        return candidate;
-}
\ No newline at end of file
+}
+
+/*
+ * Calculate a multiplier reflecting the number of comparisons the executor
+ * has to perform during a sort with this specific order of columns.
+ *
+ * The comparison factor f = 1.0 + F(pathkeys). Here, 1.0 encapsulates
+ * second-order effects which the cost function doesn't consider.
+ * F(pathkeys) is the estimated fraction of comparisons in the range [1..N].
+ * F = 1 corresponds to the 'all-unique' first column case: the sort will
+ * call the comparison function only once for each pair of tuples.
+ * F = N represents the case when the values in all columns are constant.
+ */
+static double
+sort_comparisons_factor(PlannerInfo *root, List *pathkeys, double ntuples)
+{
+       int             n = list_length(pathkeys);
+       double  cmpfrac = (n == 0) ? 2.0 : n + 1;
+
+       if (root != NULL && ntuples > 1 && n > 1)
+       {
+               PathKey                    *key = linitial_node(PathKey, pathkeys);
+               EquivalenceMember  *em = identify_sort_ecmember(root, key->pk_eclass);
+
+               Assert(em->em_ndistinct >= 0);
+
+               if (em->em_ndistinct == 0.)
+                       /*
+                        * The optimiser has no info on the ndistinct value;
+                        * return the extreme case.
+                        */
+                       return cmpfrac;
+
+               if (ntuples >= em->em_ndistinct)
+                       cmpfrac =
+                               2.0 + ((ntuples - em->em_ndistinct) / (ntuples - 1)) * (n - 1);
+       }
+
+       return cmpfrac;
+}
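[Editorial note: the interpolation above can be checked numerically. A standalone sketch of the arithmetic only; `sort_cmpfrac` is an illustrative stand-in, not the patch's static function, and it folds in the n+1 fallback for unusable inputs:]

```c
#include <assert.h>

/*
 * Comparison-count multiplier from patch 0003: for n sort columns, N tuples,
 * and d distinct values in the first column,
 *     cmpfrac = 2 + ((N - d) / (N - 1)) * (n - 1)
 * d == N (first column all-unique) -> 2.0: one comparison per tuple pair.
 * d == 1 (first column constant)   -> n + 1: every column gets compared.
 */
static double
sort_cmpfrac(double ntuples, double ndistinct, int ncols)
{
    /* fall back to the column-count-only estimate when stats can't help */
    if (ncols <= 1 || ntuples <= 1.0 || ndistinct > ntuples)
        return (ncols == 0) ? 2.0 : ncols + 1.0;
    return 2.0 + ((ntuples - ndistinct) / (ntuples - 1.0)) * (ncols - 1.0);
}
```

Between the two extremes the factor degrades smoothly, so adding trailing sort columns is penalised in proportion to how many duplicates the first column produces.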
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 45e08457df..8c05adff86 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2842,19 +2842,18 @@ SELECT count(*)
                                   QUERY PLAN                                   
 -------------------------------------------------------------------------------
  GroupAggregate
-   Group Key: t1.z, t1.w, t1.x, t1.y
-   ->  Incremental Sort
-         Sort Key: t1.z, t1.w, t1.x, t1.y
-         Presorted Key: t1.z, t1.w, t1.x
+   Group Key: t1.x, t1.y, t1.z, t1.w
+   ->  Sort
+         Sort Key: t1.x, t1.y, t1.z, t1.w
          ->  Merge Join
-               Merge Cond: ((t1.z = t2.z) AND (t1.w = t2.w) AND (t1.x = t2.x))
+               Merge Cond: ((t1.w = t2.w) AND (t1.z = t2.z) AND (t1.x = t2.x))
                ->  Sort
-                     Sort Key: t1.z, t1.w, t1.x
+                     Sort Key: t1.w, t1.z, t1.x
                      ->  Index Scan using btg_x_y_idx on btg t1
                ->  Sort
-                     Sort Key: t2.z, t2.w, t2.x
+                     Sort Key: t2.w, t2.z, t2.x
                      ->  Index Scan using btg_x_y_idx on btg t2
-(13 rows)
+(12 rows)
 
 RESET enable_nestloop;
 RESET enable_hashjoin;
@@ -2878,12 +2877,11 @@ SELECT count(*) FROM btg GROUP BY w, x, y, z ORDER BY x*x, z;
  Sort
    Sort Key: ((x * x)), z
    ->  GroupAggregate
-         Group Key: x, y, w, z
-         ->  Incremental Sort
-               Sort Key: x, y, w, z
-               Presorted Key: x, y
+         Group Key: w, x, y, z
+         ->  Sort
+               Sort Key: w, x, y, z
                ->  Index Scan using btg_x_y_idx on btg
-(8 rows)
+(7 rows)
 
 -- Test the case where the number of incoming subtree path keys is more than
 -- the number of grouping keys.
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 6b0303ce52..1e6b278d21 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -615,39 +615,39 @@ SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b)
                ->  Sort
                      Sort Key: (COALESCE(prt1.a, p2.a)), (COALESCE(prt1.b, p2.b))
                      ->  Merge Full Join
-                           Merge Cond: ((prt1.a = p2.a) AND (prt1.b = p2.b))
+                           Merge Cond: ((prt1.b = p2.b) AND (prt1.a = p2.a))
                            Filter: ((COALESCE(prt1.a, p2.a) >= 490) AND (COALESCE(prt1.a, p2.a) <= 510))
                            ->  Sort
-                                 Sort Key: prt1.a, prt1.b
+                                 Sort Key: prt1.b, prt1.a
                                  ->  Seq Scan on prt1_p1 prt1
                            ->  Sort
-                                 Sort Key: p2.a, p2.b
+                                 Sort Key: p2.b, p2.a
                                  ->  Seq Scan on prt2_p1 p2
          ->  Group
                Group Key: (COALESCE(prt1_1.a, p2_1.a)), (COALESCE(prt1_1.b, p2_1.b))
                ->  Sort
                      Sort Key: (COALESCE(prt1_1.a, p2_1.a)), (COALESCE(prt1_1.b, p2_1.b))
                      ->  Merge Full Join
-                           Merge Cond: ((prt1_1.a = p2_1.a) AND (prt1_1.b = p2_1.b))
+                           Merge Cond: ((prt1_1.b = p2_1.b) AND (prt1_1.a = p2_1.a))
                            Filter: ((COALESCE(prt1_1.a, p2_1.a) >= 490) AND (COALESCE(prt1_1.a, p2_1.a) <= 510))
                            ->  Sort
-                                 Sort Key: prt1_1.a, prt1_1.b
+                                 Sort Key: prt1_1.b, prt1_1.a
                                  ->  Seq Scan on prt1_p2 prt1_1
                            ->  Sort
-                                 Sort Key: p2_1.a, p2_1.b
+                                 Sort Key: p2_1.b, p2_1.a
                                  ->  Seq Scan on prt2_p2 p2_1
          ->  Group
                Group Key: (COALESCE(prt1_2.a, p2_2.a)), (COALESCE(prt1_2.b, p2_2.b))
                ->  Sort
                      Sort Key: (COALESCE(prt1_2.a, p2_2.a)), (COALESCE(prt1_2.b, p2_2.b))
                      ->  Merge Full Join
-                           Merge Cond: ((prt1_2.a = p2_2.a) AND (prt1_2.b = p2_2.b))
+                           Merge Cond: ((prt1_2.b = p2_2.b) AND (prt1_2.a = p2_2.a))
                            Filter: ((COALESCE(prt1_2.a, p2_2.a) >= 490) AND (COALESCE(prt1_2.a, p2_2.a) <= 510))
                            ->  Sort
-                                 Sort Key: prt1_2.a, prt1_2.b
+                                 Sort Key: prt1_2.b, prt1_2.a
                                  ->  Seq Scan on prt1_p3 prt1_2
                            ->  Sort
-                                 Sort Key: p2_2.a, p2_2.b
+                                 Sort Key: p2_2.b, p2_2.a
                                  ->  Seq Scan on prt2_p3 p2_2
 (43 rows)
 
-- 
2.47.0

From caf8eb1706994029308bfd29c8bfce5de17dbff1 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepi...@gmail.com>
Date: Mon, 21 Oct 2024 16:17:01 +0400
Subject: [PATCH 4/4] Add GROUP-BY strategy: put the most distinct column at
 the head.

Let's allow GROUP-BY to utilize cost_sort feature which can differentiate
orders of pathkeys lists according to the ndistinct of the first column.
---
 src/backend/optimizer/path/pathkeys.c    | 73 +++++++++++++++++++++++-
 src/test/regress/expected/aggregates.out | 43 +++++++-------
 src/test/regress/sql/aggregates.sql      | 10 ++--
 3 files changed, 96 insertions(+), 30 deletions(-)

diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index e25798972f..7db855fc39 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -26,6 +26,7 @@
 #include "optimizer/paths.h"
 #include "partitioning/partbounds.h"
 #include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
 
 /* Consider reordering of GROUP BY keys? */
 bool           enable_group_by_reordering = true;
@@ -471,6 +472,10 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
        List       *pathkeys = root->group_pathkeys;
        List       *clauses = root->processed_groupClause;
 
+       double          nd_max = 0;
+       PathKey    *pk_opt = NULL;
+       ListCell   *lc1, *lc2;
+
        /* always return at least the original pathkeys/clauses */
        info = makeNode(GroupByOrdering);
        info->pathkeys = pathkeys;
@@ -524,9 +529,6 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
                /* Test consistency of info structures */
                for_each_from(lc, infos, 1)
                {
-                       ListCell   *lc1,
-                                          *lc2;
-
                        info = lfirst_node(GroupByOrdering, lc);
 
                        Assert(list_length(info->clauses) == list_length(pinfo->clauses));
@@ -544,6 +546,71 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
                }
        }
 #endif
+
+       /*
+        * Additionally, try the ordering that places the column with the
+        * maximum ndistinct value first.
+        */
+
+       forboth(lc1, root->group_pathkeys, lc2, root->processed_groupClause)
+       {
+               PathKey                    *pkey = lfirst_node(PathKey, lc1);
+               SortGroupClause    *gc = (SortGroupClause *) lfirst(lc2);
+               Node                       *node;
+               Bitmapset                  *relids;
+               VariableStatData        vardata;
+               double                          nd = -1;
+               bool                            isdefault;
+
+               if (foreach_current_index(lc1) >= root->num_groupby_pathkeys)
+                       break;
+
+               node = get_sortgroupclause_expr(gc, root->parse->targetList);
+               relids = pull_varnos(root, node);
+
+               if (bms_num_members(relids) != 1 && bms_is_member(0, relids))
+                       /*
+                        * Although a functional index could provide a distinct
+                        * estimate here, the chances are too low to be worth
+                        * checking.
+                        */
+                       continue;
+
+               examine_variable(root, node, 0, &vardata);
+               if (!HeapTupleIsValid(vardata.statsTuple))
+                       continue;
+               nd = get_variable_numdistinct(&vardata, &isdefault);
+               ReleaseVariableStats(vardata);
+               if (isdefault)
+                       continue;
+
+               Assert(nd >= 0);
+               if (nd_max == 0 || nd > nd_max)
+               {
+                       nd_max = nd;
+                       pk_opt = pkey;
+               }
+       }
+
+       if (pk_opt != NULL)
+       {
+               List   *new_pathkeys = list_make1(pk_opt);
+               int             n;
+
+               new_pathkeys = list_concat_unique_ptr(new_pathkeys, root->group_pathkeys);
+               n = group_keys_reorder_by_pathkeys(new_pathkeys, &pathkeys, &clauses,
+                                                                                  root->num_groupby_pathkeys);
+
+               if (n > 0 &&
+                       (enable_incremental_sort || n == root->num_groupby_pathkeys) &&
+                       compare_pathkeys(pathkeys, root->group_pathkeys) != PATHKEYS_EQUAL)
+               {
+                       info = makeNode(GroupByOrdering);
+                       info->pathkeys = pathkeys;
+                       info->clauses = clauses;
+
+                       infos = lappend(infos, info);
+               }
+       }
+
        return infos;
 }
 
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 8c05adff86..4af82f9600 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2786,13 +2786,13 @@ SELECT balk(hundred) FROM tenk1;
 ROLLBACK;
 -- GROUP BY optimization by reordering GROUP BY clauses
 CREATE TABLE btg AS SELECT
-  i % 10 AS x,
-  i % 10 AS y,
-  'abc' || i % 10 AS z,
+  i % 231 AS x,
+  i % 49 AS y,
+  'abc' || i % 2 AS z,
   i AS w
-FROM generate_series(1, 100) AS i;
+FROM generate_series(1, 1000) AS i;
 CREATE INDEX btg_x_y_idx ON btg(x, y);
-ANALYZE btg;
+VACUUM ANALYZE btg;
 SET enable_hashagg = off;
 SET enable_seqscan = off;
 -- Utilize the ordering of index scan to avoid a Sort operation
@@ -2839,21 +2839,19 @@ EXPLAIN (COSTS OFF)
 SELECT count(*)
   FROM btg t1 JOIN btg t2 ON t1.z = t2.z AND t1.w = t2.w AND t1.x = t2.x
   GROUP BY t1.x, t1.y, t1.z, t1.w;
-                                  QUERY PLAN                                   
--------------------------------------------------------------------------------
+                           QUERY PLAN                           
+----------------------------------------------------------------
  GroupAggregate
-   Group Key: t1.x, t1.y, t1.z, t1.w
+   Group Key: t1.w, t1.x, t1.y, t1.z
    ->  Sort
-         Sort Key: t1.x, t1.y, t1.z, t1.w
+         Sort Key: t1.w, t1.x, t1.y, t1.z
          ->  Merge Join
-               Merge Cond: ((t1.w = t2.w) AND (t1.z = t2.z) AND (t1.x = t2.x))
-               ->  Sort
-                     Sort Key: t1.w, t1.z, t1.x
-                     ->  Index Scan using btg_x_y_idx on btg t1
-               ->  Sort
-                     Sort Key: t2.w, t2.z, t2.x
+               Merge Cond: (t1.x = t2.x)
+               Join Filter: ((t2.z = t1.z) AND (t2.w = t1.w))
+               ->  Index Scan using btg_x_y_idx on btg t1
+               ->  Materialize
                      ->  Index Scan using btg_x_y_idx on btg t2
-(12 rows)
+(10 rows)
 
 RESET enable_nestloop;
 RESET enable_hashjoin;
@@ -2877,11 +2875,12 @@ SELECT count(*) FROM btg GROUP BY w, x, y, z ORDER BY x*x, z;
  Sort
    Sort Key: ((x * x)), z
    ->  GroupAggregate
-         Group Key: w, x, y, z
-         ->  Sort
-               Sort Key: w, x, y, z
+         Group Key: x, y, w, z
+         ->  Incremental Sort
+               Sort Key: x, y, w, z
+               Presorted Key: x, y
                ->  Index Scan using btg_x_y_idx on btg
-(7 rows)
+(8 rows)
 
 -- Test the case where the number of incoming subtree path keys is more than
 -- the number of grouping keys.
@@ -2918,9 +2917,9 @@ GROUP BY c1.w, c1.z;
                      QUERY PLAN                      
 -----------------------------------------------------
  GroupAggregate
-   Group Key: c1.w, c1.z
+   Group Key: c1.z, c1.w
    ->  Sort
-         Sort Key: c1.w, c1.z, c1.x, c1.y
+         Sort Key: c1.z, c1.w, c1.x, c1.y
          ->  Merge Join
                Merge Cond: (c1.x = c2.x)
                ->  Sort
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index ca6d1bcfb7..f1a9a607b7 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1199,13 +1199,13 @@ ROLLBACK;
 
 -- GROUP BY optimization by reordering GROUP BY clauses
 CREATE TABLE btg AS SELECT
-  i % 10 AS x,
-  i % 10 AS y,
-  'abc' || i % 10 AS z,
+  i % 231 AS x,
+  i % 49 AS y,
+  'abc' || i % 2 AS z,
   i AS w
-FROM generate_series(1, 100) AS i;
+FROM generate_series(1, 1000) AS i;
 CREATE INDEX btg_x_y_idx ON btg(x, y);
-ANALYZE btg;
+VACUUM ANALYZE btg;
 
 SET enable_hashagg = off;
 SET enable_seqscan = off;
-- 
2.47.0
