On 10/15/24 02:59, Kirill Reshke wrote:
On Thu, 22 Aug 2024 at 23:46, Andrei Lepikhov <lepi...@gmail.com> wrote:
Two questions:
Thank you for the interest to this feature!
1) Why do we change the `cost_sort` definition? We can calculate the
length of `pathkeys` inside cost_sort if we want, just like we do
inside cost_gather_merge.
I am suspicious of that but open to hearing other opinions. The
coefficient incorporates knowledge about how many comparisons will be
made with this sorting operator. The caller can have some additional
knowledge about that. For example, IncrementalSort knows about groups -
each group has the same prefix, which means each tuple comparison will
inevitably cause a comparison for each column from the prefix. Also, it
may be knowledge about the number of distinct values for each column,
etc. I'm not sure we should elaborate cost_sort a lot here.
2) I'm not very sure about the (1.0 + cmpMultiplier) coefficient. I
understand that we need to multiply by something that is not 0 (and
that's why we add 1), and this is strictly better than multiplying by
2, but why exactly is this formula like this, not (1.0 + cmpMultiplier
^ 2 / 10) for example? Maybe we need some more sophisticated approach?
Hmmm, I think you misunderstood the general idea behind this
coefficient. I explained it earlier this year in more detail [1],
showing the direction of future improvement.
Value '1' in this formula reflects some second-order significance
physics that we don't consider in that formula. Of course, it may be
different, but after tests, I decided to keep it simple.
The cmpMultiplier value reflects the number of comparisons the executor
will make when comparing a pair of tuples. It is a fraction of the
maximum number of potential comparisons for each operation of tuple
comparisons (likewise, Hash Agg already employs in costing). Right now,
we don't use distinct statistics on sorting columns and use a
conservative approach, which may be improved in the future. But I don't
think it is a real problem here because we use the same in all places
and, with this feature, change the balance only with hashing methods (in
favour of their usage). I think it is not harmful because this
characteristic is a second-order grade factor and shouldn't
significantly reduce the performance of actual queries.
Also, this patch needs to be rebased, again.
Thanks, done.
[1] https://danolivo.substack.com/p/elaboration-of-the-postgresql-sort
--
regards, Andrei Lepikhov
From 3ba39d245e079235f6352e5cb90224cc9ecee9b6 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepi...@gmail.com>
Date: Tue, 15 Oct 2024 11:33:01 +0700
Subject: [PATCH v3] Introduce the number of columns in the cost-sort model.
The sort node calls the comparison operator for each pair of attributes for
each couple of tuples. However, the current cost model uses
a '2.0*cpu_operator_cost' multiplier, which performs some sort of averaging.
This technique can lead to incorrect estimations when sorting on three, four,
or more columns, quite common in practice.
Moreover, further elaboration of the optimiser forms more strict requirements
for the balance of sortings, as caused by IncrementalSort, MergeAppend, and
MergeJoin.
In this patch, the multiplier is a linear function of a number of columns.
Member 1.0 needs to smooth the fact that dependence on the number of columns is
weaker than linear.
It is an extreme formula. The number of comparisons depends on the distinct
values in each column. As a TODO, we can natively elaborate this model by the
next step, involving distinct statistics to make estimations more precise.
---
.../postgres_fdw/expected/postgres_fdw.out | 15 +++--
contrib/postgres_fdw/postgres_fdw.c | 2 +-
src/backend/optimizer/path/costsize.c | 34 ++++++----
src/backend/optimizer/plan/createplan.c | 2 +-
src/backend/optimizer/plan/planner.c | 2 +-
src/backend/optimizer/prep/prepunion.c | 2 +-
src/backend/optimizer/util/pathnode.c | 8 +--
src/include/optimizer/cost.h | 2 +-
src/test/regress/expected/aggregates.out | 13 ++--
src/test/regress/expected/join.out | 20 +++---
src/test/regress/expected/partition_join.out | 8 ++-
src/test/regress/expected/union.out | 66 +++++++++----------
12 files changed, 95 insertions(+), 79 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index f2bcd6aa98..5da33ab69e 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9995,13 +9995,16 @@ SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER J
-- left outer join + nullable clause
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
- QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- Foreign Scan
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
Output: t1.a, fprt2.b, fprt2.c
- Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2)
- Remote SQL: SELECT r5.a, r6.b, r6.c FROM (public.fprt1_p1 r5 LEFT JOIN public.fprt2_p1 r6 ON (((r5.a = r6.b)) AND ((r5.b = r6.a)) AND ((r6.a < 10)))) WHERE ((r5.a < 10)) ORDER BY r5.a ASC NULLS LAST, r6.b ASC NULLS LAST, r6.c ASC NULLS LAST
-(4 rows)
+ Sort Key: t1.a, fprt2.b, fprt2.c
+ -> Foreign Scan
+ Output: t1.a, fprt2.b, fprt2.c
+ Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2)
+ Remote SQL: SELECT r5.a, r6.b, r6.c FROM (public.fprt1_p1 r5 LEFT JOIN public.fprt2_p1 r6 ON (((r5.a = r6.b)) AND ((r5.b = r6.a)) AND ((r6.a < 10)))) WHERE ((r5.a < 10))
+(7 rows)
SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
a | b | c
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index adc62576d1..f9a2e88a17 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -3667,7 +3667,7 @@ adjust_foreign_grouping_path_cost(PlannerInfo *root,
cost_sort(&sort_path,
root,
- pathkeys,
+ list_length(pathkeys),
0,
*p_startup_cost + *p_run_cost,
retrieved_rows,
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 2bb6db1df7..7c4e39065b 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -493,6 +493,10 @@ cost_gather_merge(GatherMergePath *path, PlannerInfo *root,
Cost comparison_cost;
double N;
double logN;
+ int npathkeys = list_length(((Path *) path)->pathkeys);
+ int cmpMultiplier = npathkeys;
+
+ Assert(npathkeys > 0);
/* Mark the path with the correct row estimate */
if (rows)
@@ -512,7 +516,7 @@ cost_gather_merge(GatherMergePath *path, PlannerInfo *root,
logN = LOG2(N);
/* Assumed cost per tuple comparison */
- comparison_cost = 2.0 * cpu_operator_cost;
+ comparison_cost = (1.0 + cmpMultiplier) * cpu_operator_cost;
/* Heap creation cost */
startup_cost += comparison_cost * N * logN;
@@ -1896,7 +1900,7 @@ cost_recursive_union(Path *runion, Path *nrterm, Path *rterm)
*/
static void
cost_tuplesort(Cost *startup_cost, Cost *run_cost,
- double tuples, int width,
+ double tuples, int width, int cmpMultiplier,
Cost comparison_cost, int sort_mem,
double limit_tuples)
{
@@ -1913,7 +1917,7 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost,
tuples = 2.0;
/* Include the default cost-per-comparison */
- comparison_cost += 2.0 * cpu_operator_cost;
+ comparison_cost += (1.0 + cmpMultiplier) * cpu_operator_cost;
/* Do we have a useful LIMIT? */
if (limit_tuples > 0 && limit_tuples < tuples)
@@ -2086,7 +2090,9 @@ cost_incremental_sort(Path *path,
* are equal.
*/
cost_tuplesort(&group_startup_cost, &group_run_cost,
- group_tuples, width, comparison_cost, sort_mem,
+ group_tuples, width,
+ list_length(pathkeys),
+ comparison_cost, sort_mem,
limit_tuples);
/*
@@ -2110,7 +2116,7 @@ cost_incremental_sort(Path *path,
* detect the sort groups. This is roughly equal to one extra copy and
* comparison per tuple.
*/
- run_cost += (cpu_tuple_cost + comparison_cost) * input_tuples;
+ run_cost += (cpu_tuple_cost + (list_length(pathkeys) + 1) * comparison_cost) * input_tuples;
/*
* Additionally, we charge double cpu_tuple_cost for each input group to
@@ -2142,7 +2148,7 @@ cost_incremental_sort(Path *path,
*/
void
cost_sort(Path *path, PlannerInfo *root,
- List *pathkeys, int input_disabled_nodes,
+ int npathkeys, int input_disabled_nodes,
Cost input_cost, double tuples, int width,
Cost comparison_cost, int sort_mem,
double limit_tuples)
@@ -2150,9 +2156,12 @@ cost_sort(Path *path, PlannerInfo *root,
{
Cost startup_cost;
Cost run_cost;
+ int cmpMultiplier = npathkeys;
+
+ Assert(npathkeys > 0);
cost_tuplesort(&startup_cost, &run_cost,
- tuples, width,
+ tuples, width, cmpMultiplier,
comparison_cost, sort_mem,
limit_tuples);
@@ -2321,7 +2330,7 @@ cost_append(AppendPath *apath)
*/
cost_sort(&sort_path,
NULL, /* doesn't currently need root */
- pathkeys,
+ list_length(pathkeys),
subpath->disabled_nodes,
subpath->total_cost,
subpath->rows,
@@ -2440,6 +2449,9 @@ cost_merge_append(Path *path, PlannerInfo *root,
Cost comparison_cost;
double N;
double logN;
+ int cmpMultiplier = list_length(pathkeys);
+
+ Assert(pathkeys != NIL);
/*
* Avoid log(0)...
@@ -2448,7 +2460,7 @@ cost_merge_append(Path *path, PlannerInfo *root,
logN = LOG2(N);
/* Assumed cost per tuple comparison */
- comparison_cost = 2.0 * cpu_operator_cost;
+ comparison_cost = (1.0 + cmpMultiplier) * cpu_operator_cost;
/* Heap creation cost */
startup_cost += comparison_cost * N * logN;
@@ -3708,7 +3720,7 @@ initial_cost_mergejoin(PlannerInfo *root, JoinCostWorkspace *workspace,
{
cost_sort(&sort_path,
root,
- outersortkeys,
+ list_length(outersortkeys),
outer_path->disabled_nodes,
outer_path->total_cost,
outer_path_rows,
@@ -3758,7 +3770,7 @@ initial_cost_mergejoin(PlannerInfo *root, JoinCostWorkspace *workspace,
cost_sort(&sort_path,
root,
- innersortkeys,
+ list_length(innersortkeys),
inner_path->disabled_nodes,
inner_path->total_cost,
inner_path_rows,
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index f2ed0d81f6..fbb64e4cf3 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -5496,7 +5496,7 @@ label_sort_with_costsize(PlannerInfo *root, Sort *plan, double limit_tuples)
Assert(IsA(plan, Sort));
- cost_sort(&sort_path, root, NIL,
+ cost_sort(&sort_path, root, list_length(lefttree->targetlist),
plan->plan.disabled_nodes,
lefttree->total_cost,
lefttree->plan_rows,
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 0f423e9684..f326e1c34b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6855,7 +6855,7 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
/* Estimate the cost of seq scan + sort */
seqScanPath = create_seqscan_path(root, rel, NULL, 0);
- cost_sort(&seqScanAndSortPath, root, NIL,
+ cost_sort(&seqScanAndSortPath, root, rel->max_attr,
seqScanPath->disabled_nodes,
seqScanPath->total_cost, rel->tuples, rel->reltarget->width,
comparisonCost, maintenance_work_mem, -1.0);
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index a0baf6d4a1..0d51e1f612 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -1358,7 +1358,7 @@ choose_hashed_setop(PlannerInfo *root, List *groupClauses,
sorted_p.startup_cost = input_path->startup_cost;
sorted_p.total_cost = input_path->total_cost;
/* XXX cost_sort doesn't actually look at pathkeys, so just pass NIL */
- cost_sort(&sorted_p, root, NIL, sorted_p.disabled_nodes,
+ cost_sort(&sorted_p, root, list_length(input_path->pathtarget->exprs), sorted_p.disabled_nodes,
sorted_p.total_cost,
input_path->rows, input_path->pathtarget->width,
0.0, work_mem, -1.0);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index fc97bf6ee2..393f993176 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1537,7 +1537,7 @@ create_merge_append_path(PlannerInfo *root,
cost_sort(&sort_path,
root,
- pathkeys,
+ list_length(pathkeys),
subpath->disabled_nodes,
subpath->total_cost,
subpath->rows,
@@ -1866,7 +1866,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
/*
* Estimate cost for sort+unique implementation
*/
- cost_sort(&sort_path, root, NIL,
+ cost_sort(&sort_path, root, list_length(subpath->pathtarget->exprs),
subpath->disabled_nodes,
subpath->total_cost,
rel->rows,
@@ -3101,7 +3101,7 @@ create_sort_path(PlannerInfo *root,
pathnode->subpath = subpath;
- cost_sort(&pathnode->path, root, pathkeys,
+ cost_sort(&pathnode->path, root, list_length(pathkeys),
subpath->disabled_nodes,
subpath->total_cost,
subpath->rows,
@@ -3436,7 +3436,7 @@ create_groupingsets_path(PlannerInfo *root,
else
{
/* Account for cost of sort, but don't charge input cost again */
- cost_sort(&sort_path, root, NIL, 0,
+ cost_sort(&sort_path, root, list_length(subpath->pathtarget->exprs), 0,
0.0,
subpath->rows,
subpath->pathtarget->width,
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 854a782944..b8113b8748 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -108,7 +108,7 @@ extern void cost_resultscan(Path *path, PlannerInfo *root,
RelOptInfo *baserel, ParamPathInfo *param_info);
extern void cost_recursive_union(Path *runion, Path *nrterm, Path *rterm);
extern void cost_sort(Path *path, PlannerInfo *root,
- List *pathkeys, int disabled_nodes,
+ int npathkeys, int disabled_nodes,
Cost input_cost, double tuples, int width,
Cost comparison_cost, int sort_mem,
double limit_tuples);
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 1e682565d1..7546c795d0 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -3031,17 +3031,18 @@ ANALYZE agg_sort_order;
EXPLAIN (COSTS OFF)
SELECT array_agg(c1 ORDER BY c2),c2
FROM agg_sort_order WHERE c2 < 100 GROUP BY c1 ORDER BY 2;
- QUERY PLAN
-----------------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------------
Sort
Sort Key: c2
-> GroupAggregate
Group Key: c1
- -> Sort
+ -> Incremental Sort
Sort Key: c1, c2
- -> Index Scan using agg_sort_order_c2_idx on agg_sort_order
- Index Cond: (c2 < 100)
-(8 rows)
+ Presorted Key: c1
+ -> Index Scan using agg_sort_order_pkey on agg_sort_order
+ Filter: (c2 < 100)
+(9 rows)
DROP TABLE agg_sort_order CASCADE;
DROP TABLE btg;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 756c2e2496..f543e416f3 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5736,18 +5736,20 @@ select d.* from d left join (select * from b group by b.id, b.c_id) s
explain (costs off)
select d.* from d left join (select distinct * from b) s
on d.a = s.id;
- QUERY PLAN
---------------------------------------
- Merge Right Join
- Merge Cond: (b.id = d.a)
- -> Unique
- -> Sort
- Sort Key: b.id, b.c_id
- -> Seq Scan on b
+ QUERY PLAN
+---------------------------------------------
+ Merge Left Join
+ Merge Cond: (d.a = s.id)
-> Sort
Sort Key: d.a
-> Seq Scan on d
-(9 rows)
+ -> Sort
+ Sort Key: s.id
+ -> Subquery Scan on s
+ -> HashAggregate
+ Group Key: b.id, b.c_id
+ -> Seq Scan on b
+(11 rows)
-- join removal is not possible here
explain (costs off)
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 108f9ecb44..886dc559a5 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -1274,9 +1274,11 @@ EXPLAIN (COSTS OFF)
SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
QUERY PLAN
----------------------------------------------------------------------------
- Sort
+ Incremental Sort
Sort Key: t1.a, t2.b, ((t3.a + t3.b))
- -> Append
+ Presorted Key: t1.a
+ -> Merge Append
+ Sort Key: t1.a
-> Merge Left Join
Merge Cond: (t1_1.a = t2_1.b)
-> Sort
@@ -1325,7 +1327,7 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
-> Sort
Sort Key: t2_3.b
-> Seq Scan on prt2_p3 t2_3
-(51 rows)
+(53 rows)
SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
a | c | b | c | ?column? | c
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index c73631a9a1..f1c0af1c8d 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -1225,18 +1225,17 @@ SELECT * FROM
SELECT 2 AS t, 4 AS x) ss
WHERE x < 4
ORDER BY x;
- QUERY PLAN
---------------------------------------------------
+ QUERY PLAN
+--------------------------------------------
Sort
Sort Key: (2)
- -> Unique
- -> Sort
- Sort Key: (1), (2)
- -> Append
- -> Result
- -> Result
- One-Time Filter: false
-(9 rows)
+ -> HashAggregate
+ Group Key: (1), (2)
+ -> Append
+ -> Result
+ -> Result
+ One-Time Filter: false
+(8 rows)
SELECT * FROM
(SELECT 1 AS t, 2 AS x
@@ -1290,19 +1289,18 @@ SELECT * FROM
SELECT 2 AS t, 4 AS x) ss
WHERE x > 3
ORDER BY x;
- QUERY PLAN
-------------------------------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------------------
Sort
Sort Key: ss.x
-> Subquery Scan on ss
Filter: (ss.x > 3)
- -> Unique
- -> Sort
- Sort Key: (1), (((random() * '3'::double precision))::integer)
- -> Append
- -> Result
- -> Result
-(10 rows)
+ -> HashAggregate
+ Group Key: (1), (((random() * '3'::double precision))::integer)
+ -> Append
+ -> Result
+ -> Result
+(9 rows)
SELECT * FROM
(SELECT 1 AS t, (random()*3)::int AS x
@@ -1323,24 +1321,22 @@ select distinct q1 from
union all
select distinct * from int8_tbl i82) ss
where q2 = q2;
- QUERY PLAN
-----------------------------------------------------------
- Unique
- -> Merge Append
- Sort Key: "*SELECT* 1".q1
+ QUERY PLAN
+----------------------------------------------------
+ HashAggregate
+ Group Key: "*SELECT* 1".q1
+ -> Append
-> Subquery Scan on "*SELECT* 1"
- -> Unique
- -> Sort
- Sort Key: i81.q1, i81.q2
- -> Seq Scan on int8_tbl i81
- Filter: (q2 IS NOT NULL)
+ -> HashAggregate
+ Group Key: i81.q1, i81.q2
+ -> Seq Scan on int8_tbl i81
+ Filter: (q2 IS NOT NULL)
-> Subquery Scan on "*SELECT* 2"
- -> Unique
- -> Sort
- Sort Key: i82.q1, i82.q2
- -> Seq Scan on int8_tbl i82
- Filter: (q2 IS NOT NULL)
-(15 rows)
+ -> HashAggregate
+ Group Key: i82.q1, i82.q2
+ -> Seq Scan on int8_tbl i82
+ Filter: (q2 IS NOT NULL)
+(13 rows)
select distinct q1 from
(select distinct * from int8_tbl i81
--
2.39.5