Re: POC: GROUP BY optimization

Andrei Lepikhov Wed, 17 Apr 2024 22:01:45 -0700

On 4/12/24 06:44, Tom Lane wrote:

* It seems like root->processed_groupClause no longer has much to do
with the grouping behavior, which is scary given how much code still
believes that it does.  I suspect there are bugs lurking there, and

1. Analysing the code, processed_groupClause is used in:

- adjust_foreign_grouping_path_cost - to decide, do we need to add costof sort to the foreign grouping.

- just for replacing relids during SJE process.

- create_groupingsets_plan(), preprocess_grouping_sets,remove_useless_groupby_columns - we don't apply this feature in the caseof grouping sets.- get_number_of_groups - estimation shouldn't depend on the order ofclauses.

- create_grouping_paths - to analyse hashability of grouping clause

- make_group_input_target, make_partial_grouping_target - doesn't dependon the order of clausesplanner.c: add_paths_to_grouping_rel, create_partial_grouping_paths - inthe case of hash grouping.

So, the only case we can impact current behavior lies in thepostgres_fdw. But here we don't really know which plan will be chosenduring planning on foreign node and stay the same behavior is legal for me.

am not comforted by the fact that the patch changed exactly nothing
in the pathnodes.h documentation of that field.  This comment looks
pretty silly now too:

             /* Preprocess regular GROUP BY clause, if any */
             root->processed_groupClause = list_copy(parse->groupClause);

What "preprocessing" is going on there?  This comment was adequate
before, when the code line invoked preprocess_groupclause which had
a bunch of commentary; but now it has to stand on its own and it's
not doing a great job of that.

Thanks for picking this place. I fixed it.

More interesting here is that we move decisions on the order of group-byclauses to the stage, where we already have underlying subtree andincoming path keys. In principle, we could return the preprocessingroutine and arrange GROUP-BY with the ORDER-BY clause as it was before.But looking into the code, I found only one reason to do this:adjust_group_pathkeys_for_groupagg. Curious about how much profit we gethere, I think we can discover it later with no hurry. A good outcomehere will be a test case that can show the importance of arrangingGROUP-BY and ORDER-BY at an early stage.


--
regards,
Andrei Lepikhov
Postgres Professional

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5ea3705796..861656a192 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1424,7 +1424,10 @@ grouping_planner(PlannerInfo *root, double tuple_fraction,
 		}
 		else if (parse->groupClause)
 		{
-			/* Preprocess regular GROUP BY clause, if any */
+			/*
+			 * Make a copy of origin groupClause because we are going to
+			 * remove redundant clauses.
+			 */
 			root->processed_groupClause = list_copy(parse->groupClause);
 			/* Remove any redundant GROUP BY columns */
 			remove_useless_groupby_columns(root);

Re: POC: GROUP BY optimization

Reply via email to