On 4/12/24 06:44, Tom Lane wrote:
* It seems like root->processed_groupClause no longer has much to do
with the grouping behavior, which is scary given how much code still
believes that it does. I suspect there are bugs lurking there, and
1. Analysing the code, processed_groupClause is used in:
- adjust_foreign_grouping_path_cost - to decide, do we need to add cost
of sort to the foreign grouping.
- just for replacing relids during SJE process.
- create_groupingsets_plan(), preprocess_grouping_sets,
remove_useless_groupby_columns - we don't apply this feature in the case
of grouping sets.
- get_number_of_groups - estimation shouldn't depend on the order of
clauses.
- create_grouping_paths - to analyse hashability of grouping clause
- make_group_input_target, make_partial_grouping_target - doesn't depend
on the order of clauses
planner.c: add_paths_to_grouping_rel, create_partial_grouping_paths - in
the case of hash grouping.
So, the only case we can impact current behavior lies in the
postgres_fdw. But here we don't really know which plan will be chosen
during planning on foreign node and stay the same behavior is legal for me.
am not comforted by the fact that the patch changed exactly nothing
in the pathnodes.h documentation of that field. This comment looks
pretty silly now too:
/* Preprocess regular GROUP BY clause, if any */
root->processed_groupClause = list_copy(parse->groupClause);
What "preprocessing" is going on there? This comment was adequate
before, when the code line invoked preprocess_groupclause which had
a bunch of commentary; but now it has to stand on its own and it's
not doing a great job of that.
Thanks for picking this place. I fixed it.
More interesting here is that we move decisions on the order of group-by
clauses to the stage, where we already have underlying subtree and
incoming path keys. In principle, we could return the preprocessing
routine and arrange GROUP-BY with the ORDER-BY clause as it was before.
But looking into the code, I found only one reason to do this:
adjust_group_pathkeys_for_groupagg. Curious about how much profit we get
here, I think we can discover it later with no hurry. A good outcome
here will be a test case that can show the importance of arranging
GROUP-BY and ORDER-BY at an early stage.
--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5ea3705796..861656a192 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1424,7 +1424,10 @@ grouping_planner(PlannerInfo *root, double tuple_fraction,
}
else if (parse->groupClause)
{
- /* Preprocess regular GROUP BY clause, if any */
+ /*
+ * Make a copy of origin groupClause because we are going to
+ * remove redundant clauses.
+ */
root->processed_groupClause = list_copy(parse->groupClause);
/* Remove any redundant GROUP BY columns */
remove_useless_groupby_columns(root);