remove_useless_groupby_columns is too enthusiastic

Tom Lane Tue, 12 Jul 2022 10:31:35 -0700

I looked into the complaint at [1] about the planner being much
stupider when one side of a JOIN USING is referenced than the other
side.  It seemed to me that that shouldn't be happening, because
the relevant decisions are made on the basis of EquivalenceClasses
and both USING columns should be in the same EquivalenceClass.
I was right about that ... but the damage is being done far upstream
of any EquivalenceClass work.  It turns out that the core of the
issue is that the query looks like


SELECT ... t1 JOIN t2 USING (x)
  GROUP BY x, t2.othercol ORDER BY t2.othercol LIMIT n

In the "okay" case, we resolve "GROUP BY x" as GROUP BY t1.x.
Later on, we are able to realize that ordering by t1.x is
equivalent to ordering by t2.x (because same EquivalenceClass),
and that it's equally good to consider the GROUP BY items in
either order, and that ordering by t2.othercol, t2.x would
allow us to perform the grouping using a GroupAggregate on
data that's already sorted to meet the ORDER BY requirement.
Since there happens to be an index on t2.othercol, t2.x,
we can implement the query with no explicit sort, which wins
big thanks to the small LIMIT.

In the not-okay case, we resolve "GROUP BY x" as GROUP BY t2.x.
Then remove_useless_groupby_columns notices that x is t2's
primary key, so it figures that grouping by t2.othercol is
redundant and throws away that element of GROUP BY.  Now there
is no apparent connection between the GROUP BY and ORDER BY
lists, defeating the optimizations that would lead to a good
choice of plan --- in fact, we conclude early on that that
index's sort ordering is entirely useless to the query :-(

I tried the attached quick-hack patch that just prevents
remove_useless_groupby_columns from removing anything that
appears in ORDER BY.  That successfully fixes the complained-of
case, and it doesn't change any existing regression test results.
I'm not sure whether there are any cases that it makes significantly
worse.

(I also kind of wonder if the fundamental problem here is that
remove_useless_groupby_columns is being done at the wrong time,
and we ought to do it later when we have more semantic info.
But I'm not volunteering to rewrite it completely.)

Anyway, remove_useless_groupby_columns has been there since 9.6
and we've not previously seen reports of cases that it makes worse,
so this seems like a corner-case problem.  Hence I wouldn't risk
back-patching this change.  It seems worth considering for HEAD
though, so I'll stick it in the September CF.

                        regards, tom lane

[1] 
https://www.postgresql.org/message-id/17544-ebd06b00b8836a04%40postgresql.org

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 06ad856eac..b2716abcc7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -2733,12 +2733,15 @@ remove_useless_groupby_columns(PlannerInfo *root)
 
 			/*
 			 * New list must include non-Vars, outer Vars, and anything not
-			 * marked as surplus.
+			 * marked as surplus.  In addition, keep anything that appears in
+			 * the ORDER BY clause, because otherwise we may falsely make it
+			 * look like the GROUP BY and ORDER BY clauses are incompatible.
 			 */
 			if (!IsA(var, Var) ||
 				var->varlevelsup > 0 ||
 				!bms_is_member(var->varattno - FirstLowInvalidHeapAttributeNumber,
-							   surplusvars[var->varno]))
+							   surplusvars[var->varno]) ||
+				list_member(parse->sortClause, sgc))
 				new_groupby = lappend(new_groupby, sgc);
 		}

remove_useless_groupby_columns is too enthusiastic

Reply via email to