Re: Trouble with hashagg spill I/O pattern and costing

Tomas Vondra Thu, 28 May 2020 11:58:15 -0700

On Wed, May 27, 2020 at 11:07:04AM +0200, Tomas Vondra wrote:

On Tue, May 26, 2020 at 06:42:50PM -0700, Melanie Plageman wrote:

On Tue, May 26, 2020 at 5:40 PM Jeff Davis <[email protected]> wrote:

On Tue, 2020-05-26 at 21:15 +0200, Tomas Vondra wrote:


As for the tlist fix, I think that's mostly ready too - the one thing
we
should do is probably only doing it for AGG_HASHED. For AGG_SORTED
it's
not really necessary.


Melanie previously posted a patch to avoid spilling unneeded columns,
but it introduced more code:



https://www.postgresql.org/message-id/caakru_aefesv+ukqwqu+ioenoil2lju9diuh9br8mbyxuz0...@mail.gmail.com

and it seems that Heikki also looked at it. Perhaps we should get an
acknowledgement from one of them that your one-line change is the right
approach?

I spent some time looking at it today, and, it turns out I was wrong.

I thought that there was a case I had found where CP_SMALL_TLIST did not
eliminate as many columns as could be eliminated for the purposes of
spilling, but, that turned out not to be the case.

I changed CP_LABEL_TLIST to CP_SMALL_TLIST in
create_groupingsets_plan(), create_agg_plan(), etc and tried a bunch of
different queries and this 2-3 line change worked for all the cases I
tried. Is that where you made the change?


I've only made the change in create_agg_plan, because that's what was in
the query plan I was investigating. You may be right that the same fix
is needed in additional places, though.


Attached is a patch adding CP_SMALL_TLIST to create_agg_plan and
create_groupingsets_plan. I've looked at the other places that I think
seem like they might benefit from it (create_upper_unique_plan,
create_group_plan) but I think we don't need to modify those - there'll
either be a Sort of HashAgg, which will take care about the projection.

Or do you think some other places need CP_SMALL_TLIST?

And then are you proposing to set it based on the aggstrategy to either
CP_LABEL_TLIST or CP_SMALL_TLIST here?


Yes, something like that. The patch I shared on on 5/21 just changed
that, but I'm wondering if that could add overhead for sorted
aggregation, which already does the projection thanks to the sort.


I ended up tweaking the tlist only for AGG_MIXED and AGG_HASHED. We
clearly don't need it for AGG_PLAIN or AGG_SORTED. This way we don't
break regression tests by adding unnecessary "Subquery Scan" nodes.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/optimizer/plan/createplan.c 
b/src/backend/optimizer/plan/createplan.c
index 9941dfe65e..ddb277cdf0 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -2113,12 +2113,22 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path)
        Plan       *subplan;
        List       *tlist;
        List       *quals;
+       int                     flags;
 
        /*
         * Agg can project, so no need to be terribly picky about child tlist, 
but
-        * we do need grouping columns to be available
+        * we do need grouping columns to be available. We are a bit more 
careful
+        * with hash aggregate, where we explicitly request small tlist to 
minimize
+        * I/O needed for spilling (possibly not known already).
         */
-       subplan = create_plan_recurse(root, best_path->subpath, CP_LABEL_TLIST);
+       flags = CP_LABEL_TLIST;
+
+       /* ensure small tlist for hash aggregate */
+       if (best_path->aggstrategy == AGG_HASHED ||
+               best_path->aggstrategy == AGG_MIXED)
+               flags |= CP_SMALL_TLIST;
+
+       subplan = create_plan_recurse(root, best_path->subpath, flags);
 
        tlist = build_path_tlist(root, &best_path->path);
 
@@ -2209,7 +2219,8 @@ create_groupingsets_plan(PlannerInfo *root, 
GroupingSetsPath *best_path)
         * Agg can project, so no need to be terribly picky about child tlist, 
but
         * we do need grouping columns to be available
         */
-       subplan = create_plan_recurse(root, best_path->subpath, CP_LABEL_TLIST);
+       subplan = create_plan_recurse(root, best_path->subpath,
+                                                                 
CP_LABEL_TLIST | CP_SMALL_TLIST);
 
        /*
         * Compute the mapping from tleSortGroupRef to column index in the 
child's

Re: Trouble with hashagg spill I/O pattern and costing

Reply via email to