On Tue, Jun 9, 2020 at 7:15 PM Justin Pryzby <pry...@telsasoft.com> wrote:
> On Tue, Jun 09, 2020 at 06:20:13PM -0700, Melanie Plageman wrote: > > On Thu, Apr 9, 2020 at 1:02 PM Jeff Davis <pg...@j-davis.com> wrote: > > > > > 2. enable_groupingsets_hash_disk (default false): > > > > > > This is about how we choose which grouping sets to hash and which to > > > sort when generating mixed mode paths. > > > > > > Even before this patch, there are quite a few paths that could be > > > generated. It tries to estimate the size of each grouping set's hash > > > table, and then see how many it can fit in work_mem (knapsack), while > > > also taking advantage of any path keys, etc. > > > > > > With Disk-based Hash Aggregation, in principle we can generate paths > > > representing any combination of hashing and sorting for the grouping > > > sets. But that would be overkill (and grow to a huge number of paths if > > > we have more than a handful of grouping sets). So I think the existing > > > planner logic for grouping sets is fine for now. We might come up with > > > a better approach later. > > > > > > But that created a testing problem, because if the planner estimates > > > correctly, no hashed grouping sets will spill, and the spilling code > > > won't be exercised. This GUC makes the planner disregard which grouping > > > sets' hash tables will fit, making it much easier to exercise the > > > spilling code. Is there a better way I should be testing this code > > > path? > > > > So, I was catching up on email and noticed the last email in this > > thread. > > > > I think I am not fully understanding what enable_groupingsets_hash_disk > > does. Is it only for testing? > > If so, it should be in category: "Developer Options". > > > Using the tests you added to src/test/regress/sql/groupingsets.sql, I > > did get a plan that looks like hashagg is spilling to disk (goes through > > hashagg_spill_tuple() code path and has number of batches reported in > > Explain) in a MixedAgg plan for a grouping sets query even with > > enable_groupingsets_hash_disk set to false. > > > I'm not sure if this is more what you were looking for--or maybe I am > > misunderstanding the guc. > > The behavior of the GUC is inconsistent with the other GUCs, which is > confusing. See also Robert's comments in this thread. > https://www.postgresql.org/message-id/20200407223900.GT2228%40telsasoft.com > > The old (pre-13) behavior was: > - work_mem is the amount of RAM to which each query node tries to > constrain > itself, and the planner will reject a plan if it's expected to exceed > that. > ...But a chosen plan might exceed work_mem anyway. > > The new behavior in v13 seems to be: > - HashAgg now respects work_mem, but instead enable*hash_disk are > opportunisitic. A node which is *expected* to spill to disk will be > rejected. > ...But at execution time, a node which exceeds work_mem will be spilled. > > If someone sees a plan which spills to disk and wants to improve > performance by > avoid spilling, they might SET enable_hashagg_disk=off, which might do what > they want (if the plan is rejected at plan time), or it might not, which I > think will be a surprise every time. > > But I thought that the enable_groupingsets_hash_disk GUC allows us to test the following scenario: The following is true: - planner thinks grouping sets' hashtables table would fit in memory (spilling is *not* expected) - user is okay with spilling - some grouping keys happen to be sortable and some hashable The following happens: - Planner generates some HashAgg grouping sets paths - A MixedAgg plan is created - During execution of the MixedAgg plan, one or more grouping sets' hashtables would exceed work_mem, so the executor spills those tuples to disk instead of exceeding work_mem Especially given the code and comment: /* * If we have sortable columns to work with (gd->rollups is non-empty) * and enable_groupingsets_hash_disk is disabled, don't generate * hash-based paths that will exceed work_mem. */ if (!enable_groupingsets_hash_disk && hashsize > work_mem * 1024L && gd->rollups) return; /* nope, won't fit */ If this is the scenario that the GUC is designed to test, it seems like you could exercise it without the enable_groupingsets_hash_disk GUC by lying about the stats, no? -- Melanie Plageman