On Tue, 2020-06-09 at 21:15 -0500, Justin Pryzby wrote: > The behavior of the GUC is inconsistent with the other GUCs, which is > confusing. See also Robert's comments in this thread. > https://www.postgresql.org/message-id/20200407223900.GT2228%40telsasoft.com
enable_* GUCs are planner GUCs, so it would be confusing to me if they affected execution-time behavior. I think the point of confusion is that it's not enabling/disabling an entire execution node; it only "disables" HashAgg if it thinks it will spill. I agree that is a difference with the other GUCs, and could cause confusion. Stepping back, I was trying to solve two problems with these GUCs: 1. Testing the spilling of hashed grouping sets: I'm inclined to just get rid of enable_groupingsets_hash_disk and use Melanie's stats- hacking approach instead. 2. Trying to provide an escape hatch for someone who experiences a performance regression and wants something like the old behavior back. There are two aspects of the old behavior that a user could potentially want back: a. Don't choose HashAgg if it's expected to have more groups than fit into a work_mem-sized hashtable. b. If executing HashAgg, and the hash table exceeds work_mem, just keep going. The behavior in v13 master is, by default, analagous to Sort or anything else that adapts at runtime to spill. If we had spillable HashAgg the whole time, we wouldn't be worried about #2 at all. But, out of conservatism, I am trying to accommodate users who want an escape hatch, at least for a release or two until users feel more comfortable with disk-based HashAgg. Setting enable_hash_disk=false implements 2(a). This name apparently causes confusion, but it's hard to come up with a better one because the v12 behavior has nuance that's hard to express succinctly. I don't think the names you suggested quite fit, but the idea to use a more interesting GUC value might help express the behavior. Perhaps making enable_hashagg a ternary "enable_hashagg=on|off|avoid_disk"? The word "reject" is too definite for the planner, which is working with imperfect information. In master, there is no explicit way to get 2(b), but you can just set work_mem higher in a lot of cases. If enough people want 2(b), I can add it easily. Perhaps hashagg_overflow=on|off, which would control execution time behavior? Regards, Jeff Davis