On Thu, Jul 9, 2020 at 5:08 PM Stephen Frost <sfr...@snowman.net> wrote: > I didn't, and don't, think it particularly relevant to the discussion, > but if you don't like the comparison to Sort then we could compare it to > a HashJoin instead- the point is that, yes, if you are willing to give > more memory to a given operation, it's going to go faster, and the way > users tell us that they'd like the query to use more memory is already > well-defined and understood to be through work_mem. We do not do them a > service by ignoring that.
The hash_mem design (as it stands) would affect both hash join and hash aggregate. I believe that it makes most sense to have hash-based nodes under the control of a single GUC. I believe that this granularity will cause the least problems. It certainly represents a trade-off. work_mem is less of a problem with hash join, primarily because join estimates are usually a lot better than grouping estimates. But it is nevertheless something that it makes sense to put in the same conceptual bucket as hash aggregate, pending a future root and branch redesign of work_mem. > > This is a straw man. > > It's really not- the system has been quite intentionally designed, and > documented, to work within the constraints given to it (even if they're > not very well defined, this is still the case) and this particular node > didn't. That isn't a feature. I don't think that it was actually designed, so much as it evolved -- at least in this particular respect. But it hardly matters now. > We already have a GUC that we've documented and explained to users that > is there specifically to control this exact thing, and that's work_mem. > How would we document this? hash_mem would probably work as a multiplier of work_mem when negated, or as an absolute KB value, like work_mem. It would apply to nodes that use hashing, currently defined as hash agg and hash join. We might make the default -2, meaning twice whatever work_mem was (David Johnson suggested 4x just now, which seems a little on the aggressive side to me). Yes, that is a new burden for users that need to tune work_mem. Similar settings exist in other DB systems (or did before they finally replaced the equivalent of work_mem with something fundamentally better). All of the choices on the table have significant downsides. Nobody can claim the mantle of prudent conservative by proposing that we do nothing here. To do so is to ignore predictable significant negative consequences for our users. That much isn't really in question. I'm pretty sure that Andres, Robert, David Rowley, Alvaro, Justin, and Tomas will all agree with that statement (I'm sure that I'm forgetting somebody else, though). If this seems strange or unlikely, then look back over the thread. > Where's the setting for HashJoin or for Sort, to do the same thing? > Would we consider it sensible to set everything to "use as much memory > as you want?" I disagree with this notion that HashAgg is so very > special that it must have an independent set of tunables like this. Regardless of what we do now, the fact is that the economic case for giving hash agg more memory (relative to most other executor nodes) when the system as a whole is short on memory is very strong. It does not follow that the current hash_mem proposal is the best way forward now, of course, but I don't see why you don't at least agree with me about that much. It seems rather obvious to me. > The old behavior was buggy and we are providing quite enough continuity > through the fact that we've got major versions which will be maintained > for the next 5 years that folks can run as they test out newer versions. > Inventing hacks to preserve bug-compatibility across major versions is > not a good direction to go in. Like I said, the escape hatch GUC is not my preferred solution. But at least it acknowledges the problem. I don't think that anyone (or anyone else) believes that work_mem doesn't have serious limitations. > We have a parameter which already drives this and which users are > welcome to (and quite often do) tune. I disagree that anything further > is either essential or particularly desirable. This is a user hostile attitude. > I'm really rather astounded at the direction this has been going in. Why? -- Peter Geoghegan