Hi,

On 2020-03-24 18:12:03 -0700, Jeff Davis wrote:
> Attached is a small patch that introduces a simple function,
> AllocSetEstimateChunkSpace(), and uses it to improve the estimate for
> the size of a hash entry for hash aggregation.
>
> Getting reasonable estimates for the hash entry size (including the
> TupleHashEntryData, the group key tuple, the per-group state, and by-
> ref transition values) is important for multiple reasons:
>
> * It helps the planner know when hash aggregation is likely to spill,
> and how to cost it.
>
> * It helps hash aggregation regulate itself by setting a group limit
> (separate from the memory limit) to account for growing by-ref
> transition values.
>
> * It helps choose a good initial size for the hash table. Too large of
> a hash table will crowd out space that could be used for the group keys
> or the per-group state.
>
> Each group requires up to three palloc chunks: one for the group key
> tuple, one for the per-group states, and one for a by-ref transition
> value. Each chunk can have a lot of overhead: in addition to the chunk
> header (16 bytes overhead), it also rounds the value up to a power of
> two (~25% overhead). So, it's important to account for this chunk
> overhead.
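For concreteness, here is a rough standalone sketch of the per-chunk
arithmetic described above (the 16-byte header plus power-of-two
rounding). The constants and function names are illustrative assumptions
only, not the proposed AllocSetEstimateChunkSpace() or the aset.c
internals:

    /*
     * Hypothetical illustration only (not the patch): an AllocSet-style
     * allocator rounds each request up to the next power of two and adds
     * a chunk header, assumed here to be 16 bytes as in the quoted text.
     */
    #include <stdio.h>
    #include <stddef.h>

    #define CHUNK_HEADER_SIZE 16    /* assumed header overhead */
    #define MIN_CHUNK_SIZE     8    /* assumed smallest chunk */

    static size_t
    pow2_roundup(size_t sz)
    {
        size_t  result = MIN_CHUNK_SIZE;

        while (result < sz)
            result <<= 1;
        return result;
    }

    /* Rough space consumed by one palloc(sz) under those assumptions. */
    static size_t
    estimate_chunk_space(size_t sz)
    {
        return CHUNK_HEADER_SIZE + pow2_roundup(sz);
    }

    int
    main(void)
    {
        /* e.g. a 40-byte group key tuple: rounds to 64, plus header = 80 */
        printf("%zu\n", estimate_chunk_space(40));
        return 0;
    }

Under these assumptions, three separate chunks per group already cost
48 bytes in headers alone before any rounding waste, which is part of
why combining allocations (as discussed below) helps.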
As mentioned on IM/call: I think this is mainly an argument for
combining the group tuple & per-group state allocations. I'm kind of
fine with afterwards taking the allocator overhead into account.

I still don't buy that it's useful to estimate the by-ref transition
value overhead. We just don't have anything even close to a meaningful
value to base this on. Even if we want to consider the initial
transition value or something, we'd be better off initially
over-estimating the size of the transition state by a lot more than 25%
(I am thinking more like 4x or so, with a minimum of 128 bytes or so).

Since this is about the initial size of the hashtable, we're better off
with a too-small table, IMO. A "too large" table is more likely to end
up needing to spill when filled to only a small degree.

I am kind of wondering if there's actually much point in trying to be
accurate here at all, especially when doing this from the planner.
Since, for a large fraction of aggregates, we're going to very roughly
guess at transition space anyway, it seems like a bunch of "overhead
factors" could turn out to be better than trying to be accurate on some
parts while still wildly guessing at transition space. But I'm not sure.

> I considered making it a method of a memory context, but the planner
> will call this function before the hash agg memory context is created.
> It seems to make more sense for this to just be an AllocSet-specific
> function for now.

-1 to this approach. I think it's architecturally the wrong direction
to add more direct calls to functions within specific contexts.

On 2020-03-25 11:46:31 -0700, Jeff Davis wrote:
> On Tue, 2020-03-24 at 18:12 -0700, Jeff Davis wrote:
> > I considered making it a method of a memory context, but the planner
> > will call this function before the hash agg memory context is
> > created.
>
> I implemented this approach also; attached.
>
> It works a little better by having access to the memory context, so it
> knows set->allocChunkLimit. It also allows it to work with the slab
> allocator, which needs the memory context to return a useful number.
> However, this introduces more code and awkwardly needs to use
> CurrentMemoryContext when called from the planner (because the actual
> memory context we're trying to estimate for doesn't exist yet).

Yea, the "context needs to exist" part sucks. I really don't want to
add calls directly into AllocSet from more places though. And just
ignoring the parameters to the context seems wrong too.

Greetings,

Andres Freund