On Mon, Jun 29, 2020 at 8:07 AM Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
> Not sure I follow. Which cases do you mean when you say that 12 could
> safely do them, but 13 won't? I see the following two cases:
> a) Planner in 12 and 13 disagree about whether the hash table will fit
> into work_mem.
>
> I don't quite see why this would be the case (given the same cardinality
> estimates etc.), though. That is, if 12 says "will fit" I'd expect 13 to
> end up with the same conclusion. But maybe 13 has higher per-tuple
> overhead or something? I know we set aside some memory for BufFiles, but
> not when we expect the whole hash table to fit into memory.

I have no reason to believe that the planner is any more or any less
likely to conclude that the hash table will fit in memory in v13 as
things stand (I don't know if the BufFile issue matters). In general,
grouping estimates probably aren't very good compared to join
estimates. I imagine that in either v12 or v13 the planner is likely
to incorrectly believe that it'll all fit in memory fairly often. v12
was much too permissive about what could happen. But v13 is too
conservative.

> b) Planner believes the hash table will fit, due to underestimate.
>
> On 12, we'd just let the hash table overflow, which may be a win when
> there's enough RAM and the estimate is not "too wrong". But it may
> easily end with a sad OOM.

It might end up with an OOM on v12 due to an underestimate -- but
probably not! The fact that a hash aggregate is faster than a group
aggregate ameliorates the higher memory usage. You might actually use
less memory this way.

> On 13, we'll just start spilling. True - people tend to use conservative
> work_mem values exactly because of cases like this (which is somewhat
> futile as the underestimate may be arbitrarily wrong) and also because
> they don't know how many work_mem instances the query will use.
>
> So yeah, I understand why people may not want to increase work_mem too
> much, and maybe hash_work would be a way to get the "no spill" behavior.

Andres wanted to increase the amount of memory that could be used at
execution time, without changing planning. You could say that hash_mem
is a more ambitious proposal than that. It's changing the behavior
across the board -- though in a way that makes sense anyway. It has
the additional benefit of making it more likely that an in-memory hash
aggregate will be used. That isn't a problem that we're obligated to
solve now, so this may seem odd. But if the more ambitious plan is
actually easier to implement and support, why not pursue it?

hash_mem seems a lot easier to explain and reason about than having
different work_mem budgets during planning and execution, which is
clearly a kludge. hash_mem makes sense generally, and more or less
solves the problems raised on this thread.

-- 
Peter Geoghegan
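
To make the v12/v13 difference discussed above concrete, here is a
rough sketch. The table and column names are invented for the example,
and the exact EXPLAIN ANALYZE wording may differ slightly by version:

-- Deliberately small limit so the hash table cannot possibly fit:
SET work_mem = '1MB';

EXPLAIN (ANALYZE, COSTS OFF)
SELECT some_col, count(*)
FROM some_big_table      -- assume far more groups than fit in 1MB
GROUP BY some_col;

-- On v13 the HashAggregate node spills once it reaches the limit, and
-- the plan reports something like:
--   HashAggregate
--     Group Key: some_col
--     Batches: 21  Memory Usage: 1577kB  Disk Usage: 7432kB
--
-- On v12 the same plan just keeps growing the hash table past work_mem
-- (no spilling) -- the "overflow" behavior described above.

Under the hash_mem idea discussed in this thread (a proposal only, not
an existing GUC), the hash-based nodes would get their own budget, so a
spill like this could be avoided without also raising work_mem for
every sort in the system.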