On Mon, Jun 29, 2020 at 8:07 AM Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
> Not sure I follow. Which cases do you mean when you say that 12 could
> safely do them, but 13 won't? I see the following two cases:
> a) Planner in 12 and 13 disagree about whether the hash table will fit
> into work_mem.
>
> I don't quite see why this would be the case (given the same cardinality
> estimates etc.), though. That is, if 12 says "will fit" I'd expect 13 to
> end up with the same conclusion. But maybe 13 has higher per-tuple
> overhead or something? I know we set aside some memory for BufFiles, but
> not when we expect the whole hash table to fit into memory.

I have no reason to believe that the planner is any more or any less
likely to conclude that the hash table will fit in memory in v13 as
things stand (I don't know if the BufFile issue matters). In general,
grouping estimates probably aren't very good compared to join
estimates. I imagine that in either v12 or v13 the planner is likely
to incorrectly believe that it'll all fit in memory fairly often. v12
was much too permissive about what could happen. But v13 is too
conservative.

> b) Planner believes the hash table will fit, due to underestimate.
>
> On 12, we'd just let the hash table overflow, which may be a win when
> there's enough RAM and the estimate is not "too wrong". But it may
> easily end with a sad OOM.

It might end up with an OOM on v12 due to an underestimate -- but
probably not! The fact that a hash aggregate is faster than a group
aggregate ameliorates the higher memory usage. You might actually use
less memory this way.

> On 13, we'll just start spilling. True - people tend to use conservative
> work_mem values exactly because of cases like this (which is somewhat
> futile as the underestimate may be arbitrarily wrong) and also because
> they don't know how many work_mem instances the query will use.
>
> So yeah, I understand why people may not want to increase work_mem too
> much, and maybe hash_work would be a way to get the "no spill" behavior.

Andres wanted to increase the amount of memory that could be used at
execution time, without changing planning. You could say that hash_mem
is a more ambitious proposal than that. It's changing the behavior
across the board -- though in a way that makes sense anyway. It has
the additional benefit of making it more likely that an in-memory hash
aggregate will be used. That isn't a problem that we're obligated to
solve now, so this may seem odd. But if the more ambitious plan is
actually easier to implement and support, why not pursue it?

hash_mem seems a lot easier to explain and reason about than having
different work_mem budgets during planning and execution, which is
clearly a kludge. hash_mem makes sense generally, and more or less
solves the problems raised on this thread.

-- 
Peter Geoghegan
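
To make the v12/v13 difference discussed above concrete, here is a
rough sketch. The table and column names are invented for the example,
and the exact EXPLAIN ANALYZE wording may differ slightly by version:

-- Deliberately small limit so the hash table cannot possibly fit:
SET work_mem = '1MB';

EXPLAIN (ANALYZE, COSTS OFF)
SELECT some_col, count(*)
FROM some_big_table      -- assume far more groups than fit in 1MB
GROUP BY some_col;

-- On v13 the HashAggregate node spills once it reaches the limit, and
-- the plan reports something like:
--   HashAggregate
--     Group Key: some_col
--     Batches: 21  Memory Usage: 1577kB  Disk Usage: 7432kB
--
-- On v12 the same plan just keeps growing the hash table past work_mem
-- (no spilling) -- the "overflow" behavior described above.

Under the hash_mem idea discussed in this thread (a proposal only, not
an existing GUC), the hash-based nodes would get their own budget, so a
spill like this could be avoided without also raising work_mem for
every sort in the system.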