On Wed, 24 Jun 2020 at 21:06, Bruce Momjian <br...@momjian.us> wrote: > I > don't remember anyone complaining about spills to disk during merge > join, so I am unclear why we would need a such control for hash join.
Hash aggregate, you mean? The reason is that upgrading to PG13 can cause a performance regression for an underestimated ndistinct on the GROUP BY clause and cause hash aggregate to spill to disk where it previously did everything in RAM. Sure, that behaviour was never what we wanted to happen, Jeff has fixed that now, but the fact remains that this does happen in the real world quite often and people often get away with it, likey because work_mem is generally set to some very conservative value. Of course, there's also a bunch of people that have been bitten by OOM due to this too. The "neverspill" wouldn't be for those people. Certainly, it's possible that we just tell these people to increase work_mem for this query, that way they can set it to something reasonable and still get spilling if it's really needed to save them from OOM, but the problem there is that it's not very easy to go and set work_mem for a certain query. FWIW, I wish that I wasn't suggesting we do this, but I am because it seems simple enough to implement and it removes a pretty big roadblock that might exist for a small subset of people wishing to upgrade to PG13. It seems lightweight enough to maintain, at least until we invent some better management of how many executor nodes we can have allocating work_mem at once. The suggestion I made was just based on asking myself the following set of questions: Since Hash Aggregate has been able to overflow work_mem since day 1, and now that we've completely changed that fact in PG13, is that likely to upset anyone? If so, should we go to the trouble of giving those people a way of getting the old behaviour back? If we do want to help those people, what's the best way to make those options available to them in a way that we can remove the special options with the least pain in some future version of PostgreSQL? I'd certainly be interested in hearing how other people would answer those question. David