On Tue, Feb 28, 2023 at 07:06:43PM +0100, Jehan-Guillaume de Rorthais wrote:
> Hello all,
>
> A customer is facing an out of memory error on a query which looks similar
> to this situation:
>
>   https://www.postgresql.org/message-id/flat/12064.1555298699%40sss.pgh.pa.us#eb519865575bbc549007878a5fb7219b
>
> The PostgreSQL version is 11.18. Some settings:
hash aggregation could exceed work_mem until v13 (per the v13 release notes):

|Allow hash aggregation to use disk storage for large aggregation result
|sets (Jeff Davis)
|
|Previously, hash aggregation was avoided if it was expected to use more
|than work_mem memory. Now, a hash aggregation plan can be chosen despite
|that. The hash table will be spilled to disk if it exceeds work_mem
|times hash_mem_multiplier.
|
|This behavior is normally preferable to the old behavior, in which once
|hash aggregation had been chosen, the hash table would be kept in memory
|no matter how large it got -- which could be very large if the planner
|had misestimated. If necessary, behavior similar to that can be obtained
|by increasing hash_mem_multiplier.

> https://explain.depesz.com/s/sGOH

This shows multiple plan nodes underestimating the row counts by factors of
~50,000, which could lead to the issue fixed in v13. I think you should try
to improve the estimates, which might help other queries in addition to this
one, as well as maybe avoid the issue here. (Some sketches are appended at
the end of this mail.)

> The customer is aware he should rewrite this query to optimize it, but
> it's a long process he cannot start immediately. To make it run in the
> meantime, he actually moved the top CTE into a dedicated table.

Is the table analyzed?

> Is it usual for a backend to request such a large memory size (13 GB) and
> actually use less than 60% of it (7.7 GB of RSS)?

It's possible it's "using less" simply because it's not available.
Is the process swapping?

--
Justin
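
PS: a few sketches, in case they help. All object names and values below are
made up, not taken from the customer's schema.

On v13 and later, the spill behavior described in the release note above is
controlled by work_mem and hash_mem_multiplier (on v11 the only knob is
work_mem itself). Roughly:

    -- Check the current budget for hash-based nodes.
    SHOW work_mem;
    SHOW hash_mem_multiplier;  -- exists since v13; default 2.0 since v15

    -- Illustrative values only: hash tables may grow to about
    -- work_mem * hash_mem_multiplier before spilling to disk.
    SET work_mem = '64MB';
    SET hash_mem_multiplier = 2.0;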
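
To improve the estimates, the usual knobs are per-column statistics targets
and, if the misestimates come from correlated columns, extended statistics
(available since v10, so usable on 11.18). For example, with hypothetical
table and column names:

    -- Raise the statistics target on a badly estimated column
    -- (default_statistics_target is 100), then re-ANALYZE.
    ALTER TABLE some_table ALTER COLUMN some_column SET STATISTICS 1000;

    -- If two columns are correlated, extended statistics can help the
    -- planner combine their selectivities.
    CREATE STATISTICS some_table_stats (dependencies, ndistinct)
        ON col_a, col_b FROM some_table;

    ANALYZE some_table;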
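
And to check whether the new dedicated table has been analyzed at all
(table name hypothetical):

    SELECT relname, last_analyze, last_autoanalyze, n_live_tup
      FROM pg_stat_user_tables
     WHERE relname = 'dedicated_table';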