On Dec 28, 2024, at 12:26 PM, Jeremy Schneider <schnei...@ardentperf.com> wrote:
>
> While I don't have a detailed design in mind, I'd like to add a strong
> +1 on the general idea that work_mem is hard to effectively use because
> queries can vary so widely in how many nodes might need work memory.
>
> I'd almost like to have two limits:
>
> First, a hard per-connection limit which could be set very high - we
> can track total memory usage across contexts inside of palloc and pfree
> (and maybe this could also be exposed in pg_stat_activity for easy
> visibility into a snapshot of memory use across all backends). If
> palloc detects that an allocation would take the total over the hard
> limit, then you just fail the palloc with an OOM. This protects
> postgres from a memory leak in a single backend OOM'ing the whole
> system and restarting the whole database; failing a single connection
> is better than failing all of them.
>
> Second, a soft per-connection "total_connection_work_mem_target" which
> could be set lower. The planner can just look at the total number of
> nodes that it expects to allocate work memory, divide the target by
> this and then set the work_mem for that query. There should be a
> reasonable floor (minimum) for work_mem - maybe the value of work_mem
> itself becomes this and the new target doesn't do anything besides
> increasing runtime work_mem.
>
> Maybe even could do a "total_instance_work_mem_target" where it's
> divided by the number of average active connections or something.
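As a quick illustration of the arithmetic being proposed above, here is a minimal standalone sketch in plain C (not actual PostgreSQL code): the hard per-backend cap that fails an allocation instead of letting the kernel take out the whole instance, and the soft target divided across the nodes the planner expects to use work memory, floored at plain work_mem. The name total_connection_work_mem_target comes from the quoted mail; everything else here (limited_alloc, per_node_work_mem_kb, the specific limit values) is hypothetical.

/*
 * Standalone sketch (not PostgreSQL code) of the two proposed limits,
 * with hypothetical names and default values.
 */
#include <stdio.h>
#include <stdlib.h>

/* Existing GUC, in kB, acting as the floor for per-node memory. */
static size_t work_mem_kb = 4 * 1024;                       /* 4MB default */
/* Hypothetical soft target for the whole query, in kB. */
static size_t total_connection_work_mem_target_kb = 64 * 1024;
/* Hypothetical hard per-backend cap, in kB. */
static size_t backend_hard_limit_kb = 512 * 1024;

/* Running total of memory handed out by this backend, in bytes. */
static size_t backend_total_bytes = 0;

/*
 * Hard limit: refuse the allocation (here, by returning NULL) when it
 * would push the backend's total over the cap, so one leaking backend
 * fails alone instead of OOM'ing the whole system.
 */
static void *
limited_alloc(size_t size)
{
    if (backend_total_bytes + size > backend_hard_limit_kb * 1024)
        return NULL;            /* caller reports out-of-memory */
    backend_total_bytes += size;
    return malloc(size);
}

/*
 * Soft target: divide the per-query budget across the nodes the planner
 * expects to use work memory, never going below plain work_mem.
 */
static size_t
per_node_work_mem_kb(int n_workmem_nodes)
{
    size_t share = total_connection_work_mem_target_kb / (size_t) n_workmem_nodes;

    return share > work_mem_kb ? share : work_mem_kb;
}

int
main(void)
{
    printf("3 nodes -> %zu kB each\n", per_node_work_mem_kb(3));   /* ~21845 kB */
    printf("32 nodes -> %zu kB each\n", per_node_work_mem_kb(32)); /* floor: 4096 kB */

    void *p = limited_alloc(8192);
    printf("allocation %s\n", p ? "succeeded" : "denied");
    free(p);
    return 0;
}

With the values above, a plan with 3 memory-hungry nodes would get roughly 21 MB per node, while a plan with 32 such nodes falls back to the 4 MB work_mem floor, matching the "the new target doesn't do anything besides increasing runtime work_mem" behavior described in the mail.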
IMHO none of this will be very sane until we actually have cluster-level limits. One sudden burst in active connections and you still OOM the instance. And while we could have such a mechanism do something clever like dynamically lowering every session's query_mem/work_mem/whatever, ultimately I think it would also need the ability to deny or delay sessions from starting new transactions.
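To make the cluster-level idea concrete, here is a similarly hedged standalone sketch of a shared budget with crude admission control: a transaction reserves its work_mem grant up front, the grant is shrunk when the instance is busy, and the reservation is refused (i.e. the session would have to wait and retry) once not even the floor fits. None of these names exist in PostgreSQL today, and the shared counter would of course have to live in shared memory and be updated under a lock.

/*
 * Standalone sketch (not PostgreSQL code) of a cluster-wide work_mem
 * budget with simple admission control.  All names are hypothetical.
 */
#include <stdio.h>

static long cluster_work_mem_limit_kb = 1024 * 1024;   /* 1GB shared budget */
static long cluster_work_mem_used_kb = 0;              /* would live in shared memory */
static long work_mem_floor_kb = 4 * 1024;              /* never shrink a grant below this */

/*
 * Try to reserve a per-transaction work_mem grant.  Returns the granted
 * amount in kB, or 0 when the session should delay starting the transaction.
 */
static long
reserve_work_mem(long requested_kb)
{
    long available = cluster_work_mem_limit_kb - cluster_work_mem_used_kb;
    long grant;

    if (available < work_mem_floor_kb)
        return 0;               /* deny/delay: not even the floor fits */

    grant = requested_kb < available ? requested_kb : available;   /* shrink if needed */
    cluster_work_mem_used_kb += grant;
    return grant;
}

static void
release_work_mem(long grant_kb)
{
    cluster_work_mem_used_kb -= grant_kb;
}

int
main(void)
{
    long g1 = reserve_work_mem(768 * 1024);   /* granted in full */
    long g2 = reserve_work_mem(512 * 1024);   /* shrunk to what's left */
    long g3 = reserve_work_mem(64 * 1024);    /* denied: less than the floor remains */

    printf("grants: %ld kB, %ld kB, %ld kB\n", g1, g2, g3);
    release_work_mem(g1);
    release_work_mem(g2);
    return 0;
}

The denied case is exactly the "deny or delay sessions from starting new transactions" part: a real implementation would presumably sleep on a condition variable or queue rather than simply returning zero.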