Hello Postgres community,

We operate a large PostgreSQL fleet (~15,000 databases) on dedicated Linux hosts. Each host runs *multiple PostgreSQL instances* (a multi-instance setup, not just multiple databases inside one instance).
*Environment:*
- *PostgreSQL Versions:* Mix of 13.13 and 15.12 (upgrades to 15.12 are in progress; currently both versions are actively in use)
- *OS / Kernel:* RHEL 7 and RHEL 8 variants, kernels in the 4.14–4.18 range
- *RAM:* 256 GiB (varies slightly)
- *Swap:* Currently none
- *Workload:* Highly mixed: OLTP-style internal apps with unpredictable query patterns and connection counts
- *Goal:* Uniform, safe memory settings across the fleet to avoid kernel or database instability

We're reviewing the vm.overcommit_* settings because we've seen conflicting guidance:
- vm.overcommit_memory = 2 gives predictability but can reject allocations early
- vm.overcommit_memory = 1 is more flexible but risks OOM kills if many backends hit peak memory usage at once

We're considering:
- *vm.overcommit_memory = 2* for strict accounting
- Increasing vm.overcommit_ratio from 50 → 80 or 90 to better reflect actual PostgreSQL usage (e.g., work_mem reservations that aren't fully used)

*Our questions for those running large PostgreSQL fleets:*
1. What overcommit_ratio do you find safe for PostgreSQL without causing kernel memory crunches?
2. Do you prefer overcommit_memory = 1 or = 2 for production stability?
3. How much swap (if any) do you keep on large-memory servers where PostgreSQL is the primary workload? Is configuring swap a good idea or not?
4. Any real-world cases where kernel accounting was too strict or too loose for PostgreSQL?
5. What settings would you go with if we are not planning to use swap?

We'd like to avoid both extremes:
- Too low a ratio: PostgreSQL backends failing allocations even with free RAM
- Too high a ratio: the OOM killer terminating PostgreSQL under load spikes

Any operational experiences, tuning recommendations, or kernel/PG interaction pitfalls would be very helpful. TIA
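For concreteness, here is a sketch of the settings we are weighing (the file name and ratio value are our proposal, not a recommendation), together with the arithmetic the kernel uses to derive CommitLimit under overcommit_memory = 2:

```shell
# Proposed strict-accounting sysctls (hypothetical values under discussion),
# e.g. in /etc/sysctl.d/90-pg-overcommit.conf:
#   vm.overcommit_memory = 2
#   vm.overcommit_ratio  = 80

# With overcommit_memory=2 (and overcommit_kbytes unset) the kernel caps
# total committed memory at:
#   CommitLimit = swap + RAM * overcommit_ratio / 100
# Example with our numbers: 256 GiB RAM, no swap, ratio 80.
ram_kib=$((256 * 1024 * 1024))   # 256 GiB in KiB
swap_kib=0                       # no swap configured
ratio=80
commit_limit_kib=$(( swap_kib + ram_kib * ratio / 100 ))
echo "CommitLimit: ${commit_limit_kib} KiB (~$(( commit_limit_kib / 1024 / 1024 )) GiB)"
```

Compare the computed value against the CommitLimit line in /proc/meminfo after applying the sysctls; with no swap, anything the ratio leaves uncovered is simply unusable for commit accounting, which is part of why we are asking question 5.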