Hello Postgres community,

We operate a large PostgreSQL fleet (~15,000 databases) on dedicated Linux hosts. Each host runs *multiple PostgreSQL instances* (a multi-instance setup, not just multiple databases inside one instance).
*Environment:*
- *PostgreSQL Versions:* Mix of 13.13 and 15.12 (upgrades to 15.12 are in progress; currently both versions are actively in use)
- *OS / Kernel:* RHEL 7 and RHEL 8 variants, kernels in the 4.14–4.18 range
- *RAM:* 256 GiB (varies slightly)
- *Swap:* Currently none
- *Workload:* Highly mixed: OLTP-style internal apps with unpredictable query patterns and connection counts
- *Goal:* Uniform, safe memory settings across the fleet to avoid kernel or database instability

We're reviewing the vm.overcommit_* settings because we've seen conflicting guidance:
- vm.overcommit_memory = 2 gives predictability but can reject allocations early
- vm.overcommit_memory = 1 is more flexible but risks OOM kills if many backends hit peak memory usage at once

We're considering:
- *vm.overcommit_memory = 2* for strict accounting
- Increasing vm.overcommit_ratio from 50 → 80 or 90 to better reflect actual PostgreSQL usage (e.g., work_mem reservations that aren't fully used)

*Our questions for those running large PostgreSQL fleets:*
1. What overcommit_ratio do you find safe for PostgreSQL without causing kernel memory crunches?
2. Do you prefer overcommit_memory = 1 or = 2 for production stability?
3. How much swap (if any) do you keep on large-memory servers where PostgreSQL is the primary workload? Is configuring swap a good idea or not?
4. Any real-world cases where kernel accounting was too strict or too loose for PostgreSQL?
5. What settings would you go with if we are not planning to use swap?

We'd like to avoid both extremes:
- Too low a ratio: PostgreSQL backends failing allocations even with free RAM
- Too high a ratio: the OOM killer terminating PostgreSQL under load spikes

Any operational experiences, tuning recommendations, or kernel/PG interaction pitfalls would be very helpful. TIA
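For concreteness, here is a sketch of the settings we are weighing (the file name and ratio value are our proposal, not a recommendation), together with the arithmetic the kernel uses to derive CommitLimit under overcommit_memory = 2:

```shell
# Proposed strict-accounting sysctls (hypothetical values under discussion),
# e.g. in /etc/sysctl.d/90-pg-overcommit.conf:
#   vm.overcommit_memory = 2
#   vm.overcommit_ratio  = 80

# With overcommit_memory=2 (and overcommit_kbytes unset) the kernel caps
# total committed memory at:
#   CommitLimit = swap + RAM * overcommit_ratio / 100
# Example with our numbers: 256 GiB RAM, no swap, ratio 80.
ram_kib=$((256 * 1024 * 1024))   # 256 GiB in KiB
swap_kib=0                       # no swap configured
ratio=80
commit_limit_kib=$(( swap_kib + ram_kib * ratio / 100 ))
echo "CommitLimit: ${commit_limit_kib} KiB (~$(( commit_limit_kib / 1024 / 1024 )) GiB)"
```

Compare the computed value against the CommitLimit line in /proc/meminfo after applying the sysctls; with no swap, anything the ratio leaves uncovered is simply unusable for commit accounting, which is part of why we are asking question 5.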