Hello,

On Mon, 18 Jul 2022 at 21:04, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Aleš Zelený <zeleny.a...@gmail.com> writes:
> > postgres=# show shared_buffers ;
> >  shared_buffers
> > ----------------
> >  10GB
> > (1 row)
>
> Oh!  In that case, there is absolutely nothing to see here.
> This chunk:
>
> > 00007fd9b0551000 10827040K rw-s- zero (deleted)
>
> must be your shared-memory region, and it's accounting for just
> about all of the process's memory space.  There is no reason
> to suspect a leak.

I agree: this 10 GB matches the configured shared_buffers (a size correlation only), so there is no memory leak here.

> I think you've been fooled by the fact that "top" and some other
> tools frequently don't report a shared-memory page as part of a
> process's usage until that process touches that particular page.
> Thus, the reported memory usage of a Postgres backend will grow
> over time as it randomly happens to touch different buffers within
> the shared buffer arena.  That does not constitute a leak, but
> it does make such tools next door to useless for detecting
> actual leaks :-(.  You can only believe there's a leak if the
> reported usage doesn't level off after reaching the vicinity of
> your shared memory size.

On the other hand, if we have 10 GB of shared buffers plus ~14 MB reported by TopMemoryContext, then even granting that /proc/<pid>/status counts shared pages only as they are touched, why is the process VmPeak ~12 GB?
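Tom's distinction between shared and private pages can be checked directly on Linux, since /proc/<pid>/status splits the resident set into anonymous, file-backed, and shared components (on kernels that expose RssAnon/RssFile/RssShmem). A minimal sketch using the figures from the status dump quoted below; the parsing helper is my own illustration, not an existing tool:

```python
# Sketch: split a backend's resident set using /proc/<pid>/status fields.
# Needs a kernel exposing RssAnon/RssFile/RssShmem (mainline 4.5+; the
# fields are present in the CentOS 7 status dump quoted below).

def rss_breakdown(status_text):
    """Return {field: kB} for the RSS-related fields of a status file."""
    wanted = ("VmRSS", "RssAnon", "RssFile", "RssShmem")
    out = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            out[key] = int(rest.split()[0])   # values are reported in kB
    return out

# Figures from the status dump further down in this mail:
sample = """\
VmRSS:\t 9811560 kB
RssAnon:\t 1283964 kB
RssFile:\t   15784 kB
RssShmem:\t 8511812 kB
"""
b = rss_breakdown(sample)
# VmRSS is exactly the sum of the three parts; only RssAnon is private.
assert b["VmRSS"] == b["RssAnon"] + b["RssFile"] + b["RssShmem"]
print(b["RssAnon"], b["RssShmem"])
```

On this backend, ~8.1 GB of the ~9.4 GB RSS is touched shared-buffer pages, which is exactly the effect Tom describes; only the RssAnon part is truly private.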
-bash-4.2$ cat 20220718_200230.status.17048.log
Name:	postmaster
Umask:	0077
State:	S (sleeping)
Tgid:	17048
Ngid:	0
Pid:	17048
PPid:	4476
TracerPid:	0
Uid:	26	26	26	26
Gid:	26	26	26	26
FDSize:	1024
Groups:	26
VmPeak:	12477752 kB
VmSize:	12439740 kB
VmLck:	0 kB
VmPin:	0 kB
VmHWM:	9811560 kB
VmRSS:	9811560 kB
RssAnon:	1283964 kB
RssFile:	15784 kB
RssShmem:	8511812 kB
VmData:	1282860 kB
VmStk:	132 kB
VmExe:	7816 kB
VmLib:	85312 kB
VmPTE:	23716 kB
VmSwap:	0 kB
Threads:	1
SigQ:	0/160197
SigPnd:	0000000000000000
ShdPnd:	0000000000000000
SigBlk:	0000000000400000
SigIgn:	0000000001701800
SigCgt:	00000001a0006287
CapInh:	0000000000000000
CapPrm:	0000000000000000
CapEff:	0000000000000000
CapBnd:	0000001fffffffff
CapAmb:	0000000000000000
NoNewPrivs:	0
Seccomp:	0
Speculation_Store_Bypass:	thread vulnerable
Cpus_allowed:	ffffff
Cpus_allowed_list:	0-23
Mems_allowed:	00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003
Mems_allowed_list:	0-1
voluntary_ctxt_switches:	5273344
nonvoluntary_ctxt_switches:	13651

I'd expect VmPeak to be close to the 10 GB of shared_buffers (supposing that all shared-memory pages were touched) plus ~14 MB of private memory, but:

VmPeak 12477752 kB - 10485760 kB (shared_buffers) = 1991992 kB

which is somewhat more than the RssAnon: 1283964 kB reported for the process, though of a similar magnitude.
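The arithmetic above, written out explicitly (all values in kB, taken from the status dump):

```python
# Expectation vs. reality check from the status dump above, in kB:
vm_peak        = 12477752            # VmPeak from the status dump
shared_buffers = 10 * 1024 * 1024    # the configured 10 GB
rss_anon       = 1283964             # RssAnon from the status dump

non_shared = vm_peak - shared_buffers
print(non_shared)                    # 1991992 kB of virtual space beyond the shared segment
# VmPeak counts reserved virtual address space, while RssAnon counts only
# touched private pages, so non_shared exceeding rss_anon by some margin
# does not by itself prove a leak.
print(non_shared - rss_anon)         # the gap in kB
```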
pmap taken at the same time:

-bash-4.2$ cat 20220718_200230.pmap.17048.log | grep -e anon -e deleted
0000000000db3000     200K rw---   [ anon ]
0000000002d15000     504K rw---   [ anon ]
0000000002d93000  934476K rw---   [ anon ]   <<<= suspected of being responsible for the OutOfMemory -> failing fork(), reinitializing the cluster
00007fd989776000  311304K rw---   [ anon ]   <<<= together with this one
00007fd9a8d75000   32772K rw---   [ anon ]
00007fd9acb65000      20K rw---   [ anon ]
00007fd9affc1000     372K rw---   [ anon ]
00007fd9b023d000    1028K rw---   [ anon ]
00007fd9b054b000      24K rw---   [ anon ]
00007fd9b0551000 10827040K rw-s- zero (deleted)   <<<= expected to be shared_buffers
00007fdc45f18000      16K rw---   [ anon ]
00007fdc4d50c000     184K rw---   [ anon ]
00007fdc4de16000       4K rw---   [ anon ]
00007fdc4e03c000       8K rw---   [ anon ]
00007fdc4fd1e000      84K rw---   [ anon ]
00007fdc4ff6f000       8K rw---   [ anon ]
00007fdc508d7000       8K rw---   [ anon ]
00007fdc50d59000       4K rw---   [ anon ]
00007fdc5139e000       8K rw---   [ anon ]
00007fdc520f3000      40K rw---   [ anon ]
00007fdc52902000      20K rw---   [ anon ]
00007fdc52ead000      16K rw---   [ anon ]
00007fdc532af000       4K rw---   [ anon ]
00007fdc53e2b000      16K rw---   [ anon ]
00007fdc544db000      16K rw---   [ anon ]
00007fdc54cc8000       8K rw---   [ anon ]
00007fdc54f1a000    1588K rw---   [ anon ]
00007fdc550dc000     104K rw---   [ anon ]
00007fdc550fc000       8K r-x--   [ anon ]
00007fdc550ff000       4K rw---   [ anon ]
00007fdc55102000       4K rw---   [ anon ]
00007fffd38f0000       8K r-x--   [ anon ]
ffffffffff600000       4K r-x--   [ anon ]

The machine has 40 GB of RAM. It is a dedicated DB server; only a single PG instance is running there.
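The anonymous mappings in such a pmap listing can be totalled and compared with RssAnon to confirm they account for the private memory. A small sketch; the helper is illustrative, not part of any existing tool:

```python
# Sketch: total the "[ anon ]" mappings in a pmap listing; these are the
# candidates for real private-memory growth, unlike the rw-s shared segment.
def anon_total_kb(pmap_text):
    total = 0
    for line in pmap_text.splitlines():
        if "[ anon ]" in line:
            total += int(line.split()[1].rstrip("K"))  # size column, in K
    return total

# The two large regions from the listing above, plus the shared segment
# (which must not be counted because it is rw-s, not anonymous):
sample = """\
0000000002d93000  934476K rw---   [ anon ]
00007fd989776000  311304K rw---   [ anon ]
00007fd9b0551000 10827040K rw-s- zero (deleted)
"""
print(anon_total_kb(sample))  # 1245780 kB, close to the RssAnon figure
```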
Since my last reply we have again run out of memory; at the moment it looks healthy:

-bash-4.2$ psql -c "select count(*) from pg_stat_activity"
 count
-------
   336
(1 row)

-bash-4.2$ sudo smem -u
User               Count  Swap     USS      PSS       RSS
chrony                 1     0     512      515       872
dbus                   1     0    1064     1074      1432
postfix                2     0    2752     3122      5540
polkitd                1     0    9152     9276      9816
postgres-exporter      1     0   11912    11912     11916
zelenya                9     0   34320    35550     43648
sql_exporter           1     0   46332    46332     46340
root                  28     0  267920   274273    298388
postgres             336     0 5904884 16399946 143527180

-bash-4.2$ free -h
              total        used        free      shared  buff/cache   available
Mem:            39G        7,3G        1,2G         10G         30G         21G
Swap:            0B          0B          0B

The processes suspected of being responsible for the memory exhaustion (judging by the RssAnon figures in their /proc/<pid>/status files) always belong to a single DB user:

-bash-4.2$ psql -c "select count(*) from pg_stat_activity where usename='app_feed'"
 count
-------
    67
(1 row)

So the available 21 GB (under normal conditions used by the kernel as file-system page cache) is not enough if 20 of these processes each allocate 1 GB of private memory: the OS CommitLimit is reached and the out-of-memory error happens.

> So back to why you're getting these out-of-memory failures:
> we still don't have much info about that.  I wonder whether
> there is something watching the total usage reported by "top"
> and taking action based on that entirely-misleading number.
> The kernel itself should be aware that there's no extra memory
> pressure from N backends all using the same shared memory
> segment, but it can be hard for outside tools to know that.

There is no tool taking action when high memory usage is detected. The OS kernel provides all the memory requested by processes until a fork() fails (usually when the autovacuum launcher attempts to spawn a new autovacuum worker, as these are the most frequently started new processes). The failure comes from fork() itself, due to a lack of memory for new-process creation.
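The commit-accounting side of this can be sketched numerically. With vm.overcommit_memory=2 and no swap, allocations (including fork()) fail once Committed_AS would exceed CommitLimit = MemTotal * overcommit_ratio / 100 + SwapTotal. Using the figures reported on this box; the 4 GB allowance for "everything else" is my assumption, not a measured value:

```python
# CommitLimit under vm.overcommit_memory=2 (strict accounting, no swap):
mem_total        = 41033756          # kB, MemTotal from /proc/meminfo
overcommit_ratio = 95                # vm.overcommit_ratio
swap_total       = 0                 # kB, swap is disabled

commit_limit = mem_total * overcommit_ratio // 100 + swap_total
print(commit_limit)                  # 38982068 kB, matching /proc/meminfo

# Rough estimate of how many ~1 GB backends fit before fork() starts failing:
shared_buffers = 10 * 1024 * 1024    # kB, counted once for the shared segment
other_commit   = 4 * 1024 * 1024     # kB, everything else (assumed figure)
per_backend    = 1 * 1024 * 1024     # kB, private memory of one such backend

backends = (commit_limit - shared_buffers - other_commit) // per_backend
print(backends)                      # ~23 such backends exhaust the limit
```

So roughly twenty such backends are enough to hit the limit, consistent with the failure pattern observed.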
2022-07-01 06:51:02 CEST [4476]: [26-1] user=,db=,host=,app= LOG:  could not fork autovacuum worker process: Cannot allocate memory
2022-07-01 06:51:03 CEST [4476]: [27-1] user=,db=,host=,app= LOG:  could not fork worker process: Cannot allocate memory
2022-07-01 06:51:03 CEST [4476]: [28-1] user=,db=,host=,app= LOG:  could not fork worker process: Cannot allocate memory
2022-07-01 06:51:03 CEST [18882]: [1332-1] user=app_feed,db=prematch,host=172.25.0.231,app=[unknown] ERROR:  could not resize shared memory segment "/PostgreSQL.1824635724" to 204064 bytes: No space left on device

So the OS kernel was short of available memory. Memory overcommit settings on the system:

-bash-4.2$ sysctl -q vm.overcommit_memory
vm.overcommit_memory = 2
-bash-4.2$ sysctl -q vm.overcommit_ratio
vm.overcommit_ratio = 95

Swap is disabled:

-bash-4.2$ swapon -s
-bash-4.2$

CommitLimit = RAM * overcommit_ratio / 100 + swap ~ 37 GB (ignoring total_huge_TLB, as huge pages are not used), as can be seen in meminfo:

-bash-4.2$ cat /proc/meminfo
MemTotal:       41033756 kB
MemFree:         1549608 kB
MemAvailable:   24782516 kB
Buffers:            8468 kB
Cached:         33667252 kB
SwapCached:            0 kB
Active:         24600744 kB
Inactive:       12524428 kB
Active(anon):   14131536 kB
Inactive(anon):   130116 kB
Active(file):   10469208 kB
Inactive(file): 12394312 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:              1876 kB
Writeback:            16 kB
AnonPages:       3449376 kB
Mapped:         10796800 kB
Shmem:          10811896 kB
Slab:             987832 kB
SReclaimable:     813916 kB
SUnreclaim:       173916 kB
KernelStack:       13648 kB
PageTables:       823236 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    38982068 kB
Committed_AS:   19261792 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      256724 kB
VmallocChunk:   34338043900 kB
Percpu:             5632 kB
HardwareCorrupted:     0 kB
AnonHugePages:     65536 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      110464 kB
DirectMap2M:     6180864 kB
DirectMap1G:    37748736 kB

From the
node-exporter metrics, I can see that the component that grows over time, until the commit limit is reached and the cluster is reinitialized, is "Active(anon)". Diving into /proc/<pid>/status, I can see the RssAnon memory of multiple processes belonging to the app_feed DB user growing over time, and the sum of those increments matches the "Active(anon)" growth seen in the global view in /proc/meminfo. This observed growth, together with the anonymous memory segments (on top of the shared buffers mentioned above), is why I suspected an issue in PG 14.4 after the upgrade.

> At this point I suspect that PG 14.3 vs 14.4 is a red herring,
> and what you should be looking into is what else you updated
> at about the same time, particularly in the line of container
> management tools or the like (if you use any).

As mentioned, the machine is a dedicated DB server serving a single PG instance. No containers are in place, no cgroup setup (neither memory nor CPU limits), just a clean CentOS 7 + epel-release + PG packages (node_exporter and postgres_exporter for monitoring). The only change on that machine was the upgrade from PG 14.3 to PG 14.4 (using PGDG RPM repo packages), and from that moment the memory usage and out-of-memory issues started, the first time less than 24 hours after the upgrade.

I'm far from sure it is a Postgres memory leak; my primary intention is to provide as much data as I can to help identify the root cause of this issue. When I'm back from my vacation (2 weeks), I'd like to downgrade to 14.3 (and we will have to refrain from concurrent index operations) to see whether such memory allocations disappear or not. Any hints or tips on how to analyze this better (what OS data we should collect over time...), ideally avoiding the downgrade exercise, are welcome.

Kind regards
Ales Zeleny
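As for what to collect over time: a periodic sampler that records RssAnon per backend would show directly which processes grow and when. A hypothetical sketch, assuming a standard Linux /proc layout (the function names and the comm-based matching are my illustration, not an existing tool):

```python
# Sketch: sample RssAnon (kB) for every postgres/postmaster process so the
# per-backend growth can be correlated with pg_stat_activity afterwards.
import glob

def read_rss_anon(status_text):
    """Extract RssAnon (kB) from /proc/<pid>/status content, or None."""
    for line in status_text.splitlines():
        if line.startswith("RssAnon:"):
            return int(line.split()[1])
    return None

def sample_backends(proc_root="/proc"):
    """Return {pid: RssAnon kB} for processes whose comm is postmaster/postgres."""
    samples = {}
    for path in glob.glob(proc_root + "/[0-9]*/status"):
        try:
            with open(path) as f:
                text = f.read()
        except OSError:
            continue                      # process exited while scanning
        first = text.splitlines()[0] if text else ""
        if first.endswith(("postmaster", "postgres")):
            samples[int(path.split("/")[-2])] = read_rss_anon(text)
    return samples

# Demo on an inline status fragment (figures from the dump earlier in this mail):
print(read_rss_anon("Name:\tpostmaster\nRssAnon:\t1283964 kB"))
```

Running sample_backends() from cron every minute and diffing successive snapshots (keyed by pid and backend start time) would show whether the RssAnon growth is tied to particular statements of the app_feed sessions.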