Hello,

On Mon, 11 Jul 2022 at 14:47, Pavel Stehule <pavel.steh...@gmail.com> wrote:
> Hi
>
> It looks like a memory leak.

As we are collecting some data for the Postgres processes, I've found processes with large anonymous memory (RssAnon) allocations. Some examples:

2022-07-08T14:50:24+0200 Name: postmaster Pid: 31522 RssAnon: 742600 kB
2022-07-08T14:51:13+0200 Name: postmaster Pid: 31522 RssAnon: 742600 kB
2022-07-08T14:52:05+0200 Name: postmaster Pid: 31522 RssAnon: 742852 kB
2022-07-08T14:52:58+0200 Name: postmaster Pid: 31522 RssAnon: 743112 kB
2022-07-08T14:53:50+0200 Name: postmaster Pid: 31522 RssAnon: 743112 kB

2022-07-08T14:50:24+0200 Name: postmaster Pid: 31647 RssAnon: 684632 kB
2022-07-08T14:51:13+0200 Name: postmaster Pid: 31647 RssAnon: 684632 kB
2022-07-08T14:52:05+0200 Name: postmaster Pid: 31647 RssAnon: 684676 kB
2022-07-08T14:52:58+0200 Name: postmaster Pid: 31647 RssAnon: 684700 kB
2022-07-08T14:53:50+0200 Name: postmaster Pid: 31647 RssAnon: 684824 kB

2022-07-08T14:50:27+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
2022-07-08T14:51:16+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
2022-07-08T14:52:08+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
2022-07-08T14:53:01+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
2022-07-08T14:53:53+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB

Other processes use much less memory:

2022-07-08T14:52:08+0200 Name: postmaster Pid: 3869 RssAnon: 3256 kB
2022-07-08T14:53:02+0200 Name: postmaster Pid: 3869 RssAnon: 3256 kB
2022-07-08T14:53:54+0200 Name: postmaster Pid: 3869 RssAnon: 3256 kB
2022-07-08T14:50:27+0200 Name: postmaster Pid: 4217 RssAnon: 2532 kB
2022-07-08T14:51:16+0200 Name: postmaster Pid: 4217 RssAnon: 2532 kB

And shared memory (RssShmem):

2022-07-08T14:59:12+0200 Name: postmaster Pid: 5719 RssShmem: 908264 kB
2022-07-08T14:59:20+0200 Name: postmaster Pid: 5719 RssShmem: 908264 kB
2022-07-08T14:59:28+0200 Name: postmaster Pid: 5719 RssShmem: 908264 kB
2022-07-08T14:59:37+0200 Name: postmaster Pid: 5719 RssShmem: 908264 kB
2022-07-08T14:59:45+0200 Name: postmaster Pid: 5719 RssShmem: 908264 kB
2022-07-08T14:59:53+0200 Name: postmaster Pid: 5719 RssShmem: 908264 kB

2022-07-08T14:50:27+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
2022-07-08T14:51:16+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
2022-07-08T14:52:08+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
2022-07-08T14:53:01+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
2022-07-08T14:53:53+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB

while other processes use fairly little shared memory:

2022-07-08T14:55:25+0200 Name: postmaster Pid: 8521 RssShmem: 1988 kB
2022-07-08T14:55:33+0200 Name: postmaster Pid: 8521 RssShmem: 1988 kB
2022-07-08T14:55:40+0200 Name: postmaster Pid: 8521 RssShmem: 2104 kB
2022-07-08T14:55:49+0200 Name: postmaster Pid: 8521 RssShmem: 2104 kB
2022-07-08T14:55:57+0200 Name: postmaster Pid: 8521 RssShmem: 2104 kB
2022-07-08T14:56:06+0200 Name: postmaster Pid: 8521 RssShmem: 2104 kB
2022-07-08T14:56:15+0200 Name: postmaster Pid: 8521 RssShmem: 2104 kB

Which processes are better candidates for investigation: the ones using more anonymous memory, or the ones using more shared memory? I can do the (tedious) work of compiling a list of the statements a given PID executed (at least the statements that appear in the per-minute snapshots).
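If it helps, I can also correlate those PIDs with pg_stat_activity at snapshot time, and (since this is 14.x) ask a suspicious backend to dump its memory context statistics into the server log. A rough sketch of the queries I have in mind, using the PIDs from the snapshots above as examples:

-- what the monitored backends are executing at the moment of a snapshot
SELECT pid, usename, application_name, backend_type, state, query
  FROM pg_stat_activity
 WHERE pid IN (31522, 31647, 7866);

-- dump the memory context statistics of one backend into the server log
-- (superuser only, available since PostgreSQL 14)
SELECT pg_log_backend_memory_contexts(31522);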
>> ErrorContext: 8192 total in 1 blocks; 7928 free (5 chunks); 264 used
>> Grand total: 1456224 bytes in 195 blocks; 378824 free (165 chunks); 1077400 used
>> 2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app= ERROR:  out of memory
>> 2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app= DETAIL:  Failed on request of size 152094068 in memory context "TopTransactionContext".
>> 2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT:  automatic vacuum of table "prematch.replication.tab_queue_tmp"
>>
>> 2022-07-02 14:48:47 CEST [4476]: [42-1] user=,db=,host=,app= LOG:  could not fork worker process: Cannot allocate memory
>> 2022-07-02 14:48:47 CEST [4476]: [43-1] user=,db=,host=,app= LOG:  could not fork worker process: Cannot allocate memory
>> terminate called after throwing an instance of 'std::bad_alloc'
>>   what():  std::bad_alloc
>> 2022-07-02 14:48:47 CEST [4476]: [44-1] user=,db=,host=,app= LOG:  background worker "parallel worker" (PID 4303) was terminated by signal 6: Aborted
>>
> But what you see is probably just a side effect. Do you use some extension in C++? This is a C++ message. This process kills Postgres.

These extensions are installed (and no change from 14.3):

                                         List of installed extensions
        Name        | Version |   Schema   |                               Description
--------------------+---------+------------+------------------------------------------------------------------------
 amcheck            | 1.3     | public     | functions for verifying relation integrity
 dblink             | 1.2     | public     | connect to other PostgreSQL databases from within a database
 file_fdw           | 1.0     | public     | foreign-data wrapper for flat file access
 hstore             | 1.8     | public     | data type for storing sets of (key, value) pairs
 hypopg             | 1.3.1   | public     | Hypothetical indexes for PostgreSQL
 pageinspect        | 1.9     | public     | inspect the contents of database pages at a low level
 pg_buffercache     | 1.3     | public     | examine the shared buffer cache
 pg_stat_kcache     | 2.2.0   | public     | Kernel statistics gathering
 pg_stat_statements | 1.9     | public     | track planning and execution statistics of all SQL statements executed
 pgcrypto           | 1.3     | public     | cryptographic functions
 pgstattuple        | 1.5     | public     | show tuple-level statistics
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
 plpgsql_check      | 2.1     | public     | extended check for plpgsql functions
 postgres_fdw       | 1.1     | public     | foreign-data wrapper for remote PostgreSQL servers
 tablefunc          | 1.0     | public     | functions that manipulate whole tables, including crosstab
(15 rows)

> It can be related to some operation over prematch.replication.tab_queue_tmp because in both cases an autovacuum was started.

On 14.3 (and also older major versions), there were no such issues. This table is heavily updated, so autovacuum runs on it frequently.
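To narrow that down, I plan to watch the autovacuum activity on that table while the problem reproduces. A sketch of the monitoring queries I intend to use (nothing fancy, just the standard statistics views):

-- cumulative vacuum statistics for the suspected table
SELECT schemaname, relname, n_live_tup, n_dead_tup,
       last_autovacuum, autovacuum_count
  FROM pg_stat_all_tables
 WHERE schemaname = 'replication' AND relname = 'tab_queue_tmp';

-- any vacuum currently running on it, with the backend PID doing the work
SELECT p.pid, p.phase, p.heap_blks_total, p.heap_blks_scanned, a.backend_type
  FROM pg_stat_progress_vacuum p
  JOIN pg_stat_activity a USING (pid)
 WHERE p.relid = 'replication.tab_queue_tmp'::regclass;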
>> hba parser context: 25600 total in 6 blocks; 4464 free (8 chunks); 21136 used
>> ErrorContext: 8192 total in 1 blocks; 7928 free (5 chunks); 264 used
>> Grand total: 1456224 bytes in 195 blocks; 378824 free (165 chunks); 1077400 used
>> 2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app= ERROR:  out of memory
>> 2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app= DETAIL:  Failed on request of size 152094068 in memory context "TopTransactionContext".
>> 2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT:  automatic vacuum of table "prematch.replication.tab_queue_tmp"
>>
> I am not sure, but it looks like a corrupted table prematch.replication.tab_queue_tmp.

$ /usr/pgsql-14/bin/pg_amcheck -v -d prematch -t replication.tab_queue_tmp --heapallindexed
pg_amcheck: including database "prematch"
pg_amcheck: in database "prematch": using amcheck version "1.3" in schema "public"
pg_amcheck: checking heap table "prematch.replication.tab_queue_tmp"
pg_amcheck: checking btree index "prematch.replication.tab_queue_tmp_pkey"
$ echo $?
0

Something else I can check?

Thanks Ales

>> And a few days later again:
>>
>> 2022-07-08 14:54:22 CEST [4791]: [81186-1] user=app_evaluation,db=prematch,host=172.25.0.80,app=Evaluation_STS_UAT DETAIL:  parameters: $1 = '1', $2 = '1748010445', $3 = '0', $4 = '1000'
>> terminate called after throwing an instance of 'std::bad_alloc'
>> terminate called after throwing an instance of 'std::bad_alloc'
>>   what():  what():  std::bad_allocstd::bad_alloc
>> 2022-07-08 14:54:23 CEST [4476]: [49-1] user=,db=,host=,app= LOG:  background worker "parallel worker" (PID 25251) was terminated by signal 6: Aborted
>> 2022-07-08 14:54:23 CEST [4476]: [51-1] user=,db=,host=,app= LOG:  terminating any other active server processes
>>
>> Thanks Ales Zeleny
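PS: as a follow-up to the pg_amcheck run above, I can also repeat the same checks directly from SQL via the functions that ship with amcheck 1.3, in case the extra output is useful. A sketch of what I would run:

-- heap consistency check, optionally including the TOAST table
SELECT * FROM verify_heapam('replication.tab_queue_tmp', check_toast => true);

-- B-tree check of the primary key; heapallindexed verifies every heap tuple has an index entry
SELECT bt_index_check('replication.tab_queue_tmp_pkey', heapallindexed => true);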