Hello,

On Mon, Jul 11, 2022 at 14:47, Pavel Stehule <pavel.steh...@gmail.com>
wrote:

> Hi
>
> It looks like a memory leak.
>

While collecting some data about our Postgres processes, I found backends
with large anonymous memory (RssAnon) allocations; the per-minute sampler is
sketched below, after the listings.

Some examples:
2022-07-08T14:50:24+0200 Name: postmaster Pid: 31522 RssAnon:  742600 kB
2022-07-08T14:51:13+0200 Name: postmaster Pid: 31522 RssAnon:  742600 kB
2022-07-08T14:52:05+0200 Name: postmaster Pid: 31522 RssAnon:  742852 kB
2022-07-08T14:52:58+0200 Name: postmaster Pid: 31522 RssAnon:  743112 kB
2022-07-08T14:53:50+0200 Name: postmaster Pid: 31522 RssAnon:  743112 kB
2022-07-08T14:50:24+0200 Name: postmaster Pid: 31647 RssAnon:  684632 kB
2022-07-08T14:51:13+0200 Name: postmaster Pid: 31647 RssAnon:  684632 kB
2022-07-08T14:52:05+0200 Name: postmaster Pid: 31647 RssAnon:  684676 kB
2022-07-08T14:52:58+0200 Name: postmaster Pid: 31647 RssAnon:  684700 kB
2022-07-08T14:53:50+0200 Name: postmaster Pid: 31647 RssAnon:  684824 kB
2022-07-08T14:50:27+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
2022-07-08T14:51:16+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
2022-07-08T14:52:08+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
2022-07-08T14:53:01+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
2022-07-08T14:53:53+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB

Other processes use less memory:
2022-07-08T14:52:08+0200 Name: postmaster Pid: 3869 RssAnon:    3256 kB
2022-07-08T14:53:02+0200 Name: postmaster Pid: 3869 RssAnon:    3256 kB
2022-07-08T14:53:54+0200 Name: postmaster Pid: 3869 RssAnon:    3256 kB
2022-07-08T14:50:27+0200 Name: postmaster Pid: 4217 RssAnon:    2532 kB
2022-07-08T14:51:16+0200 Name: postmaster Pid: 4217 RssAnon:    2532 kB



And shared memory (RssShmem):
2022-07-08T14:59:12+0200 Name: postmaster Pid: 5719 RssShmem:  908264 kB
2022-07-08T14:59:20+0200 Name: postmaster Pid: 5719 RssShmem:  908264 kB
2022-07-08T14:59:28+0200 Name: postmaster Pid: 5719 RssShmem:  908264 kB
2022-07-08T14:59:37+0200 Name: postmaster Pid: 5719 RssShmem:  908264 kB
2022-07-08T14:59:45+0200 Name: postmaster Pid: 5719 RssShmem:  908264 kB
2022-07-08T14:59:53+0200 Name: postmaster Pid: 5719 RssShmem:  908264 kB
2022-07-08T14:50:27+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
2022-07-08T14:51:16+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
2022-07-08T14:52:08+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
2022-07-08T14:53:01+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
2022-07-08T14:53:53+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB

while other processes use fairly little shared memory:

2022-07-08T14:55:25+0200 Name: postmaster Pid: 8521 RssShmem:    1988 kB
2022-07-08T14:55:33+0200 Name: postmaster Pid: 8521 RssShmem:    1988 kB
2022-07-08T14:55:40+0200 Name: postmaster Pid: 8521 RssShmem:    2104 kB
2022-07-08T14:55:49+0200 Name: postmaster Pid: 8521 RssShmem:    2104 kB
2022-07-08T14:55:57+0200 Name: postmaster Pid: 8521 RssShmem:    2104 kB
2022-07-08T14:56:06+0200 Name: postmaster Pid: 8521 RssShmem:    2104 kB
2022-07-08T14:56:15+0200 Name: postmaster Pid: 8521 RssShmem:    2104 kB
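
For reference, the snapshots above come from a per-minute sampler along these
lines (a minimal sketch; the real script records RssShmem the same way):

# Sketch: every minute, log anonymous RSS for every PostgreSQL backend
# (they all report Name: postmaster in /proc/<pid>/status on this host).
while sleep 60; do
    ts=$(date +%Y-%m-%dT%H:%M:%S%z)
    for pid in $(pgrep -x postmaster); do
        anon=$(awk '/^RssAnon:/ {print $2}' "/proc/$pid/status")
        echo "$ts Name: postmaster Pid: $pid RssAnon: $anon kB"
    done
done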

Which processes are better candidates for investigation: the ones using more
anonymous memory, or the ones using more shared memory?
I can do the (tedious) work of compiling a list of statements that a given
PID executed (at least the statements that appear in the one-minute
snapshots).
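
If useful, I would collect them with something like this (a sketch; it only
catches the statement visible at each sampling instant, truncated to
track_activity_query_size):

# Hypothetical per-minute sampler for one suspect backend; pass its PID
# as the first argument. Needs a role allowed to see other sessions.
pid=$1
while sleep 60; do
    psql -X -At -d prematch -c "
        SELECT now(), state, query
        FROM pg_stat_activity
        WHERE pid = ${pid};"
done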


>   ErrorContext: 8192 total in 1 blocks; 7928 free (5 chunks); 264 used
>> Grand total: 1456224 bytes in 195 blocks; 378824 free (165 chunks);
>> 1077400 used
>> 2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app= ERROR:  out
>> of memory
>> 2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app= DETAIL:
>>  Failed on request of size 152094068 in memory context
>> "TopTransactionContext".
>> 2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT:
>>  automatic vacuum of table "prematch.replication.tab_queue_tmp"
>>
>> 2022-07-02 14:48:47 CEST [4476]: [42-1] user=,db=,host=,app= LOG:  could
>> not fork worker process: Cannot allocate memory
>> 2022-07-02 14:48:47 CEST [4476]: [43-1] user=,db=,host=,app= LOG:  could
>> not fork worker process: Cannot allocate memory
>> terminate called after throwing an instance of 'std::bad_alloc'
>>   what():  std::bad_alloc
>>
> 2022-07-02 14:48:47 CEST [4476]: [44-1] user=,db=,host=,app= LOG:
>>  background worker "parallel worker" (PID 4303) was terminated by signal 6:
>> Aborted
>>
>>
> But what you see is probably just a side effect. Do you use some extension
> written in C++? This is a C++ message. This process kills Postgres.
>

These extensions are installed (unchanged since 14.3):

                                            List of installed extensions
        Name        | Version |   Schema   |                               Description
--------------------+---------+------------+------------------------------------------------------------------------
 amcheck            | 1.3     | public     | functions for verifying relation integrity
 dblink             | 1.2     | public     | connect to other PostgreSQL databases from within a database
 file_fdw           | 1.0     | public     | foreign-data wrapper for flat file access
 hstore             | 1.8     | public     | data type for storing sets of (key, value) pairs
 hypopg             | 1.3.1   | public     | Hypothetical indexes for PostgreSQL
 pageinspect        | 1.9     | public     | inspect the contents of database pages at a low level
 pg_buffercache     | 1.3     | public     | examine the shared buffer cache
 pg_stat_kcache     | 2.2.0   | public     | Kernel statistics gathering
 pg_stat_statements | 1.9     | public     | track planning and execution statistics of all SQL statements executed
 pgcrypto           | 1.3     | public     | cryptographic functions
 pgstattuple        | 1.5     | public     | show tuple-level statistics
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
 plpgsql_check      | 2.1     | public     | extended check for plpgsql functions
 postgres_fdw       | 1.1     | public     | foreign-data wrapper for remote PostgreSQL servers
 tablefunc          | 1.0     | public     | functions that manipulate whole tables, including crosstab
(15 rows)
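
To double-check whether any of these libraries pull in the C++ runtime, a
scan like this should work (a sketch; it assumes the extension .so files
live in the directory reported by pg_config --pkglibdir):

# List extension shared libraries that link against libstdc++.
for so in "$(/usr/pgsql-14/bin/pg_config --pkglibdir)"/*.so; do
    ldd "$so" | grep -q 'libstdc++' && echo "$so"
done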




>
> It can be related to some operation over
> prematch.replication.tab_queue_tmp, because in both cases an autovacuum was
> started.
>

On 14.3 (and also on older major versions) there were no such issues. This
table is heavily updated, so autovacuum runs frequently on it.
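
The autovacuum frequency on that table can be confirmed from
pg_stat_all_tables, e.g.:

# Confirm update and autovacuum activity on the heavily updated table.
psql -X -d prematch -c "
    SELECT relname, n_tup_upd, n_dead_tup, last_autovacuum, autovacuum_count
    FROM pg_stat_all_tables
    WHERE schemaname = 'replication' AND relname = 'tab_queue_tmp';"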

>
>
>>
>>
>> hba parser context: 25600 total in 6 blocks; 4464 free (8 chunks);
>> 21136 used
>> ErrorContext: 8192 total in 1 blocks; 7928 free (5 chunks); 264 used
>> Grand total: 1456224 bytes in 195 blocks; 378824 free (165 chunks);
>> 1077400 used
>> 2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app= ERROR:
>>  out of memory
>> 2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app= DETAIL:
>>  Failed on request of size 152094068 in memory context
>> "TopTransactionContext".
>> 2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT:
>>  automatic vacuum of table "prematch.replication.tab_queue_tmp"
>>
>
> I am not sure, but it looks like a corrupted table:
> prematch.replication.tab_queue_tmp
>

$ /usr/pgsql-14/bin/pg_amcheck -v -d prematch -t replication.tab_queue_tmp \
    --heapallindexed
pg_amcheck: including database "prematch"
pg_amcheck: in database "prematch": using amcheck version "1.3" in schema
"public"
pg_amcheck: checking heap table "prematch.replication.tab_queue_tmp"
pg_amcheck: checking btree index "prematch.replication.tab_queue_tmp_pkey"
$ echo $?
0

Is there something else I can check?
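
One thing I could try on the next occurrence is dumping the memory contexts
of a backend whose RssAnon keeps growing; pg_log_backend_memory_contexts()
is available since PostgreSQL 14 (the PID below is just an example from the
listing above):

# Dump the memory-context tree of a suspect backend into the server log
# (superuser required); 31522 is an example PID from the data above.
psql -X -d prematch -c "SELECT pg_log_backend_memory_contexts(31522);"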


Thanks, Ales


>
>> And a few days later again:
>> 2022-07-08 14:54:22 CEST [4791]: [81186-1]
>> user=app_evaluation,db=prematch,host=172.25.0.80,app=Evaluation_STS_UAT
>> DETAIL: parameters: $1 = '1', $2 = '1748010445', $3 = '0', $4 = '1000' 
>> terminate
>> called after throwing an instance of 'std::bad_alloc' terminate called
>> after throwing an instance of 'std::bad_alloc' what(): what():
>> std::bad_allocstd::bad_alloc 2022-07-08 14:54:23 CEST [4476]: [49-1]
>> user=,db=,host=,app= LOG: background worker "parallel worker" (PID 25251)
>> was terminated by signal 6: Aborted
>> 2022-07-08 14:54:23 CEST [4476]: [51-1] user=,db=,host=,app= LOG:
>>  terminating any other active server processes
>>
>>
>> Thanks Ales Zeleny
>>
>
