Debugging leaking memory in Postgresql 13.2/Postgis 3.1

2021-03-30 Thread Stephan Knauss

Hello,

I have a setup with a constant 8 open connections to the database which 
query data for map rendering. The queries involve only a handful of 
tables, but they can be complex and use PostGIS functions.


The connections stay open. Only read requests happen, no 
update/insert/delete.


I observe available memory going down at a rate of roughly 2GB per hour.

I am running the database inside a docker container, using the latest 
postgis/postgis image, which in turn uses the latest postgres image as 
its base.


PostgreSQL 13.2 (Debian 13.2-1.pgdg100+1) on x86_64-pc-linux-gnu, 
compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
POSTGIS=""3.1.1 aaf4c79"" [EXTENSION] PGSQL=""130"" 
GEOS=""3.7.1-CAPI-1.11.1 27a5e771"" PROJ=""Rel. 5.2.0, September 15th, 
2018"" LIBXML=""2.9.4"" LIBJSON=""0.12.1"" LIBPROTOBUF=""1.3.1"" 
WAGYU=""0.5.0 (Internal)""


The RssAnon memory of the backends keeps increasing until the OOM killer hits.

The wiki suggested dumping the MemoryContext stats for more detail, but 
something strange happens when attaching gdb: the process is killed 
immediately, so I can no longer dump those details.
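
For reference, what I am attempting is roughly the procedure from the wiki; 
a sketch, where 12345 stands for the pid of one of the rendering backends:

    # attach gdb to one of the rendering backends (12345 is a placeholder pid)
    gdb -p 12345
    # inside gdb, print the per-context statistics and detach again;
    # the output goes to the backend's stderr, i.e. into the postgres log
    (gdb) call MemoryContextStats(TopMemoryContext)
    (gdb) detach
    (gdb) quit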


Is this related to running within docker? I added capabilities and 
seccomp for gdb:


    cap_add:
      - SYS_PTRACE
    security_opt:
      - seccomp:unconfined

With count(*) on pg_stat_activity I see 20 connections. Besides the 8 
expected client backends, it also lists some parallel workers and other 
postgres maintenance processes.
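
For completeness, the breakdown comes from a query along these lines 
(connection options are what I use; adjust as needed):

    # count sessions per backend type
    psql -U postgres -d gis -c "
      SELECT backend_type, count(*)
        FROM pg_stat_activity
       GROUP BY backend_type
       ORDER BY count(*) DESC;"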


work_mem is set to 50MB and shared_buffers to 3500MB. Judging by 
/proc/meminfo, all of the shared memory is statically allocated as 
HugePages; I see no increase there. The only increase is in RssAnon.
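
The checks behind this are roughly the following (12345 stands for a 
backend pid):

    # the huge pages backing shared_buffers stay constant
    grep -E 'HugePages_(Total|Free)|^AnonPages' /proc/meminfo

    # the anonymous memory of an individual backend keeps growing
    grep -E '^(VmRSS|RssAnon|RssShmem)' /proc/12345/status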


Any ideas on how to dig deeper into this issue?

Any idea why gdb is not working as expected? More logs below.

It seems the backend terminates while gdb is attaching; sending a cont 
immediately does not change this.


Any idea which configuration parameter might help? I had a similar setup 
working with older versions of PostgreSQL and PostGIS, and it did not 
show this constant loss of memory.


I can work around it by periodically sending a SIGHUP to my clients, which 
makes them reconnect to PostgreSQL and frees the memory of the 
backends. So is something kept in memory longer than needed?
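
For completeness, the workaround in essence (a crude sketch; "renderd" is 
only a placeholder for my actual client processes):

    # force the rendering clients to re-open their database connections
    # once per hour; "renderd" is a hypothetical process name
    while sleep 3600; do pkill -HUP -x renderd; done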


Others seem to have similar issues with slightly older versions as well:

https://github.com/openstreetmap/mod_tile/issues/181#issuecomment-738303445


I would appreciate some ideas/pointers on how to track this down further.


Logs of the OOM kill (captured before switching to huge pages, which did not change the situation):

postgres invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), 
order=0, oom_score_adj=0

CPU: 9 PID: 3322620 Comm: postgres Not tainted 5.4.0-67-generic #75-Ubuntu
Hardware name: Gigabyte Technology Co., Ltd. B360 HD3P-LM/B360HD3PLM-CF, 
BIOS F7 HZ 07/24/2020

Call Trace:
 dump_stack+0x6d/0x8b
 dump_header+0x4f/0x1eb
 oom_kill_process.cold+0xb/0x10
 out_of_memory.part.0+0x1df/0x3d0
 out_of_memory+0x6d/0xd0
 __alloc_pages_slowpath+0xd5e/0xe50
 __alloc_pages_nodemask+0x2d0/0x320
 alloc_pages_current+0x87/0xe0
 __page_cache_alloc+0x72/0x90
 pagecache_get_page+0xbf/0x300
 filemap_fault+0x6b2/0xa50
 ? unlock_page_memcg+0x12/0x20
 ? page_add_file_rmap+0xff/0x1a0
 ? xas_load+0xd/0x80
 ? xas_find+0x17f/0x1c0
 ? filemap_map_pages+0x24c/0x380
 ext4_filemap_fault+0x32/0x50
 __do_fault+0x3c/0x130
 do_fault+0x24b/0x640
 ? __switch_to_asm+0x34/0x70
 __handle_mm_fault+0x4c5/0x7a0
 handle_mm_fault+0xca/0x200
 do_user_addr_fault+0x1f9/0x450
 __do_page_fault+0x58/0x90
 do_page_fault+0x2c/0xe0
 page_fault+0x34/0x40
RIP: 0033:0x558ddf0beded
Code: Bad RIP value.
RSP: 002b:7ffe5214a020 EFLAGS: 00010202
RAX: 7fea26b16b28 RBX: 0028 RCX: 7fea26b16b68
RDX: 0028 RSI:  RDI: 7fea26b16b28
RBP: 0010 R08: 7fea26b16b28 R09: 0019
R10: 0001 R11: 0001 R12: 
R13: 7fea26af76d8 R14: 7fea26af7728 R15: 
Mem-Info:
active_anon:29797121 inactive_anon:2721653 isolated_anon:32
 active_file:323 inactive_file:83 isolated_file:0
 unevictable:16 dirty:14 writeback:0 unstable:0
 slab_reclaimable:85925 slab_unreclaimable:106003
 mapped:1108591 shmem:14943591 pagetables:69567 bounce:0
 free:148637 free_pcp:1619 free_cma:0
Node 0 active_anon:119188484kB inactive_anon:10886612kB 
active_file:1292kB inactive_file:332kB unevictable:64kB 
isolated(anon):128kB isolated(file):0kB mapped:4434364kB dirty:56kB 
writeback:0kB shmem:59774364kB shmem_thp: 0kB shmem_pmdmapped: 0kB 
anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Node 0 DMA free:15904kB min:8kB low:20kB high:32kB active_anon:0kB 
inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB 
writepending:0kB present:15988kB managed:15904kB mlocked:0kB 
kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB 
free_cma:0kB

lowmem_reserve[]: 0 809 128670

Re: Debugging leaking memory in Postgresql 13.2/Postgis 3.1

2021-03-31 Thread Stephan Knauss

On 30.03.2021 20:46, Tom Lane wrote:

Stephan Knauss  writes:

The wiki suggested to dump MemoryContext states for more details, but
something strange happens when attaching gdb. It seems that the process
is immediately killed and I can no longer dump such details.

(I think the -v option is the one that matters on Linux, not -d
as you might guess).  The idea here is that the backends would
get an actual ENOMEM failure from malloc() before reaching the
point where the kernel's OOM-kill behavior takes over.  Given
that, they'd dump memory maps to stderr of their own accord,
and you could maybe get some insight as to what's leaking.
This'd also reduce the severity of the problem when it does
happen.


Hello Tom, the output below looks similar to the OOM output you 
expected. Can you give me a hint on how to interpret the results?


I had a backend that had already allocated a larger amount of memory, so 
I gave "gcore -a" a try.


In contrast to the advertised behaviour, the process did not keep 
running, but at least I got a core file. This is probably related to gcore 
simply attaching gdb, which somehow triggers a SIGKILL of all backends.


At 4.2GB in size, it hopefully contains most of the relevant memory 
structures. Without a running process I still cannot call 
MemoryContextStats(), but I found a macro that claims to decode the 
memory structures post mortem:


https://www.cybertec-postgresql.com/en/checking-per-memory-context-memory-consumption/


This gave me the memory structure shown below.

How should it be interpreted? The sizes appear to be in bytes, since the 
macro calculates with pointers. But the numbers look a bit small, given 
that the backend had roughly 6GB of RSS memory.


I thought it might print the overall size and then, indented, the memory 
of the children, but the numbers indicate this is not the case, since a 
higher level can show a smaller size than its children:


  CachedPlanSource: 67840
   unnamed prepared statement: 261920

So how should it be read, and is there any indication why I have a 
constantly increasing memory footprint? Any hint as to where the multiple 
gigabytes are allocated?




root@0ec98d20bda2:/# gdb /usr/lib/postgresql/13/bin/postgres core.154218 

GNU gdb (Debian 8.2.1-2+b3) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
<http://gnu.org/licenses/gpl.html>

This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib/postgresql/13/bin/postgres...Reading 
symbols from 
/usr/lib/debug/.build-id/31/ae2853776500091d313e76cf679017e697884b.debug...done.

done.

warning: core file may not match specified executable file.
[New LWP 154218]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: osm gis 172.20.0.3(51894) idle'.
#0  0x7fc01cfa07b7 in epoll_wait (epfd=4, events=0x55f403584080, 
maxevents=maxevents@entry=1, timeout=timeout@entry=-1) at 
../sysdeps/unix/sysv/linux/epoll_wait.c:30
30  ../sysdeps/unix/sysv/linux/epoll_wait.c: 
No such file or directory.
(gdb)
TopMemoryContext: 109528

 dynahash: 7968
 HandleParallelMessages: 7968
 dynahash: 7968
 dynahash: 7968
 dynahash: 7968
 dynahash: 24392
 dynahash: 24352
 RowDescriptionContext: 24352
 MessageContext: 7968
 dynahash: 7968
 dynahash: 32544
 TransactionAbortContext: 32544
 dynahash: 7968
 TopPortalContext: 7968
 dynahash: 16160
 CacheMemoryContext: 1302944
  CachedPlan: 138016
  CachedPlanSource: 67840
   unnamed prepared statement: 261920
  index info: 1824
  index info: 1824
  index info: 3872
  index info: 1824
  index info: 1824
  index info: 3872
  index info: 3872
  index info: 3872
  index info: 1824
  index info: 3872
  relation rules: 32544
  index info: 1824
  index info: 1824
  index info: 1824
  index info: 3872
  relation rules: 24352
  index info: 3872
  index info: 3872
  index info: 1824
  index info: 3872
  index info: 3872
  index info: 3872
  index info: 1824
  index info: 3872
  index info: 1824
  index info: 3872
  relation rules: 32544
  index info: 1824
  index info: 2848
  index info: 1824
  index info: 3872
  index info: 3872
  index info: 3872
  index info: 3872
  index 

Re: Debugging leaking memory in Postgresql 13.2/Postgis 3.1

2021-04-03 Thread Stephan Knauss

On 31.03.2021 20:27, Paul Ramsey wrote:

On Mar 31, 2021, at 11:24 AM, Tom Lane  wrote:

If that's not it, then the leak must be accumulating through plain
old malloc calls.  There's not much of that in the core backend
(although if you use ispell text search dictionaries, maybe [1] is
relevant), so my suspicions would next fall on any extensions you
might be using.

Would be interested in the queries being run. We have a reproducible leak in 
the <-> geography operator that we have been unable to track down.

To get an idea of where the memory went, I fed my core file into chap. I 
ended up not knowing how to proceed further. I found a std::vector that 
was roughly 70 MB in size, and a lot of allocations that chap could not 
attribute to anything:


chap> summarize used
Unrecognized allocations have 23098683 instances taking 
0xe6973688(3,868,669,576) bytes.
   Unrecognized allocations of size 0x68 have 8214926 instances taking 
0x32ec61b0(854,352,304) bytes.
   Unrecognized allocations of size 0x28 have 4890673 instances taking 
0xba907a8(195,626,920) bytes.
   Unrecognized allocations of size 0x38 have 3255995 instances taking 
0xade38e8(182,335,720) bytes.
   Unrecognized allocations of size 0x48 have 2096221 instances taking 
0x8fefa28(150,927,912) bytes.
   Unrecognized allocations of size 0x78 have 1792042 instances taking 
0xcd153b0(215,045,040) bytes.
   Unrecognized allocations of size 0x88 have 1393038 instances taking 
0xb4ad370(189,453,168) bytes.
   Unrecognized allocations of size 0x58 have 657193 instances 
taking 
0x3727618(57,832,984) bytes.
   Unrecognized allocations of size 0xa8 have 360742 instances 
taking 
0x39cc0f0(60,604,656) bytes.
   Unrecognized allocations of size 0x18 have 270834 instances 
taking 
0x632eb0(6,500,016) bytes.
   Unrecognized allocations of size 0x98 have 60372 instances taking 
0x8c05e0(9,176,544) bytes.
   Unrecognized allocations of size 0xb8 have 37559 instances taking 
0x697388(6,910,856) bytes.
   Unrecognized allocations of size 0xd8 have 19033 instances taking 
0x3ebb18(4,111,128) bytes.
   Unrecognized allocations of size 0x128 have 19010 instances 
taking 
0x55dc50(5,626,960) bytes.
   Unrecognized allocations of size 0xe8 have 9526 instances taking 
0x21b8f0(2,210,032) bytes.
   Unrecognized allocations of size 0x118 have 9513 instances taking 
0x28a4d8(2,663,640) bytes.
   Unrecognized allocations of size 0x1f8 have 9499 instances taking 
0x490d28(4,787,496) bytes.
   Unrecognized allocations of size 0x408 have 292 instances taking 
0x49920(301,344) bytes.
   Unrecognized allocations of size 0x2008 have 194 instances taking 
0x184610(1,590,800) bytes.
   Unrecognized allocations of size 0x1008 have 155 instances taking 
0x9b4d8(636,120) bytes.
   Unrecognized allocations of size 0x4008 have 137 instances taking 
0x224448(2,245,704) bytes.
   Unrecognized allocations of size 0x8008 have 133 instances taking 
0x428428(4,359,208) bytes.
   Unrecognized allocations of size 0x10008 have 131 instances 
taking 
0x830418(8,586,264) bytes.
   Unrecognized allocations of size 0x40008 have 131 instances 
taking 
0x20c0418(34,341,912) bytes.
   Unrecognized allocations of size 0x100008 have 130 instances taking 
0x8200410(136,315,920) bytes.
   Unrecognized allocations of size 0x20008 have 129 instances 
taking 
0x1020408(16,909,320) bytes.
   Unrecognized allocations of size 0x80008 have 129 instances 
taking 
0x4080408(67,634,184) bytes.
   Unrecognized allocations of size 0x200008 have 129 instances taking 
0x10200408(270,533,640) bytes.
   Unrecognized allocations of size 0x400008 have 128 instances taking 
0x20000400(536,871,936) bytes.
   Unrecognized allocations of size 0x418 have 53 instances taking 
0xd8f8(55,544) bytes.

[...]
Pattern %VectorBody has 57401 instances taking 0x42eaaf0(70,167,280) bytes.
[...]
Pattern %MapOrSetNode has 51362 instances taking 0x3f2f40(4,140,864) bytes.

Let me know if it would be useful for you to have this (huge) core file 
with the memory dump available somewhere.



I executed a few minutes of queries against a single-backend setup. 
RssAnon grew by about 10 MB during this time.
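
I tracked the growth with a simple loop along these lines while the 
queries were running (the pid is a placeholder for the single backend):

    # log RssAnon of the backend once a minute during the test run
    PID=12345
    while sleep 60; do
        echo "$(date +%T) $(grep RssAnon /proc/$PID/status)"
    done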


Full query logs are here: 
https://downloads.osm-tools.org/postgresql-2021-04-03_183913.csv.gz


I have the following extensions installed. hstore is used heavily, 
postgis a little.


gis=# SELECT extname FROM pg_extension;
  extname
---
 plpgsql
 adminpack
 postgis
 hstore
(4 rows)

As per other suggestions, I enabled auto_explain as well.

I hope you can get a hint as to what is allocating the memory. I could 
offer to share some docker images to reproduce it, but it would require a 
few commands to set up and a few hundred megabytes of download.



Stephan







Re: Debugging leaking memory in Postgresql 13.2/Postgis 3.1

2021-04-05 Thread Stephan Knauss

Hello Tom,

On 31.03.2021 20:24, Tom Lane wrote:

Based on nearby threads, it occurs to me to ask whether you have JIT
enabled, and if so whether turning it off helps.  There seems to be
a known leak of the code fragments generated by that in some cases.


That's it!

I am quite surprised that a feature which is on by default produces such 
a massive leak and goes largely undetected.


A single backend was leaking 250MB/hour; with my multiple connections it 
was 2GB/hour. But that is exactly what happened.


Doing a set jit=off immediately stopped the leak.
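
For the archives: to make this stick for all backends rather than per 
session, something like the following should work (requires superuser; a 
sketch, adjust connection options to your setup):

    # disable JIT cluster-wide and reload the configuration
    psql -U postgres -c "ALTER SYSTEM SET jit = off;"
    psql -U postgres -c "SELECT pg_reload_conf();"

    # verify from an application session
    psql -U osm -d gis -c "SHOW jit;"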


You mentioned that this seems to be known. Do you have pointers to the 
relevant bug-tracker/thread? I would like to follow up on this.


I have not measured the impact of JIT, but in theory it could bring 
sizeable performance benefits, so having it enabled sounds like a good 
idea once it stops leaking.
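
When I get around to measuring it, I would probably compare a query with 
and without JIT; EXPLAIN (ANALYZE) prints a JIT section with the compile 
and optimization times whenever JIT kicks in. A sketch, with 
generate_series standing in for a real rendering query:

    # run the same aggregate once with JIT forced on and once with it off
    psql -U osm -d gis \
      -c "SET jit = on; SET jit_above_cost = 0;" \
      -c "EXPLAIN (ANALYZE) SELECT sum(g) FROM generate_series(1, 1000000) AS g;" \
      -c "SET jit = off;" \
      -c "EXPLAIN (ANALYZE) SELECT sum(g) FROM generate_series(1, 1000000) AS g;"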



I tried running Valgrind on postgres, but I did not have much success 
with it; processes seemed to terminate quite frequently. My last use of 
Valgrind was a while ago, and my use case back then was probably much simpler.


Is it known which queries lead to the leak? I still have my recording, 
including explain output. Would it help to narrow it down further to 
individual queries that leak? Or does the JIT re-create optimized code 
for each slightly modified query without freeing the old code, so that 
re-running the same query would not leak?


https://downloads.osm-tools.org/postgresql-2021-04-03_183913.csv.gz


Stephan






Re: The Amazon CloudFront distribution is configured to block access from your country.

2021-04-11 Thread Stephan Knauss

Hello Felix,

On 11.04.2021 02:31, felix.quin...@yahoo.com wrote:

I see no reason to ask elsewhere first; the link for the installer download is 
on the PostgreSQL website, not elsewhere. The most logical thing to do is ask 
here first.



Because it is the Postgresql installer. 
https://www.enterprisedb.com/downloads/postgres-postgresql-downloads

It's unfortunate that your download is blocked. As others have indicated, 
I am also wondering why you are asking here for community support.


You pointed to a download link that is controlled by a private company 
selling PostgreSQL services.


Read about this company on an independent website: 
https://en.wikipedia.org/wiki/EnterpriseDB


If you are looking for the download links of the PostgreSQL project, you 
can find them here:


https://www.postgresql.org/download/


Based on your feedback, the installer is NOT linked on the PostgreSQL 
website but on a third-party website. This is why you got the 
recommendation to check with them. You also got hints on how you could 
potentially work around the download restriction by using a VPN.



Stephan






Re: Psql wants to use IP6 when connecting to self using tcp...

2021-06-21 Thread Stephan Knauss

On 22.06.2021 02:44, Jerry Levan wrote:

The below fails... the 'fe80' is the ip6 address of big box.
[jerry@bigbox ~]$ psql -h bigbox
psql: error: FATAL:  no pg_hba.conf entry for host "fe80::fe3f:dbff:fed1:f62e%enp7s0", user 
"lxxx", database "lxxx", SSL off
[...]
Netstat shows that Postgres is listening on a ip6 interface.


You stated earlier that you instructed PostgreSQL to listen on all 
interfaces. This includes IPv6. Your configuration is missing a 
pg_hba.conf entry that allows access from your IPv6 network.


Alternatively, you can change your network configuration to use IPv4 and 
not IPv6: remove the IPv6 listen address from PostgreSQL and change the 
resolver to return IPv4 (an A record, not AAAA) for your host.
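
If in doubt which rules the server actually loaded, you can ask it 
directly (a sketch, assuming superuser access, e.g. locally via the Unix 
socket):

    # list the pg_hba.conf rules as parsed by the running server
    psql -U postgres -c "
      SELECT line_number, type, database, user_name, address, netmask, auth_method
        FROM pg_hba_file_rules;"

    # an entry covering the link-local IPv6 range would look roughly like
    # this in pg_hba.conf (pick the auth method you actually want):
    #   host    all    all    fe80::/10    scram-sha-256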


Stephan






Re: memory consumption of memory for postgres db container

2021-09-02 Thread Stephan Knauss

On 02.09.2021 16:35, M Tarkeshwar Rao wrote:


We did a stability test for our product, where we continuously 
inserted data into postgres tables. We did this for three days.


The memory utilization of our Postgres container keeps increasing.

I am not certain whether it is related, but you might want to check. I 
had a similar issue where anonymous memory was leaking at a rate of 
roughly 2 GiB per hour.


It was resolved by doing a "set jit=off". You can see the full thread, 
including some hints on how to check where the memory is allocated, here:


https://www.postgresql.org/message-id/flat/9f73e655-14b8-feaf-bd66-c0f506224b9e%40stephans-server.de

Stephan