This is the error on postgres log of the segmentation fault : 2020-01-21 14:20:29 GMT [] [42222]: [108-1] db=,user= LOG: server process (PID 2042) was terminated by signal 11: Segmentation fault 2020-01-21 14:20:29 GMT [] [42222]: [109-1] db=,user= DETAIL: Failed process was running: select pid from pg_stat_activity where query ilike 'REFRESH MATERIALIZED VIEW CONCURRENTLY matview_vrs_request_stats' 2020-01-21 14:20:29 GMT [] [42222]: [110-1] db=,user= LOG: terminating any other active server processes 2020-01-21 14:20:29 GMT [127.0.0.1(34256)] [2724]: [4-1] db=cmdv3,user=admin WARNING: terminating connection because of crash of another server process 2020-01-21 14:20:29 GMT [127.0.0.1(34256)] [2724]: [5-1] db=cmdv3,user=admin DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2020-01-21 14:20:29 GMT [127.0.0.1(34256)] [2724]: [6-1] db=cmdv3,user=admin HINT: In a moment you should be able to reconnect to the database and repeat your command. 2020-01-21 14:20:29 GMT [127.0.0.1(34026)] [2055]: [5-1] db=cmdv3,user=admin WARNING: terminating connection because of crash of another server process
At CentOS level : Jan 21 14:20:29 usnyh-cmd1.gt-t.net kernel: postmaster[2042]: segfault at 0 ip 000000000048bef4 sp 00007ffdf4955bb0 error 4 in postgres[400000+6c5000] Jan 21 14:20:29 usnyh-cmd1.gt-t.net pgbouncer: 2020-01-21 14:20:29.905 GMT [33727] WARNING C-0x21526e8: cmdv3/admin@10.151.2.154:39688 pooler error: server conn crashed? Jan 21 14:20:29 usnyh-cmd1.gt-t.net pgbouncer: 2020-01-21 14:20:29.905 GMT [33727] WARNING S-0x1f5ff58: cmdv3/admin@127.0.0.1:5432 got packet 'N' from server when not linked Jan 21 14:20:29 usnyh-cmd1.gt-t.net pgbouncer: 2020-01-21 14:20:29.905 GMT [33727] WARNING S-0x25097f0: cmdv3/admin@127.0.0.1:5432 got packet 'N' from server when not linked Jan 21 14:20:29 usnyh-cmd1.gt-t.net pgbouncer: 2020-01-21 14:20:29.906 GMT [33727] WARNING S-0x2508b60: cmdv3/admin@127.0.0.1:5432 got packet 'N' from server when not linked Jan 21 14:20:29 usnyh-cmd1.gt-t.net pgbouncer: 2020-01-21 14:20:29.981 GMT [33727] ERROR S: login failed: FATAL: the database system is in recovery mode > If you're on Linux, you can probably see them with "ls /dev/shm". I see a lot of files there, and doing a cat they are empty. What can I do with them? Those are two different problems I guess, but they are related because right before the Segmentation Fault I see a lot of shared segment errors in the postgres log. Il giorno mer 29 gen 2020 alle ore 10:09 Thomas Munro < thomas.mu...@gmail.com> ha scritto: > On Wed, Jan 22, 2020 at 4:06 AM Nicola Contu <nicola.co...@gmail.com> > wrote: > > after a few months, we started having this issue again. > > So we revert the work_mem parameter to 600MB instead of 2GB. > > But the issue is still there. A query went to segmentation fault, the DB > went to recovery mode and our app went to read only for a few minutes. > > Hi Nicola, > Hmm, a segmentation fault sounds like a different problem. Can you > please share the exact error messages from PostgreSQL and OS logs? > > > I understand we can increase max_connections so we can have many more > segments. > > > > My question is : is there a way to understand the number of segments we > reached? > > If you're on Linux, you can probably see them with "ls /dev/shm". > > > Currently we have 220 max_conn so as your formula is 64 + 2* 220 we have > about 500 shared segments. > > We would like to increase that number to 300 or 400 but would be great > to understand if there is a way to make sure we will solve the issue as it > requires a restart of the service. > > > > I know you were also talking about a redesign this part in PostgreSQL. > Do you know if anything has changed in any of the newer versions after 11.5? > > It's possible that we should increase a couple of constants used the > formula -- I'll look into that again. But first I'd like to see if > we're even investigating the right problem here. >