I had log_min_duration_statement set to 0 for a short period, just before the server got stuck and just after, so I have the full list of SQL statements, which I then analyzed in pgBadger. There is no increase in the number of statements, and I can see that all statements take longer to process than before the server got stuck. But following your advice, I'll check the results from pg_stat_statements.
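Since pg_stat_statements counters are cumulative, one way to do that check is to take snapshots and compare the per-call time between them. Below is a minimal sketch, not a tested tool: the function names, the snapshot layout ({queryid: (calls, total_time_ms)}) and the 10x threshold are my assumptions; only the queryid, calls and total_time columns come from pg_stat_statements as shipped with PostgreSQL 10.

```python
# Hypothetical helpers for comparing pg_stat_statements snapshots.
# Each snapshot could be collected with a query such as
#   SELECT queryid, calls, total_time FROM pg_stat_statements;
# (PostgreSQL 10 column names; total_time is in milliseconds)
# and is represented here as {queryid: (calls, total_time_ms)}.


def interval_mean_ms(before, after):
    """Return {queryid: mean ms per call} for calls made between two snapshots."""
    result = {}
    for queryid, (calls_after, time_after) in after.items():
        calls_before, time_before = before.get(queryid, (0, 0.0))
        delta_calls = calls_after - calls_before
        if delta_calls <= 0:
            # No new calls, or the counters were reset between snapshots.
            continue
        result[queryid] = (time_after - time_before) / delta_calls
    return result


def slowed_down(normal, stuck, factor=10.0):
    """Return (queryid, normal_ms, stuck_ms) tuples for statements whose mean
    time in the stuck interval is at least `factor` times the normal mean,
    sorted with the largest slowdown first."""
    rows = [
        (queryid, normal[queryid], stuck_ms)
        for queryid, stuck_ms in stuck.items()
        if queryid in normal
        and normal[queryid] > 0
        and stuck_ms >= factor * normal[queryid]
    ]
    return sorted(rows, key=lambda row: row[2] / row[1], reverse=True)
```

With three snapshots (two taken during normal operation, one taken while the server is stuck), interval_mean_ms gives the mean per-call time for each interval, and slowed_down lists the statements that got disproportionately slower. If every statement slows down by a similar factor, that would point away from individual queries and toward the host.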
On Fri, 5 Jun 2020 at 13:16, <luis.robe...@siscobra.com.br> wrote:
>
> *From:* "Krzysztof Olszewski" <kolsze...@gmail.com>
> *To:* pgsql-performance@lists.postgresql.org
> *Sent:* Friday, 5 June 2020 7:07:02
> *Subject:* Postgresql server gets stuck at low load
>
> I have a problem with one of my Postgres production servers. The server
> works fine almost always, but sometimes, without any increase in the
> number of transactions or statements, the machine gets stuck. Cores go up
> to 100%, load up to 160%. When it happens, there are problems connecting
> to the database, and even when a connection succeeds, simple queries take
> several seconds instead of milliseconds.
>
> The problem sometimes stops after a period of time (e.g. 35 min);
> sometimes we must restart Postgres, Linux, or even KVM (which is the
> virtualization host).
>
> My hardware:
> 56 cores (Intel Core Processor (Skylake, IBRS))
> 400 GB RAM
> RAID10 with about 40k IOPS
>
> OS:
> CentOS Linux release 7.7.1908
> kernel 3.10.0-1062.18.1.el7.x86_64
>
> Database:
> size 100 GB (entirely fits in memory :) )
> server_version 10.12
> effective_cache_size 192000 MB
> maintenance_work_mem 2048 MB
> max_connections 150
> shared_buffers 64000 MB
> work_mem 96 MB
>
> In the normal state, I have about 500 tps, 5% usage of cores, and about 3%
> of load. The whole database fits in memory, so there are no reads from
> disk, only writes at about 500 IOPS, sometimes with spikes to 1500 IOPS,
> but on this hardware these values are no problem (no iowait on the cores).
> In the normal state this machine does "nothing".
>
> Connections to the database are created by two Java-based app servers
> through connection pools, so the connection count is limited by the pool
> configuration. The maximum is 120, which is lower than the Postgres
> setting (150). In the normal state there are about 20 connections; when
> the server is stuck, the count goes to the maximum (120).
>
> Correlated with the stuck periods, I see messages in the kernel log such
> as
> NMI watchdog: BUG: soft lockup - CPU#25 stuck for 23s! [postmaster:33935]
> but I don't know whether this is the cause or an effect of the problem.
>
> I investigated with pgBadger and ... nothing strange happens, just normal
> statements.
>
> Any ideas?
>
> Thanks,
> Kris
>
> Hi Krzysztof!
>
> I would enable the pg_stat_statements extension and check whether there
> are long-running queries that should be quick.
>