random_page_cost == 1.1

On Tue, 9 Jun 2020 at 14:01, Avinash Kumar <avinash.vallar...@gmail.com> wrote:
> Hi,
>
> On Fri, Jun 5, 2020 at 7:07 AM Krzysztof Olszewski <kolsze...@gmail.com>
> wrote:
>
>> I have a problem with one of my Postgres production servers. The server
>> works fine almost always, but sometimes, without any increase in the
>> number of transactions or statements, the machine gets stuck. Cores go
>> up to 100%, load up to 160%. When it happens, there are problems
>> connecting to the database, and even when a connection succeeds, simple
>> queries take several seconds instead of milliseconds. The problem
>> sometimes stops after a period of time (e.g. 35 min); sometimes we must
>> restart Postgres, Linux, or even KVM (which serves as the
>> virtualization host).
>>
>> My hardware
>> 56 cores (Intel Core Processor (Skylake, IBRS))
>> 400 GB RAM
>> RAID10 with about 40k IOPS
>>
>> OS
>> CentOS Linux release 7.7.1908
>> kernel 3.10.0-1062.18.1.el7.x86_64
>>
>> Database size 100 GB (fits entirely in memory :) )
>> server_version 10.12
>> effective_cache_size 192000 MB
>> maintenance_work_mem 2048 MB
>> max_connections 150
>> shared_buffers 64000 MB
>> work_mem 96 MB
>>
> What is random_page_cost set to?
> Set it to 1 (the same as the default seq_page_cost) for a moment and try it.
>
>>
>> In the normal state I have about 500 tps, 5% core usage, about 3%
>> load, the whole database fits in memory, no reads from disk, only
>> writes at about the 500 IOPS level, sometimes spiking to 1500 IOPS,
>> but on this hardware there is no problem with these values (no iowait
>> on the cores). In the normal state this machine does "nothing".
>> Connections to the database are created by two Java-based app servers
>> through connection pools, so the connection count is limited by the
>> pool configuration; the max is 120, which is lower than the value in
>> the Postgres configuration (150). In the normal state there are about
>> 20 connections; when stuck, the count goes to the max (120).
>>
>> Correlated with the stalls, I see messages in the kernel log such as
>> NMI watchdog: BUG: soft lockup - CPU#25 stuck for 23s! [postmaster:33935]
>> but I don't know whether this is the cause or an effect of the problem.
>> I investigated with pgBadger and ... nothing strange happens, just
>> normal statements.
>>
>> Any ideas?
>>
>> Thanks,
>> Kris
>>
>
> --
> Regards,
> Avinash Vallarapu
>
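
For anyone following the thread, the random_page_cost check and the temporary change suggested above could look roughly like this. It is only a sketch, assuming a psql session with superuser rights on PostgreSQL 10; the value 1 is the one proposed in the thread, not a general recommendation.

```sql
-- Show the current planner cost settings (defaults: random_page_cost = 4, seq_page_cost = 1).
SHOW random_page_cost;
SHOW seq_page_cost;

-- Try the suggested value for the current session only; nothing is persisted.
SET random_page_cost = 1;

-- If the change helps, persist it cluster-wide (written to postgresql.auto.conf)
-- and reload the configuration; no restart is needed for this parameter.
ALTER SYSTEM SET random_page_cost = 1;
SELECT pg_reload_conf();
```

Trying the value with a session-level SET first keeps the experiment reversible: closing the session, or running RESET random_page_cost, restores the configured value.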