One of our instances has been behaving -- oddly. Most queries are blazing fast.
It appears to just be some of the stat views that are slow.
Queries against the following views are quick: pg_stat_activity,
pg_stat_xact_all_tables, pg_stat_xact_sys_tables, pg_stat_xact_user_tables,
pg_statio_sys_sequences (no system sequences), pg_stat_xact_user_functions, and
pg_stat_replication.
When I query one of the other statistics views, the queries take roughly 10
seconds when they used to take milliseconds. Here's an explain of one such
query: https://explain.depesz.com/s/IVEu
Logs show 'using stale statistics instead of current ones because stats
collector is not responding' when I query the stats views.
CPU usage is very low. Disk I/O looks okay to me.
-bash-4.1$ iostat
Linux 2.6.32-573.12.1.el6.x86_64 (hostname) 07/25/2016 _x86_64_
(16 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
2.800.000.150.010.00 97.04
Device:tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.45 4.85 7.45 33142948 50914838
sdb 31.48 1403.57 713.55 9597023154 4878965216
dm-0 0.14 0.58 0.5639497203847592
dm-1 1.00 4.26 6.83 29155482 46727688
Most of the files in $PGDATA/pg_stat_tmp haven't been updated by PostgreSQL
since July 7th at 14:16. The one exception is pgss_query_texts.stat. I
shouldn't have but I touched $PGDATA/pg_stat_tmp/db* on July 12th at 15:14
(doing so did trigger an autovacuum though).
I set track_activities and track_counts to off and reloaded config (pg_ctl
reload). The log indicated that it noticed the change. I created a directory
/dev/shm/pg_stat_tmp, set track_activities and track_counts on, set
stats_temp_directory to '/dev/shm/pg_stat_tmp' and reloaded config. Again, the
log indicated that it noticed the change but nothing happened with stats. The
new directory is empty.
I notified the statistics collector to quit (kill -QUIT). It did and the
postmaster restarted it. The new stats directory is still empty. The only
updates to the old stats directory is the pgss_query_texts.stat file.
I tried getting stack traces and the like with gdb and strace but it wasn't
really helpful.
Most of what I have tried was the result of help by Jim Nasby on
postgresteam.slack.com. He and Lukas Fittl suggested that I post about my
problem. There's one thing that we all agree on... It should not be so hard to
figure out what is going on with the statistics collector.
I tried turning stats off and on again today. Here's what the two pg_stat_tmp
directories look like today:
so now the directory looks like:
-bash-4.1$ ls -alh 9.4/data/pg_stat_tmp/
total 4.3M
drwx-- 2 postgres postgres 4.0K Jul 7 14:16 .
drwx-- 19 postgres postgres 4.0K Jul 25 11:20 ..
-rw--- 1 postgres postgres 3.4K Jul 12 15:14 db_0.stat
-rw--- 1 postgres postgres 26K Jul 12 15:14 db_13003.stat
-rw--- 1 postgres postgres 13K Jul 12 15:14 db_1.stat
-rw--- 1 postgres postgres 140K Jul 12 15:14 db_2473933.stat
-rw--- 1 postgres postgres 203K Jul 12 15:14 db_2476698.stat
-rw--- 1 postgres postgres 27K Jul 12 15:14 db_2595334.stat
-rw--- 1 postgres postgres 28K Jul 12 15:14 db_2631762.stat
-rw--- 1 postgres postgres 1.5K Jul 7 14:16 global.stat
-rw--- 1 postgres postgres 3.9M Jul 25 13:39 pgss_query_texts.stat
-bash-4.1$ ls -alh /dev/shm/pg_stat_tmp/
total 0
drwxr-xr-x 2 postgres postgres 40 Jul 12 17:19 .
drwxrwxrwt 3 root root 80 Jul 11 17:31 ..
Does anyone have any suggestions on how to:
- see what the statistics collector is doing?
- tell the postmaster to give the statistics collector the up to date config?
- get stats working without restarting the instance? (I am not yet sure when
that can happen but it would have to be scheduled [in advance] to occur during
an after hours maintenance window.
Thanks in advance,
Matt
--
Matthew Musgrove
Senior Software Engineer
Assurant Mortgage Solutions
817.788.4482
mmusgr...@emortgagelogic.com<mailto:mmusgr...@emortgagelogic.com>
matthew.musgr...@assurant.com<mailto:matthew.musgr...@assurant.com>