Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

Greg Smith Thu, 19 Jan 2012 20:55:18 -0800

On 01/19/2012 10:52 AM, Robert Haas wrote:

It's not quite clear from your email, but I gather that the way that
this is intended to work is that these values increment every time we
checkpoint?

Right--they get updated in the same atomic bump that moves up things like buffers_checkpoint.

Also, forgive me for asking this possibly-stupid question, but of what
use is this information? I can't imagine why I'd care about a running
total of the number of files fsync'd to disk.  I also can't really
imagine why I'd care about the length of the write phase, which surely
will almost always be a function of checkpoint_completion_target and
checkpoint_timeout unless I manage to overrun the number of
checkpoint_segments I've allocated.  The only number that really seems
useful to me is the time spent syncing.  I have a clear idea what to
look for there: smaller numbers are better than bigger ones.  For the
rest I'm mystified.

Priority #1 here is to reduce (but, admittedly, not always eliminate) the need for log file parsing of this particular area, so including all the major bits from the existing log message that can be published this way would include the write phase time. You mentioned one reason why the write phase time might be interesting; there could be others. One of the things expected here is that Munin will expand its graphing of values from pg_stat_bgwriter to include all these fields. Most of the time the graph of time spent in the write phase will be boring and useless. Making it easy for a look at a graph to spot those rare times when it isn't is one motivation for including it.

As for why to include the number of files being sync'd, one reason is again simply wanting to include everything that can easily be published. A second is that it helps support ideas like my "Checkpoint sync pause" one; that's untunable in any reasonable way without some easy way of monitoring the number of files typically sync'd. Sometimes when I'm investigating checkpoint spikes during sync, I wonder whether they were because more files than usual were synced, or if it's instead just because of more churn on a smaller number. Making this easy to graph pulls that data out to where I can compare it with disk I/O trends. And there's precedent now proving that an always incrementing number in pg_stat_bgwriter can be turned into such a graph easily by monitoring tools.
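
As a rough illustration of that last point, here is a minimal sketch of how a monitoring tool might turn an always-incrementing counter into a graphable per-interval series. The function name, the sample values, and the idea of a cumulative "files synced" counter are assumptions made for illustration; they are not taken from the patch.

```python
def counter_deltas(samples):
    """Turn cumulative counter samples into per-interval increases.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    Returns a list of (timestamp_seconds, delta), one per interval.
    An interval where the counter went down is assumed to span a
    statistics reset and is skipped rather than reported as negative.
    """
    deltas = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if v1 < v0:
            continue  # counter reset between the two samples
        deltas.append((t1, v1 - v0))
    return deltas


# Hypothetical samples of a cumulative "files synced" counter,
# polled every 5 minutes by a graphing tool.
files_synced = [(0, 1200), (300, 1260), (600, 1950), (900, 2010)]
per_interval = counter_deltas(files_synced)
```

Plotting `per_interval` next to disk I/O graphs is what would make an interval with an unusually large number of synced files stand out.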

And, it doesn't seem like it's necessarily going to save me a whole
lot either, because if it turns out that my sync phases are long, the
first question out of my mouth is going to be "what percentage of my
total sync time is accounted for by the longest sync?".  And so right
there I'm back to the logs.  It's not clear how such information could
be usefully exposed in pg_stat_bgwriter either, since you probably
want to know only the last few values, not a total over all time.

This isn't ideal yet. I mentioned how some future "performance event logging history collector" was really needed as a place to push longest sync times into, and we don't have it yet. This is the best thing to instrument that I'm sure is useful, and that I can stick onto the existing infrastructure.

The idea is that this change makes it possible to trigger a "sync times are too long" alert out of a tool that's based solely on database queries. When that goes off, yes you're possibly back to the logs again for more details about the longest individual sync time. But the rest of the time, what's hopefully the normal state of things, you can ignore the logs and just track the pg_stat_bgwriter numbers.
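
A minimal sketch of such a query-based alert, assuming the tool keeps two successive snapshots of pg_stat_bgwriter as dictionaries. checkpoints_timed and checkpoints_req are existing columns; `sync_time_ms` is a hypothetical name standing in for the cumulative sync time, and the threshold is arbitrary.

```python
def sync_time_alert(prev, curr, threshold_ms):
    """Return True when the average sync time per checkpoint between
    two snapshots exceeds threshold_ms.

    prev, curr: dicts of cumulative counters from successive polls.
    "sync_time_ms" is a hypothetical key used for illustration only.
    """
    checkpoints = (
        (curr["checkpoints_timed"] + curr["checkpoints_req"])
        - (prev["checkpoints_timed"] + prev["checkpoints_req"])
    )
    if checkpoints <= 0:
        return False  # no checkpoint finished in this interval (or a reset)
    sync_ms = curr["sync_time_ms"] - prev["sync_time_ms"]
    return sync_ms / checkpoints > threshold_ms


# Made-up snapshot values: 4 checkpoints and 12000 ms of sync time
# between the two polls, so 3000 ms per checkpoint on average.
prev = {"checkpoints_timed": 100, "checkpoints_req": 5, "sync_time_ms": 40000}
curr = {"checkpoints_timed": 103, "checkpoints_req": 6, "sync_time_ms": 52000}
alert = sync_time_alert(prev, curr, threshold_ms=2000)
```

An average like this only says that sync times are too long overall; finding the longest individual sync would still mean going to the logs.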


--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


