On 01/19/2012 10:52 AM, Robert Haas wrote:
It's not quite clear from your email, but I gather that the way that
this is intended to work is that these values increment every time we
checkpoint?

Right, they are updated in the same atomic bump that advances counters like buffers_checkpoint.

Also, forgive me for asking this possibly-stupid question, but of what
use is this information? I can't imagine why I'd care about a running
total of the number of files fsync'd to disk.  I also can't really
imagine why I'd care about the length of the write phase, which surely
will almost always be a function of checkpoint_completion_target and
checkpoint_timeout unless I manage to overrun the number of
checkpoint_segments I've allocated.  The only number that really seems
useful to me is the time spent syncing.  I have a clear idea what to
look for there: smaller numbers are better than bigger ones.  For the
rest I'm mystified.

Priority #1 here is to reduce (though, admittedly, not always eliminate) the need to parse log files for this particular area. Publishing all the major pieces of the existing checkpoint log message that can be exposed this way means including the write phase time. You mentioned one reason why the write phase time might be interesting; there could be others. One of the things expected here is that Munin will expand its graphing of values from pg_stat_bgwriter to include all these fields. Most of the time the graph of time spent in the write phase will be boring and useless. Making it easy to spot, with one look at a graph, the rare times when it isn't boring is one motivation for including it.

As for why to include the number of files being synced, one reason is again simply wanting to include everything that can easily be published. A second is that it helps support ideas like my "Checkpoint sync pause" one; that can't be tuned in any reasonable way without some easy way of monitoring the number of files typically synced. Sometimes when I'm investigating checkpoint spikes during sync, I wonder whether they happened because more files than usual were synced, or just because of more churn on a smaller number of files. Making this easy to graph pulls that data out to where I can compare it with disk I/O trends. And there's precedent now proving that an always-incrementing number in pg_stat_bgwriter can easily be turned into such a graph by monitoring tools.
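
To illustrate how a monitoring tool can turn always-incrementing counters into a graphable series, here is a minimal Python sketch. It is not taken from the patch or from Munin. The field names "checkpoints" and "files_synced" are hypothetical stand-ins for whatever the new columns end up being called, and the two samples are made-up numbers. The idea is only that subtracting successive samples gives a per-interval value, with a decrease treated as a statistics reset.

```python
def counter_deltas(prev, curr):
    """Return the per-interval increase for each counter found in both samples.

    prev and curr are dicts of cumulative counter values taken at two
    successive polling times. Field names are hypothetical.
    """
    deltas = {}
    for name, now in curr.items():
        if name not in prev:
            continue
        before = prev[name]
        # A decrease means the statistics were reset between the samples,
        # so the current value is the increase since that reset.
        deltas[name] = now - before if now >= before else now
    return deltas


# Two made-up samples of cumulative counters, one polling interval apart.
prev = {"checkpoints": 10, "files_synced": 400}
curr = {"checkpoints": 12, "files_synced": 470}

d = counter_deltas(prev, curr)
# Average number of files synced per checkpoint during the interval.
files_per_checkpoint = d["files_synced"] / d["checkpoints"]
```

Plotting files_per_checkpoint next to disk I/O graphs is the kind of comparison described above: it separates "more files than usual" from "more churn on the same files".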

And, it doesn't seem like it's necessarily going to save me a whole
lot either, because if it turns out that my sync phases are long, the
first question out of my mouth is going to be "what percentage of my
total sync time is accounted for by the longest sync?".  And so right
there I'm back to the logs.  It's not clear how such information could
be usefully exposed in pg_stat_bgwriter either, since you probably
want to know only the last few values, not a total over all time.

This isn't ideal yet. I mentioned that some future "performance event logging history collector" is really needed as a place to push longest sync times into, and we don't have it yet. This is the best thing to instrument that I'm sure is useful and that I can attach to the existing infrastructure.

The idea is that this change makes it possible to trigger a "sync times are too long" alert out of a tool that's based solely on database queries. When that goes off, yes, you may be back to the logs again for more details about the longest individual sync time. But the rest of the time, in what's hopefully the normal state of things, you can ignore the logs and just track the pg_stat_bgwriter numbers.
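
As a rough sketch of such an alert, the Python below compares two samples of cumulative counters and flags the interval when the average sync time per checkpoint exceeds a threshold. The field names "checkpoints" and "sync_time_ms", the millisecond unit, and the threshold values are assumptions made for illustration, not what the patch defines. A real tool would fill the samples from a query against pg_stat_bgwriter.

```python
def avg_sync_ms_per_checkpoint(prev, curr):
    """Average sync time per checkpoint between two cumulative samples.

    Returns None when no checkpoint completed in the interval (or the
    counters went backwards, e.g. after a statistics reset).
    Field names and units are hypothetical.
    """
    checkpoints = curr["checkpoints"] - prev["checkpoints"]
    sync_ms = curr["sync_time_ms"] - prev["sync_time_ms"]
    if checkpoints <= 0 or sync_ms < 0:
        return None
    return sync_ms / checkpoints


def sync_alert(prev, curr, threshold_ms):
    """True when the interval's average sync time exceeds threshold_ms."""
    avg = avg_sync_ms_per_checkpoint(prev, curr)
    return avg is not None and avg > threshold_ms


# Made-up samples: 4 checkpoints and 8000 ms of sync time in the interval.
prev = {"checkpoints": 100, "sync_time_ms": 50000}
curr = {"checkpoints": 104, "sync_time_ms": 58000}

too_long = sync_alert(prev, curr, threshold_ms=1500)
```

Note that this only sees the average over the interval. It cannot say which individual file sync was the slow one, which is why the logs are still needed once the alert fires.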

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

