Andres Freund <and...@2ndquadrant.com> writes:
> So I think a better way to deal with that warning would be a good
> idea. Besides somehow making the mechanism there are two ways to attack
> this that I can think of, neither of them awe inspiring:
> 1) Make that WARNING a LOG message instead. Since those don't get send
> to the client with default settings...
> 2) Increase PGSTAT_MAX_WAIT_TIME even further than what 99b545 increased
> it to.

Yeah, I've been getting more annoyed by that too lately. I keep wondering
though whether there's an actual bug underneath that behavior that we're
failing to see. PGSTAT_MAX_WAIT_TIME is already 10 seconds; it's hard to
credit that increasing it still further would be "fixing" anything.
The other change would also mainly just sweep the issue under the rug,
if there is any issue and it's not just that we're overloading
underpowered buildfarm machines. (Maybe a better fix would be to reduce
MAX_CONNECTIONS for the tests on these machines?)

I wonder whether when multiple processes are demanding statsfile updates,
there's some misbehavior that causes them to suck CPU away from the stats
collector and/or convince it that it doesn't need to write anything.
There are odd things in the logs in some of these events. For example in
today's failure on hamster,
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2014-12-25%2016%3A00%3A07
there are two client-visible wait-timeout warnings, one each in the
gist and spgist tests. But in the postmaster log we find these in
fairly close succession:

[549c38ba.724d:2] WARNING:  pgstat wait timeout
[549c39b1.73e7:10] WARNING:  pgstat wait timeout
[549c38ba.724d:3] WARNING:  pgstat wait timeout

Correlating these with other log entries shows that the first and third
are from the autovacuum launcher while the second is from the gist test
session. So the spgist failure failed to get logged, and in any case
the big picture is that we had four timeout warnings occurring in a
pretty short span of time, in a parallel test set that's not all that
demanding (12 parallel tests, well below our max). Not sure what to
make of that.
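
For anyone wanting to repeat the correlation of those three postmaster log
lines, here is a minimal sketch in C. It assumes the log was written with a
log_line_prefix of the form '[%c:%l] ', which is what the entries above look
like; %c is the session ID (process start time and PID, both in hex) and %l
is the per-session line number. The names LogPrefix, parse_log_prefix and
same_session are made up for this illustration and are not anything in the
PostgreSQL tree.

```c
#include <stdio.h>

/*
 * Parsed form of a "[%c:%l]" log prefix such as "[549c38ba.724d:2]".
 * Hypothetical helper type, for illustration only.
 */
typedef struct LogPrefix
{
	unsigned long start_time;	/* process start time, seconds since epoch */
	unsigned long pid;			/* process ID */
	unsigned long line_no;		/* log line number within that session */
} LogPrefix;

/*
 * Parse the prefix at the start of a log line.
 * Returns 1 on success, 0 if the line does not begin with such a prefix.
 */
static int
parse_log_prefix(const char *line, LogPrefix *out)
{
	/* the two halves of %c are hex, the %l counter is decimal */
	return sscanf(line, "[%lx.%lx:%lu]",
				  &out->start_time, &out->pid, &out->line_no) == 3;
}

/*
 * Two log lines come from the same process if both halves of the
 * session ID match.
 */
static int
same_session(const LogPrefix *a, const LogPrefix *b)
{
	return a->start_time == b->start_time && a->pid == b->pid;
}
```

Feeding it the three lines above, the first and third compare equal under
same_session() (same start time and PID, line numbers 2 and 3), while the
second belongs to a different process. Working out which process each
session ID belongs to still means looking at the other entries logged under
the same prefix.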

BTW, I notice that in the current state of pgstat.c, all the logic for
keeping track of request arrival times is dead code, because nothing is
actually looking at DBWriteRequest.request_time. This makes me think that
somebody simplified away some logic we maybe should have kept. But if
we're going to leave it like this, we could replace the DBWriteRequest
data structure with a simple OID list and save a fair amount of code.

			regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers