On Wed, Mar 2, 2016 at 3:49 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Derek Elder <der...@mirthcorp.com> writes:
> > That was indeed the root cause. The /etc/hosts file on the server had
> > incorrect permissions which caused localhost to not resolve.
>
> It strikes me that this should not have been so hard to solve. The
> stats collector was trying to tell you what was wrong, but evidently
> you could not interpret those messages correctly. I am thinking that
> we need to do some work on the message wording; or maybe there is one
> more message that needs to be emitted so you can follow the causal
> chain?
>
> In particular, perhaps it wasn't immediately obvious that the first
> of these messages was the cause of the second:
>
> > 2016-03-02 14:58:09 EST [14366]: [8-1] LOG: could not resolve "localhost": Name or service not known
> > 2016-03-02 14:58:09 EST [14366]: [9-1] LOG: disabling statistics collector for lack of working socket
>
> in which case maybe we could rephrase the first message along the
> lines of "could not resolve "localhost" to establish statistics
> collector socket: <strerror detail here>". (There are a few other
> messages in the same area that would need to be changed similarly.)
>
> Or maybe the problem was that when we forced track_counts off because of
> no stats collector, we didn't emit any bleat noting that, which if we had
> might have led you to realize that the above messages were the direct
> cause of the next one:
>
> > 2016-03-02 14:58:09 EST [14366]: [10-1] WARNING: autovacuum not started because of misconfiguration
> > 2016-03-02 14:58:09 EST [14366]: [11-1] HINT: Enable the "track_counts" option.
>
> Or both changes, or something else entirely?
>
> I'd be interested to hear how you perceived these log messages and
> what you think might help the next person.

The fact that the first two are only LOG level and not WARNING would seem
like the easiest thing to improve.

I had the benefit of basically knowing that track_counts was a red herring,
given the provided context, so I started looking at anything preceding the
first warning that could give me a hint as to the nature of the
"misconfiguration".

It would probably help to specify, if known, whether the suspected
misconfiguration is external or internal to PostgreSQL - i.e., do I need to
fix postgresql.conf, or is something external (like the hosts file) to
blame? In this case, since we don't control "localhost", it would be an
"external misconfiguration".

This also doesn't help:

show autovacuum;
 autovacuum
------------
 on

Why do we indirectly disable autovacuum by disabling one of its required
parameters instead of just disabling the main property? I don't suppose we
can add a third option (on, off, broken) to this, which would allow
distinguishing between a user-specified condition (off) and a
system-imposed one (broken).

This is getting a bit deep for a rare problem like this. I think that
making the root messages WARNING (or ERROR) instead of LOG, and ideally
linking the two explicitly if possible, would have the desired effect of
pointing the user to the first thing they need to fix. We can then assume
they would ignore all subsequent messages (and hints) until the first one
is handled (i.e., use good troubleshooting practices). The hint and the
change to track_counts then become a non-issue.

David J.
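
P.S. For anyone who finds this thread with the same log messages, the "external" half of the question can be checked outside PostgreSQL. Below is a minimal sketch, assuming Python 3 is available on the server and that the stats collector's lookup behaves roughly like a plain getaddrinfo() call for "localhost" (I have not matched its exact flags). The function names can_resolve and hosts_file_readable are made up for illustration. Run it as the OS user that runs the server, since the original problem was file permissions.

```python
import os
import socket


def can_resolve(host="localhost"):
    """Return (ok, detail) for a getaddrinfo() lookup of host.

    Assumption: this only approximates the lookup PostgreSQL performs.
    """
    try:
        infos = socket.getaddrinfo(host, None, type=socket.SOCK_DGRAM)
    except socket.gaierror as exc:
        return False, "could not resolve %r: %s" % (host, exc)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    addresses = sorted({info[4][0] for info in infos})
    return True, "%r resolves to %s" % (host, ", ".join(addresses))


def hosts_file_readable(path="/etc/hosts"):
    """Return True if the current OS user can read the hosts file."""
    return os.access(path, os.R_OK)


if __name__ == "__main__":
    ok, detail = can_resolve("localhost")
    print(detail)
    if not ok and not hosts_file_readable():
        # Points at an external cause rather than postgresql.conf.
        print("/etc/hosts is not readable by this user")
```

If the lookup fails here too, the fix belongs outside PostgreSQL and the track_counts hint can be ignored until it is resolved.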