I'm seeing some curious behavior and wondering whether others have run
into it. I also have a related question.
This morning I noticed in my Puppet summary report from Foreman that 60
of my 160 hosts stopped reporting at nearly the same time and have not
resumed since. Investigating, it appears that my puppetmaster
temporarily ran out of disk space on the /var volume, probably due in
part to logging. I have log rotation running, which eventually freed up
some disk space, but the 60 hosts have not resumed reporting.
If I dig into the logs on one of the failing agents, there are no
messages from puppet after about 4am (here is a snippet of my logs):
Jan 27 02:44:25 kmallory3 puppet-agent[15340]: Using cached catalog
Jan 27 02:44:25 kmallory3 puppet-agent[15340]: Could not retrieve catalog; skipping run
Jan 27 03:14:30 kmallory3 puppet-agent[15340]: Could not retrieve catalog from remote server: Error 400 on SERVER: No space left on device - /var/lib/puppet/yaml/facts/kmallory3.xxx.xxx.xxx.yaml
Jan 27 03:14:30 kmallory3 puppet-agent[15340]: Using cached catalog
Jan 27 03:14:30 kmallory3 puppet-agent[15340]: Could not retrieve catalog; skipping run
Jan 27 03:47:30 kmallory3 puppet-agent[15340]: Could not retrieve plugin: execution expired
Jan 27 04:01:02 kmallory3 puppet-agent[15340]: Could not retrieve catalog from remote server: execution expired
Jan 27 04:01:02 kmallory3 puppet-agent[15340]: Using cached catalog
Jan 27 04:01:02 kmallory3 puppet-agent[15340]: Could not retrieve catalog; skipping run
Forcing a run of puppet, I get the following message:
kmallory3:/var/log# puppetd --onetime --test
notice: Ignoring --listen on onetime run
notice: Run of Puppet configuration client already in progress; skipping
After stopping and restarting the puppet service, the agent started
running properly. It appears that the failure on the server caused the
agent to hang in a state from which it could not recover gracefully.
Has anyone experienced this before? We are running 2.6.1 on the large
majority of our hosts, including this one. 60 hosts failed, but the
remaining 100 kept running properly.
Now, on to my question: does anyone have a bright idea for how I could
force Puppet to restart itself on 60 machines when the agent itself is
hung and no longer doing runs? I'm not really excited by the prospect
of logging into 60 machines and running a sudo command... sigh.
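For reference, the brute-force fallback I'm hoping to avoid would look
roughly like the sketch below. It is untested against my hosts and
rests on assumptions that may not hold everywhere: a hypothetical hosts
file with one hostname per line, key-based ssh with passwordless sudo,
and the agent being managed by /etc/init.d/puppet.

```shell
#!/bin/sh
# Sketch: restart the puppet service on each host listed in a file.
# Assumptions (hypothetical, adjust to taste): the hosts file has one
# hostname per line, key-based ssh and passwordless sudo are available,
# and the agent is managed by /etc/init.d/puppet.

REMOTE_CMD='sudo /etc/init.d/puppet restart'

restart_all() {
    # $1 is the hosts file. With DRY_RUN=1 the ssh commands are only printed.
    while read -r host || [ -n "$host" ]; do
        case "$host" in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh -n -o BatchMode=yes -o ConnectTimeout=10 $host $REMOTE_CMD"
        else
            # -n keeps ssh from swallowing the rest of the hosts file on stdin
            ssh -n -o BatchMode=yes -o ConnectTimeout=10 "$host" "$REMOTE_CMD" \
                || echo "FAILED: $host" >&2
        fi
    done < "$1"
}

# Only run when a hosts file is given as the first argument.
if [ -n "${1:-}" ]; then
    restart_all "$1"
fi
```

Saved under a made-up name such as restart-puppet.sh, it could be
dry-run with "DRY_RUN=1 sh restart-puppet.sh hosts.txt" before doing
it for real. It is still 60 ssh sessions, just not typed by hand, so
I'd welcome anything smarter.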
--Kyle