----- Original Message -----
> Thanks guys.. I'll check out mcollective.  Yeah, the root password 60
> times is a bit painful, but the ssh loop would help.  If I remember
> right, there is an API/REST call for Foreman that will give me a list
> of the hosts not responsive.
> 
> The problem here is that puppet was in memory, and running.  It just
> wasn't responsive, perhaps waiting for something to happen that never
> did.  So, checks for the process (monit/snmp/pgrep), etc would say
> that puppet is fine.
> 
> Are there any more bullet-proof ways of watch-dogging Puppet
> specifically?  Could we kill the process if catalog locks are more
> than 30 minutes old? Or are locks on the catalog even a reality? Is this
> something Puppet could do on its own, in a separate thread, or does
> it need a new process?  I'm just throwing an idea or two.

The times I've seen this happen are when the network connection to the master dies
at just the right (wrong) moment, so the Ruby VM gets stuck on blocking IO that
it can never recover from.  So a supervisor thread won't do - it would be
blocked too.

I've written a monitor script for puppet that uses the new last_run_summary.yaml
file to figure out whether puppet has run recently, and I monitor that with Nagios
and NRPE.  So at least I know when this happens:

https://github.com/ripienaar/monitoring-scripts/blob/master/puppet/check_puppet.rb

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to 
puppet-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/puppet-users?hl=en.
