While you're logging into every host to install mcollective, there are some other things to think about (that are easily puppetizeable):
- remote syslogging, so that lots of logs don't cause application hosts to clot
- file system monitoring for your hosts, so you get an alert before things fill up
- trend analysis (graphs) on the hosts, so you get an alert when something's trending to fill up (by inode as well, depending on the host)
- something monitoring critical processes, so that if they stop responding it'll restart them (here I plug monit for simplicity's sake, but snmp agents and similar items can do this too)
- something monitoring the logs, which can alarm when something is absent that should be present, or present when it shouldn't be

As to your immediate problem, try an ssh loop if you can run init scripts via sudo. Use -t so that sudo will have a tty. For security's sake you'll have to enter your password 60 times, but the experience will incentivize you to monitor for this problem.

cat <<XX >/tmp/h
host1
host2
XX
for h in `cat /tmp/h`; do ssh -t "$h" sudo /etc/init.d/puppet restart; done

Good luck.

On Sat, Jan 28, 2012 at 08:53:37AM +1100, Denmat wrote:
> Hi,
> Puppet's sister project, MCollective would do it. An alternative would be
> something like Rundeck.
>
> Den
>
> On 28/01/2012, at 3:52, Kyle Mallory <kyle.mall...@utah.edu> wrote:
>
> > I am experiencing a curious event, and wondering if others have seen
> > this... As well, I have a question related to it.
> >
> > Today, I noticed my puppet summary report from Foreman this morning, that
> > 60 of my 160 hosts all stopped reporting at nearly the exact same time, and
> > have not since restarted. Investigating, it appears that my puppetmaster
> > temporarily ran out of disk space on the /var volume, probably in part do
> > to logging. I have log rollers running, which eventually freed up some
> > disk space, but the 60 hosts, have not resumed reporting.
> >
> > If I dig into the logs on one of the failing agents, there are no messages
> > from puppet, past 4am (here is a snippet of my logs):
> >
> > Jan 27 02:44:25 kmallory3 puppet-agent[15340]: Using cached catalog
> > Jan 27 02:44:25 kmallory3 puppet-agent[15340]: Could not retrieve catalog; skipping run
> > Jan 27 03:14:30 kmallory3 puppet-agent[15340]: Could not retrieve catalog from remote server: Error 400 on SERVER: No space left on device - /var/lib/puppet/yaml/facts/kmallory3.xxx.xxx.xxx.yaml
> > Jan 27 03:14:30 kmallory3 puppet-agent[15340]: Using cached catalog
> > Jan 27 03:14:30 kmallory3 puppet-agent[15340]: Could not retrieve catalog; skipping run
> > Jan 27 03:47:30 kmallory3 puppet-agent[15340]: Could not retrieve plugin: execution expired
> > Jan 27 04:01:02 kmallory3 puppet-agent[15340]: Could not retrieve catalog from remote server: execution expired
> > Jan 27 04:01:02 kmallory3 puppet-agent[15340]: Using cached catalog
> > Jan 27 04:01:02 kmallory3 puppet-agent[15340]: Could not retrieve catalog; skipping run
> >
> > Forcing a run of puppet, I get the following message:
> >
> > kmallory3:/var/log# puppetd --onetime --test
> > notice: Ignoring --listen on onetime run
> > notice: Run of Puppet configuration client already in progress; skipping
> >
> > After stopping and restarting the puppet service, the agent started running
> > properly. It appears that the failure from the server has caused the agent
> > to hang, from which it was not able to recover gracefully. Has anyone
> > experienced this before? We are running 2.6.1 on the large majority of our
> > hosts, including this one. Many failed, but 2/3rds keep running properly.
> >
> > Now, on to my question.. Anyone got some bright ideas for how I could force
> > Puppet to restart itself on a 60 machines, when Puppet isn't running?? I'm
> > not really excited by the prospect of logging into 60 machines, and running
> > a sudo command... sigh.
> >
> >
> > --Kyle
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> > "Puppet Users" group.
> > To post to this group, send email to puppet-users@googlegroups.com.
> > To unsubscribe from this group, send email to
> > puppet-users+unsubscr...@googlegroups.com.
> > For more options, visit this group at
> > http://groups.google.com/group/puppet-users?hl=en.
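
For the file system monitoring item in the reply's list, here is a minimal sketch of a check that could run from cron and would have flagged the full /var before agents started hanging. It assumes GNU df (for combining -P with -i) and mount points without spaces; the function name over_threshold and the 90 percent limit are made up for illustration, not taken from any tool mentioned in this thread.

```shell
#!/bin/sh
# over_threshold LIMIT
# Reads `df -P`-style output on stdin and prints "mountpoint percent"
# for every filesystem whose usage is at or above LIMIT percent.
# (Hypothetical helper; assumes mount points contain no spaces.)
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5                 # fifth column is "Capacity" / "IUse%"
        sub(/%/, "", pct)        # strip the trailing percent sign
        if (pct + 0 >= limit + 0) print $6, pct
    }'
}

# Block usage and inode usage share the same column layout, so the same
# filter handles both. 90 is an assumed threshold; tune it per host.
df -P  | over_threshold 90 | sed 's/^/space: /'
df -Pi | over_threshold 90 | sed 's/^/inodes: /'
```

Any output means something is close to full; piping it to mail or to whatever alerting you already have is left out here on purpose.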