On Friday, October 31, 2014 9:50:41 AM UTC-4, Georgi Todorov wrote:
>
> Actually, sometime last night something happened and puppet stopped
> processing requests altogether. Stopping and starting httpd fixed this, but
> this could be just some bug in one of the new versions of software I
> upgraded to. I'll keep monitoring.
>
So, unfortunately the issue is not fixed :(. For whatever reason, everything ran great for a day: catalog compiles were taking around 7 seconds and client runs finished in about 20s - happy days. Then overnight the catalog compile times jumped to 20-30 seconds and client runs were taking 200+ seconds. A few hours later, no more requests were arriving at the puppet master at all. Is my http server flaking out?

Running with --trace --evaltrace and strace (invocation at the bottom of this mail), it looks like most of the time is spent stat-ing:

 % time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 83.01    5.743474           9    673606    612864 stat
  7.72    0.534393           7     72102     71510 lstat
  6.76    0.467930       77988         6           wait4

That's a pretty poor "hit" rate - roughly nine out of every ten stat() calls fail (612,864 errors out of 673,606 calls).

I've increased the check time to 1 hour on all clients (config snippet at the bottom), and the master seems to be keeping up for now: catalog compile avg 8 seconds, client run avg 15 seconds, queue size 0.

Here is what a client run looks like when the server is keeping up:

Notice: Finished catalog run in *11.93* seconds
Changes:
Events:
Resources:
    Total: 522
Time:
    Filebucket: 0.00
    Cron: 0.00
    Schedule: 0.00
    Package: 0.00
    Service: 0.68
    Exec: 1.07
    *File: 1.72*
    Config retrieval: 13.35
    Last run: 1415032387
    Total: 16.82
Version:
    Config: 1415031292
    Puppet: 3.7.2

And when the server is just about dead:

Notice: Finished catalog run in 214.21 seconds
Changes:
Events:
Resources:
    Total: 522
Time:
    Filebucket: 0.00
    Cron: 0.00
    Schedule: 0.01
    Package: 0.02
    Service: 1.19
    Exec: 2.25
    File: 128.94
    Config retrieval: 26.80
    Last run: 1415027092
    Total: 159.21
Version:
    Config: 1415025705
    Puppet: 3.7.2

Probably 500 of the "Resources" are autofs maps using
https://github.com/pdxcat/puppet-module-autofs/commits/master

So there is definitely a bottleneck somewhere on the system; the problem is I can't figure out what it is. Is it disk I/O (iostat doesn't seem to think so)? CPU (top looks fine)? Memory (ditto)? Is the httpd/Passenger combo not up to the task, or is the Postgres server not keeping up? There are so many components that it is hard for me to do a proper profile and find where the bottleneck is. Any ideas?

So far I've timed the ENC script that pulls the classes for a node - it takes less than 1 second (timing shown at the bottom). From the messages log, catalog compiles range from 7 seconds to 25 seconds (worst case, on the overloaded server).

Anyway, figured I'd share that - unfortunately ruby was not the issue. Back to poking around and testing.
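
For reference, the syscall summary above came from attaching strace to the busiest Passenger worker for a minute or so - roughly this (worker PID taken from passenger-status; exact invocation from memory):

    # list the Rack workers and pick the busy one
    passenger-status
    # attach, follow forks, and count syscalls instead of printing each one
    strace -c -f -p <worker PID>
    # Ctrl-C after ~60 seconds to get the per-syscall table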
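
And in case "check time" is ambiguous: that's just the agent run interval, bumped to an hour with roughly this in each client's puppet.conf:

    [agent]
        # run once an hour instead of the 30-minute default (value in seconds)
        runinterval = 3600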
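
The ENC timing was nothing fancier than this (the path here is made up - substitute whatever script your external_nodes setting points at):

    # hypothetical ENC path and node name, just to show the measurement
    time /etc/puppet/enc.rb client01.example.com
    # consistently comes back in under a second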