On Monday, November 3, 2014 10:39:14 AM UTC-6, Georgi Todorov wrote:
>
> On Friday, October 31, 2014 9:50:41 AM UTC-4, Georgi Todorov wrote:
>>
>> Actually, sometime last night something happened and puppet stopped
>> processing requests altogether. Stopping and starting httpd fixed this, but
>> this could be just some bug in one of the new versions of software I
>> upgraded to. I'll keep monitoring.
>>
>
> So, unfortunately issue is not fixed :(. For whatever reason, everything
> ran great for a day. Catalog compiles were taking around 7 seconds, client
> runs finished in about 20s - happy days. Then overnight, the catalog
> compile times jumped to 20-30 seconds and client runs were now taking 200+
> seconds. Few hours later, and there would be no more requests arriving at
> the puppet master at all. Is my http server flaking out?
>
> Running some --trace --evaltrace and strace it looks like most of the time
> is spent stat-ing:
>
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  83.01    5.743474           9    673606    612864 stat
>   7.72    0.534393           7     72102     71510 lstat
>   6.76    0.467930       77988         6           wait4
>
> That's a pretty poor "hit" rate (7k out of 74k stats)...
>
> I've increased the check time to 1 hour on all clients, and the master
> seems to be keeping up for now - catalog compile avg 8 seconds, client run
> avg - 15 seconds, queue size = 0;
>
> Here is what a client run looks like when the server is keeping up:
>
> Notice: Finished catalog run in *11.93* seconds
> [...]
> *File: 1.72*
> Config retrieval: 13.35
> [...]
>
> And when the server is just about dead:
> [...]
> File: 128.94
> [...]
> Config retrieval: 26.80
> [...]
> Probably 500 of the "Resources" are autofs maps using
> https://github.com/pdxcat/puppet-module-autofs/commits/master
>
> So there is definitely some bottle neck on the system, the problem is I
> can't figure out what it is. Is disk IO (iostat doesn't seem to think so),
> is it CPU (top looks fine), is it memory (ditto), is http/passenger combo
> not up to the task, is the postgres server not keeping up? There are so
> many components that it is hard for me to do a proper profile to find where
> the bottleneck is. Any ideas?
>
> So far I've timed the ENC script that pulls the classes for a node -
> takes less than 1 second.
> From messages the catalog compile is from 7 seconds to 25 seconds (worst
> case, overloaded server).
>
> Anyway, figured I'd share that, unfortunately ruby was not the issue. Back
> to poking around and testing.
>

By far the biggest difference is the File retrieval time. That covers File
resources whose content is specified via a 'source' parameter rather than a
'content' property. The agent must make a separate request to the master for
each such file, and those requests are collectively taking a long time. Most
likely they are backing up behind a bottleneck, so that much of the time
consumed per node is actually spent waiting for service.

If the CPU is not overloaded and you have free physical RAM, then it seems to
me that the HTTP service (i.e. httpd) and the I/O subsystem are your remaining
candidates for the locus of the issue. As you attempt to identify the
bottleneck, do not ignore the total number of transactions serviced; that
could be as important as -- or even more important than -- the volume of data
exchanged.

If the problem turns out to be related to the number / rate of transactions
handled by httpd, then you could consider addressing it by switching File
resources from 'source' to 'content' for specifying file content (sketched in
the P.S. below). That's a pretty clear performance win for very small files,
and it may be a win for you for somewhat larger files as well. (Yes, I'm being
vague. Any hard numbers I threw out for "large" and "small" would be made up.)

John
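
P.S. Here is a minimal sketch of the two forms. The module name ('profile'),
the /etc/motd path, and the message text are made up for illustration, and
only one of the two declarations would appear in a real manifest; the point is
just that the first form triggers extra fileserver traffic per file per run,
while the second embeds the bytes in the catalog itself.

  # Served from the master's file server: the agent issues an extra
  # request for this file's metadata (and its content, whenever it
  # changes) on every run, in addition to the catalog request.
  file { '/etc/motd':
    ensure => file,
    source => 'puppet:///modules/profile/motd',
  }

  # Content carried inside the catalog: nothing further to fetch once
  # the catalog has been downloaded.
  file { '/etc/motd':
    ensure  => file,
    content => "Managed by Puppet -- do not edit.\n",
  }

For content that is too awkward to inline as a literal string, template() (or
the file() function, depending on your Puppet version) can read it on the
master at compile time and embed the result in the catalog the same way.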