On Monday, November 3, 2014 10:39:14 AM UTC-6, Georgi Todorov wrote:
>
> On Friday, October 31, 2014 9:50:41 AM UTC-4, Georgi Todorov wrote:
>>
>>  Actually, sometime last night something happened and puppet stopped 
>> processing requests altogether. Stopping and starting httpd fixed this, but 
>> this could be just some bug in one of the new versions of software I 
>> upgraded to. I'll keep monitoring.
>>
>
> So, unfortunately the issue is not fixed :(. For whatever reason, everything 
> ran great for a day. Catalog compiles were taking around 7 seconds and client 
> runs were finishing in about 20s - happy days. Then overnight, catalog compile 
> times jumped to 20-30 seconds and client runs were taking 200+ seconds. A few 
> hours later, no more requests were arriving at the puppet master at all. Is my 
> http server flaking out? 
>
> Running with --trace --evaltrace and strace, it looks like most of the time 
> is spent stat-ing:
>
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  83.01    5.743474           9    673606    612864 stat
>   7.72    0.534393           7     72102     71510 lstat
>   6.76    0.467930       77988         6           wait4
>
> That's a pretty poor "hit" rate (only ~61k of ~674k stats succeeded)...
>
> I've increased the check interval to 1 hour on all clients, and the master 
> seems to be keeping up for now - catalog compile avg 8 seconds, client run 
> avg 15 seconds, queue size 0.
>
>  Here is what a client run looks like when the server is keeping up:
>
> Notice: Finished catalog run in *11.93* seconds
>
[...] 

>              *File: 1.72*
>    Config retrieval: 13.35
>
[...]

>
> And when the server is just about dead:
>
[...] 

>              File: 128.94
>
[...] 

>    Config retrieval: 26.80
>

 [...]
 

> Probably 500 of the "Resources" are autofs maps using 
> https://github.com/pdxcat/puppet-module-autofs/commits/master 
>
> So there is definitely some bottleneck on the system; the problem is I 
> can't figure out what it is. Is it disk IO (iostat doesn't seem to think so), 
> is it CPU (top looks fine), is it memory (ditto), is the httpd/Passenger combo 
> not up to the task, is the postgres server not keeping up? There are so 
> many components that it is hard for me to do a proper profile to find where 
> the bottleneck is. Any ideas?
>
> So far I've timed the ENC script that pulls the classes for a node - it 
> takes less than 1 second. 
> From the logs, catalog compiles take from 7 seconds to 25 seconds (worst 
> case, on an overloaded server). 
>
> Anyway, figured I'd share that; unfortunately Ruby was not the issue. Back 
> to poking around and testing.
>


By far the biggest difference is File retrieval time.  This will be for 
File resources where you specify content via a 'source' parameter rather 
than via a 'content' property.  The agent must make a separate request to 
the master for each such file, and those requests are collectively taking a 
long time.  Most likely they are backing up behind a bottleneck, so that 
much of the time consumed per node is actually spent waiting for service.
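
For illustration, this is the kind of declaration I mean (the path and 
module name here are made up, not taken from your manifests).  Every agent 
run has to ask the master for metadata on this file, and for its contents 
whenever it changes, because the data is served via 'source':

    # Hypothetical autofs map file served from the master.
    # Each agent run triggers a file_metadata request for it (and a
    # file_content request whenever the file has changed).
    file { '/etc/auto.home':
      ensure => file,
      owner  => 'root',
      group  => 'root',
      mode   => '0644',
      source => 'puppet:///modules/autofs/auto.home',
    }

If many of your ~500 autofs map resources are managed this way, that is a 
lot of small HTTP requests per node per run.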

If the CPU is not overloaded and you have free physical RAM, then it seems 
to me that the system service (i.e. httpd) and the I/O subsystem are your 
remaining candidates for the locus of the issue.  As you attempt to 
identify the bottleneck, do not ignore the total number of transactions 
serviced.  That could be as important as -- or even more important than -- 
the volume of data exchanged.

If the problem turns out to be related to the number / rate of transactions 
handled by httpd, then you could consider addressing it by switching File 
resources from using 'source' to using 'content' to specify file content.  
That's a pretty clear win performance-wise for very small files, and it may 
be a win for you for somewhat larger files as well.  (Yes, I'm being 
vague.  Any hard numbers I threw out for "large" and "small" would be made 
up.)
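
A rough sketch of what that switch looks like, again with made-up names.  
With 'content' the file data is embedded in the catalog at compile time, so 
the agent makes no separate file-serving request for it:

    # Hypothetical equivalent of the resource above, using 'content'.
    # template() renders the file on the master during catalog compilation;
    # for a static file you could instead pass the data inline as a string,
    # or (on Puppet 3) use file() with an absolute path on the master.
    file { '/etc/auto.home':
      ensure  => file,
      owner   => 'root',
      group   => 'root',
      mode    => '0644',
      content => template('autofs/auto.home.erb'),
    }

The trade-off is that the data now travels inside the catalog and is 
generated on every compile, which is why this mainly pays off for small 
files.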


John
