On Friday, October 31, 2014 9:50:41 AM UTC-4, Georgi Todorov wrote:
>
>  Actually, sometime last night something happened and puppet stopped 
> processing requests altogether. Stopping and starting httpd fixed this, but 
> this could be just some bug in one of the new versions of software I 
> upgraded to. I'll keep monitoring.
>

So, unfortunately, the issue is not fixed :(. For whatever reason, everything 
ran great for a day: catalog compiles were taking around 7 seconds and client 
runs finished in about 20s - happy days. Then overnight, catalog compile 
times jumped to 20-30 seconds and client runs started taking 200+ seconds. 
A few hours later, no requests were arriving at the puppet master at all. 
Is my http server flaking out? 

Running with --trace and --evaltrace, plus strace, it looks like most of the 
time is spent stat-ing:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 83.01    5.743474           9    673606    612864 stat
  7.72    0.534393           7     72102     71510 lstat
  6.76    0.467930       77988         6           wait4

That's a pretty poor "hit" rate - roughly 61k successful calls out of ~746k 
stats/lstats, i.e. over 90% of them fail...
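
For reference, that summary table is the sort of thing strace's -c mode 
prints; something along these lines, attached to whichever process is busy, 
will reproduce it:

# hypothetical invocation - attach to the busy process (e.g. a passenger
# worker PID); -c summarizes syscalls, -f follows forked children
strace -c -f -p <pid>
# interrupt with Ctrl-C after a catalog compile to print the table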

I've increased the check interval to 1 hour on all clients, and the master 
seems to be keeping up for now: catalog compiles average 8 seconds, client 
runs average 15 seconds, and the queue size is 0.
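
(For anyone wanting to do the same, that's just runinterval in each agent's 
puppet.conf; the value is in seconds and defaults to 1800:)

# /etc/puppet/puppet.conf on each client
[agent]
    runinterval = 3600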

 Here is what a client run looks like when the server is keeping up:

Notice: Finished catalog run in *11.93* seconds
Changes:
Events:
Resources:
            Total: 522
Time:
       Filebucket: 0.00
             Cron: 0.00
         Schedule: 0.00
          Package: 0.00
          Service: 0.68
             Exec: 1.07
             *File: 1.72*
   Config retrieval: 13.35
         Last run: 1415032387
            Total: 16.82
Version:
           Config: 1415031292
           Puppet: 3.7.2


And when the server is just about dead:
Notice: Finished catalog run in 214.21 seconds
Changes:
Events:
Resources:
            Total: 522
Time:
       Filebucket: 0.00
             Cron: 0.00
         Schedule: 0.01
          Package: 0.02
          Service: 1.19
             Exec: 2.25
             File: 128.94
   Config retrieval: 26.80
         Last run: 1415027092
            Total: 159.21
Version:
           Config: 1415025705
           Puppet: 3.7.2


Probably 500 of the 522 resources are autofs maps, managed 
using https://github.com/pdxcat/puppet-module-autofs/commits/master - each of 
those presumably a File whose metadata the agent re-checks against the master 
on every run, which would line up with the File time blowing up. A quick way 
to sanity-check that count is shown below.
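
This is just a grep over the agent's cached catalog (Puppet 3.x layout 
assumed; the path and key spacing can vary by version):

# count File resources in the agent's cached catalog
grep -o '"type": *"File"' \
    /var/lib/puppet/client_data/catalog/$(hostname -f).json | wc -l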

So there is definitely a bottleneck somewhere; the problem is I can't figure 
out what it is. Is it disk IO (iostat doesn't seem to think so)? Is it CPU 
(top looks fine)? Memory (ditto)? Is the httpd/passenger combo not up to the 
task, or is the postgres server not keeping up? There are so many components 
that it is hard for me to do a proper profile to find the bottleneck. Any 
ideas?
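
For completeness, these are roughly the spot checks behind those statements; 
the last two are just the obvious next places to look (the database name is 
site-specific):

iostat -x 5        # disk: per-device %util and await
top                # CPU and load average
free -m            # memory and swap
passenger-status   # busy vs. waiting passenger workers
psql -d <your_db> -c 'SELECT count(*) FROM pg_stat_activity;'  # active pg sessions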

So far I've timed the ENC script that pulls the classes for a node: it takes 
less than 1 second. 
From /var/log/messages, catalog compiles range from 7 seconds to 25 seconds 
(worst case, on an overloaded server). 
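
Both measurements are easy to reproduce; the ENC path below is just a 
placeholder for whatever external_nodes points at in puppet.conf:

# time the ENC for one node (script path is site-specific)
time /etc/puppet/enc.sh client01.example.com

# pull compile times out of syslog
grep 'Compiled catalog' /var/log/messages | tail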

Anyway, I figured I'd share that - unfortunately, Ruby was not the issue. 
Back to poking around and testing.
