Re: [Puppet Users] Puppetmaster can't keep up with our 1400 nodes.

Garrett Honeycutt Thu, 30 Oct 2014 12:06:01 -0700

On 10/30/14 10:45 AM, Georgi Todorov wrote:
> Hi group,
> 
> We have a VM with 24 E7-8857 v2 @ 3.00GHz cores and 32G of ram (on big
> ESX hosts and fast backend) that is our foreman/puppetmaster with the
> following tuning params:
> 
> Passenger:
>   PassengerMaxRequests 10000
>   PassengerStatThrottleRate 180 
>   PassengerMaxRequestQueueSize 300
>   PassengerMaxPoolSize 18
>   PassengerMinInstances 1
>   PassengerHighPerformance on
> 
> PGSQL:
> constraint_exclusion = on
> checkpoint_completion_target = 0.9
> checkpoint_segments = 16
> max_connections = 100
> maintenance_work_mem = 1GB
> effective_cache_size = 22GB
> work_mem = 192MB
> wal_buffers = 8MB
> shared_buffers = 7680MB
> 
> Apache
>   StartServers        50
>   MinSpareServers     5
>   MaxSpareServers     20
>   ServerLimit         256
>   MaxClients          256
>   MaxRequestsPerChild 4000
> 
> 
> IPv6 disabled
> vm.swappiness = 0
> SELinux disabled
> iptables flushed.
> 
> We have about 1400 hosts that checkin every 30 minutes and report facts.
> Facter execution time is less than 1 second on the nodes. 
> 
> The bottleneck seems to be 
> Passenger RackApp: /etc/puppet/rack 
> 
> There is one of these for each passenger proc that sits at 100% all the
> time. A typical strace of it looks like this:
> 
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  96.17   42.235808        1320     31988     15964 futex
>   3.17    1.393038           0   5722020           rt_sigprocmask
>   0.51    0.225576          14     16157         3 select
>   0.12    0.051727           1     93402     83142 stat
>   0.01    0.006303           0     13092     13088 lstat
>   0.01    0.003000        1500         2           fsync
> ...
> 
> Here are the versions of software we've moved through:
> Master OS: Centos 6.5, 6.6
> Foreman: 1.4.9, 1.5.1, 1.6.2
> puppet: 3.5.1, 3.6.2, 3.7.2
> Ruby: 1.8.7 (centos...)
> Passenger: 4.0.18, 4.0.53
> 
> Settings we've tried in various combinations:
>   PassengerMaxPoolSize 12, 18, 24
>   PassengerMaxRequestQueueSize 150, 200, 250, 350
>   PassengerStatThrottleRate 120, 180
>   ServerLimit 256, 512
>   MaxClients 256, 512
> 
> Requests in queue are always maxed out and a lot of nodes just timeout.
> 
> What am I missing? Our node count doesn't seem to be that big, our
> catalogs are fairly small too (basically just a bunch of autofs maps via
> module and 2-3 files). 
> 
> Thanks!
>


Hi Georgi,

How long does it take to compile a catalog? Is your VM server
oversubscribed? Below is the formula for working out how many cores you
need dedicated to compiling catalogs. Note that this means *dedicated* to
compiling: subtract two cores for the OS, subtract the number of workers
if you run Dashboard, and subtract a few more if you are running PuppetDB
and Postgres.

Take a look at my post[1] on ask.puppetlabs.com regarding sizing.

cores = (nodes) * (check-ins per hour) * (seconds per catalog) /
(seconds per hour)

Another way to look at this is to ask how many nodes the current
hardware should support.

nodes = (cores) * (seconds per hour) / (check-ins per hour) / (seconds
per catalog)
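
As a rough sketch, the two formulas above can be written in Python as
follows. The function names are my own, and the 10-second compile time
and the 18 dedicated cores in the example are assumed figures for
illustration only; measure your real compile time and plug that in.

```python
SECONDS_PER_HOUR = 3600


def cores_needed(nodes, checkins_per_hour, seconds_per_catalog):
    """Cores that must be dedicated to catalog compilation."""
    return nodes * checkins_per_hour * seconds_per_catalog / SECONDS_PER_HOUR


def nodes_supported(cores, checkins_per_hour, seconds_per_catalog):
    """Nodes that a given number of dedicated cores can serve."""
    return cores * SECONDS_PER_HOUR / checkins_per_hour / seconds_per_catalog


if __name__ == "__main__":
    # 1400 nodes checking in every 30 minutes is 2 check-ins per hour.
    # ASSUMPTION: 10 seconds per catalog; the real value is not known here.
    print(cores_needed(nodes=1400, checkins_per_hour=2, seconds_per_catalog=10))
    # ASSUMPTION: 18 cores left for compiling after the OS, database, etc.
    print(nodes_supported(cores=18, checkins_per_hour=2, seconds_per_catalog=10))
```

Because the core count scales linearly with seconds per catalog, the
compile time is the number worth measuring first.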


[1] -
http://ask.puppetlabs.com/question/3/where-can-i-find-information-about-sizing-for-puppet-servers/?answer=101#post-id-101

Best regards,
-g

-- 
Garrett Honeycutt
@learnpuppet
Puppet Training with LearnPuppet.com
Mobile: +1.206.414.8658

