Hi group,

We have a VM with 24 cores (E7-8857 v2 @ 3.00GHz) and 32 GB of RAM (on big ESX
hosts with a fast storage backend) that serves as our Foreman/puppetmaster,
with the following tuning parameters:

Passenger:
  PassengerMaxRequests 10000
  PassengerStatThrottleRate 180 
  PassengerMaxRequestQueueSize 300
  PassengerMaxPoolSize 18
  PassengerMinInstances 1
  PassengerHighPerformance on
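
For what it's worth, the queue numbers mentioned further down come from
passenger-status (ships with Passenger). A quick way to watch them, assuming
the 4.x output format with its "Requests in queue" lines:

  watch -n 5 'passenger-status | grep -i "requests in"'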

PGSQL:
constraint_exclusion = on
checkpoint_completion_target = 0.9
checkpoint_segments = 16
max_connections = 100
maintenance_work_mem = 1GB
effective_cache_size = 22GB
work_mem = 192MB
wal_buffers = 8MB
shared_buffers = 7680MB
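
Sizing rationale for the memory settings, in case it matters: shared_buffers
is roughly 25% of the 32 GB of RAM (32 GB * 0.25 = 8 GB ≈ 7680 MB) and
effective_cache_size roughly 70% (≈ 22 GB), per the usual PostgreSQL tuning
guidelines.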

Apache:
  StartServers        50
  MinSpareServers     5
  MaxSpareServers     20
  ServerLimit         256
  MaxClients          256
  MaxRequestsPerChild 4000
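
(This is the prefork MPM stanza; ServerLimit just caps MaxClients, so 256 is
the effective ceiling on concurrent Apache workers.)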


OS:
  IPv6 disabled
  vm.swappiness = 0
  SELinux disabled
  iptables flushed

We have about 1400 hosts that check in every 30 minutes and report facts.
Facter execution time is less than 1 second on the nodes.
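
Back-of-the-envelope load: 1400 hosts / 30 min ≈ 47 check-ins per minute,
i.e. under 1 per second. Even assuming each agent run makes 3-4 HTTP requests
(node, catalog, report, plus the odd file_metadata call), that's only a few
requests per second sustained, which 18 Passenger workers should handle
comfortably.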

The bottleneck seems to be the Passenger RackApp processes
("Passenger RackApp: /etc/puppet/rack" in ps output).

There is one of these per Passenger worker, and each one sits at 100% CPU all
the time. A typical strace summary of one looks like this:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 96.17   42.235808        1320     31988     15964 futex
  3.17    1.393038           0   5722020           rt_sigprocmask
  0.51    0.225576          14     16157         3 select
  0.12    0.051727           1     93402     83142 stat
  0.01    0.006303           0     13092     13088 lstat
  0.01    0.003000        1500         2           fsync
...
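
(Summary collected with something like "strace -c -p <pid>" attached to one
of the Rack processes for a minute or so.)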

Here are the software versions we've moved through:
Master OS: CentOS 6.5, 6.6
Foreman: 1.4.9, 1.5.1, 1.6.2
Puppet: 3.5.1, 3.6.2, 3.7.2
Ruby: 1.8.7 (CentOS...)
Passenger: 4.0.18, 4.0.53

Settings we've tried in various combinations:
  PassengerMaxPoolSize 12, 18, 24
  PassengerMaxRequestQueueSize 150, 200, 250, 350
  PassengerStatThrottleRate 120, 180
  ServerLimit 256, 512
  MaxClients 256, 512

The request queue is always maxed out, and a lot of nodes just time out.

What am I missing? Our node count doesn't seem that big, and our catalogs are
fairly small too (basically just a bunch of autofs maps via a module and 2-3
files).

Thanks!
