Brice Figureau <brice-pup...@daysofwonder.com> writes:

> On Tue, 2011-01-25 at 17:11 -0500, Micah Anderson wrote:
>> Brice Figureau <brice-pup...@daysofwonder.com> writes:
>>
>> All four of my mongrels are constantly pegged, doing 40-50% of the CPU
>> each, occupying all available CPUs. They never settle down. I've got 74
>> nodes checking in now; that doesn't seem like many, but perhaps I've
>> reached a tipping point with my puppetmaster (it's a dual 1GHz machine
>> with 2GB of RAM)?
>
> The puppetmaster is mostly CPU bound. Since you have only 2 CPUs, you
> shouldn't try to achieve a concurrency of 4 (which your mongrels are
> trying to do); otherwise more than one request will be accepted by a
> mongrel process and each thread will contend for the CPU. The bad news
> is that the ruby MRI uses green threading, so the second thread will
> only run when the first one sleeps, does I/O, or relinquishes the CPU
> voluntarily. In other words, it will only run once the first thread has
> finished its compilation.
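[Editor's sketch, not from the thread: the serialization Brice describes is easy to see in plain Ruby. Under MRI, two CPU-bound threads take about as long as running the same work serially, because the interpreter never lets two threads burn CPU at once. Nothing here is Puppet-specific; `burn` is just a hypothetical stand-in for catalog compilation.]

```ruby
require 'benchmark'

# Stand-in for a CPU-bound task such as catalog compilation:
# pure computation, no I/O, no sleeping.
def burn(n)
  x = 0
  n.times { |i| x += i * i }
  x
end

N = 1_000_000

# Run the task twice serially...
serial = Benchmark.realtime { 2.times { burn(N) } }

# ...and twice "concurrently" in threads.
threaded = Benchmark.realtime do
  [Thread.new { burn(N) }, Thread.new { burn(N) }].each(&:join)
end

# Under MRI the two wall-clock times come out close to each other:
# threads buy nothing for CPU-bound work.
RATIO = threaded / serial
puts format('serial %.2fs, threaded %.2fs, ratio %.2f', serial, threaded, RATIO)
```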
Ok, that is good to know. I wasn't aware that ruby was not able to do
that.

> Now you have 74 nodes, with a worst-case compilation time of 75s (which
> is a lot); that translates to 74 * 75 = 5550s of compilation time.
> With a concurrency of 2, that's still 2775s of compilation time per
> round of <insert here your default sleep time>. With the default 30min
> of sleep time, and assuming perfect scheduling, that's still longer
> than a round of sleep time, which means you won't ever finish compiling
> catalogs before the first node asks for its catalog again.

I'm doing 60 minutes of sleep time, which is 3600 seconds an hour, so a
concurrency of 2 giving me 2775s of compile time per hour does keep me
under the 3600 seconds... assuming scheduling is perfect, which it very
likely is not.

> And I'm talking only about compilation. If your manifests use file
> sourcing, you must also add this to the equation.

As explained, I set up your nginx method for offloading file sourcing.

> Another explanation of the issue is swapping. You mention your server
> has 2GiB of RAM. Are you sure your 4 mongrel processes after some time
> still fit in physical RAM (along with the other things running on the
> server)? Maybe your server is constantly swapping.

I'm actually doing fine on memory, not dipping into swap. I've watched
i/o to see if I could identify either a swap or disk problem, but didn't
notice very much happening there. The CPU usage of the mongrel processes
is pretty much where everything is spending its time. I've been
wondering if I have some loop in a manifest or something that is causing
them to just spin.

> So you can do several things to get better performance:
> * reduce the number of nodes that check in at a single time (ie
>   increase sleep time)

I've already reduced to once per hour, but I could consider reducing the
frequency further.
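[Editor's sketch: Brice's arithmetic, and the 60-minute variant of it, laid out explicitly. The numbers are the ones from this thread.]

```ruby
nodes        = 74     # agents checking in
compile_secs = 75     # worst-case catalog compile time, seconds
concurrency  = 2      # CPUs actually available on the master
sleep_secs   = 3600   # runinterval in this setup: one check-in per hour

TOTAL_CPU  = nodes * compile_secs      # 5550s of compilation per round
WALL_CLOCK = TOTAL_CPU / concurrency   # 2775s of wall clock with 2 CPUs

puts "CPU time per round:   #{TOTAL_CPU}s"
puts "wall clock per round: #{WALL_CLOCK}s"
# With the default 30min (1800s) interval this overflows the round;
# with 3600s it just fits -- assuming perfect scheduling.
puts "fits in interval?     #{WALL_CLOCK < sleep_secs}"
```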
> * reduce the time it takes to compile a catalog:
> + which includes not using storeconfigs (or using puppetqd or
>   thin_storeconfigs instead).

I need to use storeconfigs, and as detailed in my original message, I've
tried puppetqd and it didn't do much for me. thin_storeconfigs did help,
and I'm still using it, so this one has already been done too.

> + Check the server is not swapping.

Not swapping.

> + Reduce the number of mongrel instances, to artificially reduce the
>   concurrency (this is counter-intuitive, I know).

Ok, I'm backing off to two mongrels to see how well that works.

> + Use a "better" ruby interpreter like Ruby Enterprise Edition (for
>   several reasons this one has better GC and a better memory
>   footprint).

I'm pretty sure my problem isn't memory, so I'm not sure if these will
help much.

> + Cache compiled catalogs in nginx.

Doing this.

> + Offload file content serving to nginx.

Doing this.

> + Use passenger instead of mongrel.

I tried to switch to passenger, and things were much worse. Actually,
passenger worked fine with 0.25, but when I upgraded I couldn't get it
to function anymore. I actually had to go back to nginx to get things
functioning again.

>> 3. tried to upgrade rails from 2.3.5 (the debian version) to 2.3.10
>>
>> I didn't see any appreciable difference here. I ended up going back to
>> 2.3.5 because that was the packaged version.
>
> Since you seem to use Debian, make sure you use either the latest ruby
> lenny backports (or REE), as they fixed an issue with pthreads and CPU
> consumption:
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=579229

I'm using Debian Squeeze, which has the same version you are mentioning
from lenny backports (2.3.5).

>> 5. tried to cache catalogs through adding a http front-end cache and
>> expiring that cache when manifests are updated[1]
>>
>> I'm not sure this works at all.
>
> This should have helped, because it would prevent the puppetmaster from
> even being called. You might check your nginx configuration then.
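[Editor's sketch: the master-side knobs mentioned above map to puppet.conf roughly like this, using 0.25/2.6-era option names; check `puppet --configprint all` on your version before trusting it.]

    # /etc/puppet/puppet.conf
    [master]
    storeconfigs      = true
    thin_storeconfigs = true    # the lighter storeconfigs variant in use here

    [agent]
    # one check-in per hour instead of the default 30 minutes
    runinterval = 3600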
Hmm. According to jamesturnbull, the rest terminus shouldn't allow you
to request any node's catalog, so I'm not sure how this can work at
all... but in case I've got something screwed up in my nginx.conf, I'd
really be happy if you could have a look at it; it's possible that I
misunderstood something from your blog post! Here it is:

user www-data;
worker_processes 2;

error_log /var/log/nginx/error.log;
pid /var/run/nginx.pid;

events {
  # In a reverse proxy situation, max_clients becomes
  # max_clients = worker_processes * worker_connections/4
  worker_connections 2048;
}

http {
  default_type application/octet-stream;
  sendfile on;
  tcp_nopush on;
  tcp_nodelay on;
  large_client_header_buffers 1024 2048k;
  client_max_body_size 150m;
  proxy_buffers 128 4k;
  keepalive_timeout 65;
  gzip on;
  gzip_min_length 1000;
  gzip_types text/plain;

  ssl on;
  ssl_certificate /var/lib/puppet/ssl/certs/puppetmaster.pem;
  ssl_certificate_key /var/lib/puppet/ssl/private_keys/puppetmaster.pem;
  ssl_client_certificate /var/lib/puppet/ssl/ca/ca_crt.pem;
  ssl_ciphers SSLv2:-LOW:-EXPORT:RC4+RSA;
  ssl_session_cache shared:SSL:8m;
  ssl_session_timeout 5m;

  proxy_read_timeout 600;

  upstream puppet_mongrel {
    fair;
    server 127.0.0.1:18140;
    server 127.0.0.1:18141;
    server 127.0.0.1:18142;
    server 127.0.0.1:18143;
  }

  log_format noip '0.0.0.0 - $remote_user [$time_local] '
                  '"$request" $status $body_bytes_sent '
                  '"$http_referer" "$http_user_agent"';

  proxy_cache_path /var/cache/nginx/cache levels=1:2 keys_zone=puppetcache:10m;

  server {
    listen 8140;
    access_log /var/log/nginx/access.log noip;
    ssl_verify_client required;
    root /etc/puppet;

    # make sure we serve everything as raw
    types { }
    default_type application/x-raw;

    # serve static files for the [files] mountpoint
    location /production/file_content/files/ {
      allow 172.16.0.0/16;
      allow 10.0.1.0/8;
      allow 127.0.0.1/8;
      deny all;
      alias /etc/puppet/files/;
    }

    # serve module files sections
    location ~ /production/file_content/[^/]+/files/ {
      # it is advisable to have some access rules here
      allow 172.16.0.0/16;
      allow 10.0.1.0/8;
      allow 127.0.0.1/8;
      deny all;
      root /etc/puppet/modules;
      # rewrite /production/file_content/module/files/file.txt
      # to /module/file.txt
      rewrite ^/production/file_content/([^/]+)/files/(.+)$ $1/$2 break;
    }

    # Variables available for the established SSL connection:
    #   $ssl_cipher        - the cipher used
    #   $ssl_client_serial - the serial number of the client certificate
    #   $ssl_client_s_dn   - the subject DN of the client certificate
    #   $ssl_client_i_dn   - the issuer DN of the client certificate
    #   $ssl_protocol      - the protocol of the connection
    location / {
      proxy_pass http://puppet_mongrel;
      proxy_redirect off;
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Client-Verify SUCCESS;
      proxy_set_header X-SSL-Subject $ssl_client_s_dn;
      proxy_set_header X-SSL-Issuer $ssl_client_i_dn;
      proxy_buffer_size 16k;
      proxy_buffers 8 32k;
      proxy_busy_buffers_size 64k;
      proxy_temp_file_write_size 64k;
      proxy_read_timeout 540;

      # we handle catalogs differently
      # because we want to cache them
      location /production/catalog {
        proxy_pass http://puppet_mongrel;
        proxy_redirect off;
        # it is a good thing to actually restrict who
        # can ask for a catalog (especially for cached
        # catalogs)
        allow 172.16.0.0/16;
        allow 10.0.1.0/8;
        allow 127.0.0.1/8;
        deny all;
        # where to cache contents
        proxy_cache puppetcache;
        # we cache content by catalog host
        # we could also use $args to take into account request
        # facts, but those change too often (ie uptime or memory)
        # to be really useful
        proxy_cache_key $uri;
        # define how long to cache responses:
        # normal catalogs will be cached 2 weeks
        proxy_cache_valid 200 302 301 2w;
        # errors are not cached long
        proxy_cache_valid 500 403 1m;
        # the rest is cached a little bit
        proxy_cache_valid any 30m;
      }

      # catch-all location for other termini
      location / {
        proxy_pass http://puppet_mongrel;
        proxy_redirect off;
      }
    }
  }

  server {
    listen 8141;
    ssl_verify_client off;
    root /var/empty;
    access_log /var/log/nginx/access.log noip;

    location / {
      proxy_pass http://puppet_mongrel;
      proxy_redirect off;
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Client-Verify FAILURE;
      proxy_set_header X-SSL-Subject $ssl_client_s_dn;
      proxy_set_header X-SSL-Issuer $ssl_client_i_dn;
    }
  }
}

>> 7. set --http_compression
>>
>> I'm not sure if this actually hurts the master or not (because it has
>> to now occupy the CPU compressing catalogs?)
>
> This is a client option, and you need the collaboration of nginx for it
> to work. This will certainly add more burden on your master's CPU,
> because nginx now has to gzip everything you're sending.

Yeah, I have gzip compression turned on in nginx, but I don't really
need it and my master could use the break.

>> 8. tried to follow the introspection technique[2]
>>
>> this wasn't so easy to do; I had to operate really fast, because if I
>> was too slow the thread would exit, or it would get hung up on:
>>
>> [Thread 0xb6194b70 (LWP 25770) exited]
>> [New Thread 0xb6194b70 (LWP 25806)]
>
> When you attach gdb, how many threads are running?

I'm not sure; how can I determine that? I just had the existing 4
mongrel processes.

>> (gdb) eval "total = \[\[ObjectSpace\]\].each_object(Array)\{\|x\| puts
>> '---'; puts x.inspect \}; puts \\"---\\nTotal Arrays: \#{total}\\""
>> Invalid character '\' in expression.

The above seemed to be a problem with the expression on the wiki page;
does anyone know what that should be so gdb doesn't have a problem with
it?

>> I'm available on IRC to try more advanced debugging, just ping me
>> (hacim). I'd really like things to function again!
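[Editor's hedged notes, not from the thread: once gdb is attached, the
stock commands for inspecting threads are:

    (gdb) info threads          # list every native thread in the process
    (gdb) thread apply all bt   # backtrace of each thread

And the \[\[...\]\] in the failing expression looks like wiki-link
markup that leaked into the copied snippet; with it stripped, the
expression was presumably meant to read (an untested guess):

    (gdb) eval "total = ObjectSpace.each_object(Array){|x| puts '---'; puts x.inspect}; puts \"---\nTotal Arrays: #{total}\""
]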
>
> I'll ping you, but I'm just really busy for the next couple of days :(

Thanks for any help or ideas; I'm out of ideas myself, so anything
helps!

micah

--
You received this message because you are subscribed to the Google
Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to
puppet-users+unsubscr...@googlegroups.com.
For more options, visit this group at
http://groups.google.com/group/puppet-users?hl=en.