On Wed, 2011-01-26 at 10:11 -0500, Micah Anderson wrote:
> Brice Figureau <brice-pup...@daysofwonder.com> writes:
> 
> > On Tue, 2011-01-25 at 17:11 -0500, Micah Anderson wrote:
> >> Brice Figureau <brice-pup...@daysofwonder.com> writes:
> >>
> >> All four of my mongrels are constantly pegged, doing 40-50% of the CPU
> >> each, occupying all available CPUs. They never settle down. I've got 74
> >> nodes checking in now, it doesn't seem like its that many, but perhaps
> >> i've reached a tipping point with my puppetmaster (its a dual 1ghz,
> >> 2gigs of ram machine)?
> >
> > The puppetmaster is mostly CPU bound. Since you have only 2 CPUs, you
> > shouldn't try to achieve a concurrency of 4 (which your mongrels are
> > trying to do), otherwise more than one request will be accepted by one
> > mongrel process and each thread will contend for the CPU. The bad news
> > is that the ruby MRI uses green threading, so the second thread will
> > only run when the first one sleeps, does I/O or voluntarily relinquishes
> > the CPU. In other words, it will only run once the first thread has
> > finished its compilation.
> 
> Ok, that is a good thing to know. I wasn't aware that ruby was not able
> to do that.
> 
> > Now you have 74 nodes with a worst-case compilation time of 75s (which is
> > a lot), which translates to 74*75 = 5550s of compilation time.
> > With a concurrency of 2, that's still 2775s of compilation time per
> > round of <insert here your default sleep time>. With the default 30min
> > of sleep time and assuming perfect scheduling, that's still longer
> > than a round of sleep time, which means you will never have finished
> > compiling catalogs by the time the first node asks for one again.
> 
> I'm doing 60 minutes of sleep time, which is 3600 seconds an hour, the
> concurrency of 2 giving me 2775s of compile time per hour does keep me
> under the 3600 seconds... assuming scheduling is perfect, which it very
> likely is not.
> 
> > And I'm talking only about compilation. If your manifests use file
> > sourcing, you must also add this to the equation.
> 
> As explained, I set up your nginx method for offloading file sourcing.
> 
> > Another explanation of the issue is swapping. You mention your server
> > has 2GiB of RAM. Are you sure your 4 mongrel processes after some times
> > still fit in the physical RAM (along with the other thing running on the
> > server)?
> > Maybe your server is constantly swapping.
> 
> I'm actually doing fine on memory, not dipping into swap. I've watched
> i/o to see if I could identify either a swap or disk problem, but didn't
> notice very much happening there. The CPU usage of the mongrel processes
> is pretty much where everything is spending its time. 
> 
> I've been wondering if I have some loop in a manifest or something that
> is causing them to just spin.

I don't think that's the problem. There could be some ruby internals
issues at play here, but I doubt anything in your manifests creates a
loop.

What is strange is that you mentioned that the very first catalog
compilations were fine, but that compilation time then increased.

> > So you can do several things to get better performance:
> > * reduce the number of nodes that check in at a single time (ie increase
> > sleep time)
> 
> I've already reduced to once per hour, but I could consider reducing it
> more. 

That would be interesting. It would help us know whether the problem is
too much load/concurrency coming from your clients or a problem in the
master itself.

BTW, what's the load on the server?
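
If you want a quick snapshot, something along these lines (run on the
master while the mongrels are busy) would show both the load average and
whether the time really goes to user-space CPU (adjust to the tools you
have installed):

 uptime
 vmstat 5 3
 top -b -n 1 | head -20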

> > * reduce the time it takes to compile a catalog: 
> >   + which includes not using storeconfigs (or using puppetqd or
> > thin_storeconfigs instead).
> 
> I need to use storeconfigs, and as detailed in my original message, I've
> tried puppetqd and it didn't do much for me. thin_storeconfigs did help,
> and I'm still using it, so this one has already been done too.
> 
> >   + Check the server is not swapping. 
> 
> Not swapping.

OK, good.

> >   + Reduce the number of mongrel instances, to artificially reduce the
> > concurrency (this is counter-intuitive I know)
> 
> Ok, I'm backing off to two mongrels to see how well that works.

Let me know if that changes something.

> >   + use a "better" ruby interpreter like Ruby Enterprise Edition (for
> > several reasons: this one has a better GC and a smaller memory footprint).
> 
> I'm pretty sure my problem isn't memory, so I'm not sure if these will
> help much.

Well, having a better GC means the ruby interpreter becomes faster at
allocating and recycling objects. In the end that means the overall memory
footprint can be smaller, but it also means the interpreter spends much less
time on garbage collection (ie the CPU is used for your code rather than for
tidying up).
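
If you do try REE, its GC can also be tuned through environment variables
exported by whatever wrapper starts your mongrels. These are real REE knobs,
but the values below are only an example starting point, not something I've
validated on your workload:

 export RUBY_HEAP_MIN_SLOTS=500000
 export RUBY_HEAP_SLOTS_INCREMENT=250000
 export RUBY_HEAP_SLOTS_GROWTH_FACTOR=1
 export RUBY_GC_MALLOC_LIMIT=50000000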

> >   + Cache compiled catalogs in nginx
> 
> Doing this.
> 
> >   + offload file content serving in nginx
> 
> Doing this
> 
> >   + Use passenger instead of mongrel
> 
> I tried to switch to passenger, and things were much worse. Actually,
> passenger worked fine with 0.25, but when I upgraded I couldn't get it
> to function anymore. I actually had to go back to nginx to get things
> functioning again.
> 
> >> 3. tried to upgrade rails from 2.3.5 (the debian version) to 2.3.10
> >> 
> >>    I didn't see any appreciable difference here. I ended up going back to
> >> 2.3.5 because that was the packaged version.
> >
> > Since you seem to use Debian, make sure you use either the latest ruby
> > lenny backports (or REE) as they fixed an issue with pthreads and CPU
> > consumption:
> > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=579229
> 
> I'm using Debian Squeeze, which has the same version you are mentioning
> from lenny backports (2.3.5).

I was talking about the ruby1.8 package, not rails. Make sure you use
the squeeze version or the lenny-backports one.
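
You can quickly check which one your master is actually running with
something like:

 dpkg -l ruby1.8 libruby1.8
 ruby1.8 --version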

> >> 5. tried to cache catalogs through adding a http front-end cache and
> >> expiring that cache when manifests are updated[1] 
> >> 
> >>    I'm not sure this works at all.
> >
> > This should have helped because it would prevent the puppetmaster from
> > even being called. You might check your nginx configuration then.
> 
> Hmm. According to jamesturnbull, the rest terminus shouldn't allow you
> to request any node's catalog, so I'm not sure how this can work at
> all... but in case I've got something screwed up in my nginx.conf, I'd
> really be happy if you could have a look at it, its possible that I
> misunderstood something from your blog post! Here it is:

When a client asks for a catalog, nginx checks whether it has already cached
it. If it has, and the cache is still fresh, it serves the cached copy;
otherwise it forwards the request to a puppetmaster at the same REST URL and
caches what the master returns.

It's easy to check whether nginx is caching the catalogs:
have a look in /var/cache/nginx/cache and see if there are files containing
some of your catalogs.
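
For instance (using the cache path above, and replacing the node name with
one of your real nodes):

 ls -lR /var/cache/nginx/cache | head
 grep -rl 'mynode.example.com' /var/cache/nginx/cache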

Puppet doesn't send the necessary caching headers right now, and I'm not
sure how nginx deals with that. I hope it still caches (by virtue of
proxy_cache_valid).

What version of nginx are you using?
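
Something like:

 nginx -V

will print both the version and the compile options.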


>   server {
>     listen                        8140;
>     access_log                    /var/log/nginx/access.log noip;
>     ssl_verify_client             required;                         

Make that:
ssl_verify_client       optional;

Then remove the second server{} block, and make sure your clients do not
use a different ca_port. But do this only if you use nginx >= 0.7.64.
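
As a rough sketch (only the verification-related lines are shown, everything
else stays as in your current config), the single server{} block would look
like:

 server {
   listen                8140;
   ssl_verify_client     optional;
   # ... ssl certificate/key and other settings unchanged ...

   location / {
     proxy_pass          http://puppet_mongrel;
     # pass the real verification result instead of hardcoding SUCCESS
     proxy_set_header    X-Client-Verify  $ssl_client_verify;
     proxy_set_header    X-SSL-Subject    $ssl_client_s_dn;
     proxy_set_header    X-SSL-Issuer     $ssl_client_i_dn;
   }
 }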

>     root                        /etc/puppet;
> 
>     # make sure we serve everything
>     # as raw
>     types { }
>     default_type                 application/x-raw;
> 
>     # serve static file for the [files] mountpoint
>     location /production/file_content/files/ {
>         allow                    172.16.0.0/16;
>         allow                    10.0.1.0/8;
>         allow                    127.0.0.1/8;
>         deny                     all;
> 
>         alias                    /etc/puppet/files/;
>     }
> 
>     # serve modules files sections
>     location ~ /production/file_content/[^/]+/files/ {
>         # it is advisable to have some access rules here
>         allow                    172.16.0.0/16;
>         allow                    10.0.1.0/8;
>         allow                    127.0.0.1/8;
>         deny                     all;
> 
>         root                     /etc/puppet/modules;
> 
>         # rewrite /production/file_content/module/files/file.txt
>         # to /module/file.text
>         rewrite                  
> ^/production/file_content/([^/]+)/files/(.+)$  $1/$2 break;
>     }
> 
>     # Variables
>     # $ssl_cipher returns the line of those utilized it is cipher for 
> established SSL-connection
>     # $ssl_client_serial returns the series number of client certificate for 
> established SSL-connection
>     # $ssl_client_s_dn returns line subject DN of client certificate for 
> established SSL-connection
>     # $ssl_client_i_dn returns line issuer DN of client certificate for 
> established SSL-connection
>     # $ssl_protocol returns the protocol of established SSL-connection
> 
>     location / {
>       proxy_pass                 http://puppet_mongrel;
>       proxy_redirect             off;
>       proxy_set_header           Host             $host;
>       proxy_set_header           X-Real-IP        $remote_addr;
>       proxy_set_header           X-Forwarded-For  $proxy_add_x_forwarded_for;
>       proxy_set_header           X-Client-Verify  SUCCESS;

If you use ssl_verify_client optional as explained above, this should be:
proxy_set_header           X-Client-Verify  $ssl_client_verify;

>       proxy_set_header           X-SSL-Subject    $ssl_client_s_dn;
>       proxy_set_header           X-SSL-Issuer     $ssl_client_i_dn;
>       proxy_buffer_size          16k;
>       proxy_buffers              8 32k;
>       proxy_busy_buffers_size    64k;
>       proxy_temp_file_write_size 64k;
>       proxy_read_timeout         540;
> 
>     # we handle catalog differently
>     # because we want to cache them
>     location /production/catalog {

Warning: this ^^ will work only if your nodes are in the "production"
environment. Adjust for your environments.
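
If you have nodes in more than one environment, a regex location along these
lines (untested sketch) would catch the catalog URL for all of them; the
proxy and cache directives stay the same inside it:

 location ~ ^/[^/]+/catalog {
     # same proxy_pass, allow/deny and proxy_cache_* directives as below
 }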


>         proxy_pass               http://puppet_mongrel;
>         proxy_redirect           off;
> 
>         # it is a good thing to actually restrict who
>         # can ask for a catalog (especially for cached
>         # catalogs)
>         allow                    172.16.0.0/16;
>         allow                    10.0.1.0/8;
>         allow                    127.0.0.1/8;
>         deny                     all;
> 
>         # where to cache contents
>         proxy_cache              puppetcache;
>     
>         # we cache content by catalog host
>         # we could also use $args to take into account request
>         # facts, but those change too often (ie uptime or memory)
>         # to be really usefull
>         proxy_cache_key          $uri;
> 
>         # define how long to cache response
>     
>         # normal catalogs will be cached 2 weeks
>         proxy_cache_valid        200 302 301 2w;
> 
>         # errors are not cached long
>         proxy_cache_valid        500 403 1m;
>     
>         # the rest is cached a little bit
>         proxy_cache_valid        any 30m;             
>     }
> 
>     # catch all location for other terminii
>     location / {

You already have a location '/' above.
Are you sure nginx is correctly using this configuration?
Try:
 nginx -t
It will check your configuration for errors.

>         proxy_pass               http://puppet_mongrel;
>         proxy_redirect           off;
>     }
>  }
> } 
>   server {
>     listen                       8141;
>     ssl_verify_client            off;
>     root                         /var/empty;
>     access_log                   /var/log/nginx/access.log noip;
> 
>     location / {
>       proxy_pass                 http://puppet_mongrel;
>       proxy_redirect             off;
>       proxy_set_header           Host             $host;
>       proxy_set_header           X-Real-IP        $remote_addr;
>       proxy_set_header           X-Forwarded-For  $proxy_add_x_forwarded_for;
>       proxy_set_header           X-Client-Verify  FAILURE;
>       proxy_set_header           X-SSL-Subject    $ssl_client_s_dn;
>       proxy_set_header           X-SSL-Issuer     $ssl_client_i_dn;
>     }
>   }
> }

This server{} block wouldn't be needed if you used ssl_verify_client
optional as explained above.

> 
> >> 7. set --http_compression
> >>    
> >>    I'm not sure if this actually hurts the master or not (because it has
> >>    to now occupy the CPU compressing catalogs?)
> >
> > This is a client option, and you need the collaboration of nginx for it
> > to work. This will certainly add more burden on your master CPU, because
> > nginx now has to gzip everything you're sending.
> 
> Yeah, I have the gzip compression turned on in nginx, but I dont really
> need it and my master could use the break.

Actually your nginx is only compressing text/plain documents, so it
won't compress your catalogs.
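
If you ever did want the catalogs compressed, you would have to extend
gzip_types with whatever Content-Type your master sends for catalogs (I'd
expect text/pson or text/yaml, but check the response headers); otherwise
you can simply switch it off for this vhost:

 # in the puppetmaster server{} block
 gzip off;
 # or, to actually compress catalogs (content types to be verified):
 # gzip_types text/plain text/yaml text/pson;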

> >> 8. tried to follow the introspection technique[2] 
> >> 
> >>    this wasn't so easy to do, I had to operate really fast, because if I
> >>    was too slow the thread would exit, or it would get hung up on:
> >> 
> >> [Thread 0xb6194b70 (LWP 25770) exited]
> >> [New Thread 0xb6194b70 (LWP 25806)]
> >
> > When you attach gdb, how many threads are running?
> 
> I'm not sure, how can I determine that? I just had the existing 4
> mongrel processes.

Maybe you can first try to display the full C backtrace for all threads:
thread apply all bt

Then resume everything, and 2 to 5 seconds later take another snapshot with
the same command. Comparing the two traces might help us understand what the
process is doing.
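
A minimal gdb session for this would look like (replace 25770 with the pid
of one of your mongrel processes):

 gdb -p 25770
 (gdb) info threads          # this also answers the "how many threads" question
 (gdb) thread apply all bt   # first snapshot
 (gdb) continue
 # wait a few seconds, then hit Ctrl-C to get the prompt back
 (gdb) thread apply all bt   # second snapshot, compare with the first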

HTH,
-- 
Brice Figureau
Follow the latest Puppet Community evolutions on www.planetpuppet.org!

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to 
puppet-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/puppet-users?hl=en.

Reply via email to