Brice Figureau <brice-pup...@daysofwonder.com> writes:

> On Tue, 2011-01-25 at 17:11 -0500, Micah Anderson wrote:
>> Brice Figureau <brice-pup...@daysofwonder.com> writes:
>>
>> All four of my mongrels are constantly pegged, doing 40-50% of the CPU
>> each, occupying all available CPUs. They never settle down. I've got 74
>> nodes checking in now; it doesn't seem like that many, but perhaps
>> I've reached a tipping point with my puppetmaster (it's a dual 1GHz
>> machine with 2GB of RAM)?
>
> The puppetmaster is mostly CPU bound. Since you have only 2 CPUs, you
> shouldn't try to achieve a concurrency of 4 (which your mongrels are
> trying to do), otherwise more than one request will be accepted at a
> time and each thread will contend for the CPU. The bad news is that
> ruby MRI uses green threads, so the second thread will only run when
> the first one sleeps, does I/O, or voluntarily relinquishes the CPU. In
> other words, it will only run when the first thread has finished its
> compilation.

Ok, that is a good thing to know. I wasn't aware that ruby couldn't
actually run threads in parallel.
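For the record, this is easy to see with a quick snippet (my own
illustration, not from the puppet code): under MRI, two CPU-bound
threads take roughly as long as doing the same work back to back,
because neither thread ever blocks on I/O and yields the CPU:

```ruby
# Illustration: MRI green threads don't parallelize CPU-bound work.
require 'benchmark'

# Pure computation: no I/O, so a running thread never usefully yields.
def burn
  n = 0
  500_000.times { |i| n += i }
  n
end

# Run the work twice sequentially...
serial = Benchmark.realtime { 2.times { burn } }

# ...then the same work in two "concurrent" threads.
threaded = Benchmark.realtime do
  Array.new(2) { Thread.new { burn } }.each(&:join)
end

puts format('serial:   %.2fs', serial)
puts format('threaded: %.2fs', threaded)
```

Under MRI the two timings come out about the same, which is exactly the
contention Brice describes.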

> Now you have 74 nodes, with a worst-case compilation time of 75s (which
> is a lot); that translates to 74*75 = 5550s of compilation time.
> With a concurrency of 2, that's still 2775s of compilation time per
> round of <insert here your default sleep time>. With the default 30min
> of sleep time, and assuming perfect scheduling, that's larger than a
> round of sleep time, which means you will never finish compiling
> catalogs before the first node asks for its catalog again.

I'm using 60 minutes of sleep time, which is 3600 seconds per round; a
concurrency of 2 giving me 2775s of compile time per round does keep me
under the 3600 seconds... assuming scheduling is perfect, which it very
likely is not.
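Spelled out, as a back-of-the-envelope check using the numbers from this
thread:

```ruby
# Capacity estimate: total compile seconds per check-in round vs. round length.
nodes        = 74
compile_time = 75.0    # worst-case seconds per catalog, from above
concurrency  = 2
interval     = 3600.0  # one check-in per node per hour

load_per_round = nodes * compile_time / concurrency   # => 2775.0
utilization    = 100 * load_per_round / interval
puts format('%.0fs of compilation per %.0fs round (~%.0f%% busy)',
            load_per_round, interval, utilization)
```

So even in the best case the masters are ~77% busy doing nothing but
compiles, and any scheduling clumping pushes that over the edge.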

> And I'm talking only about compilation. If your manifests use file
> sourcing, you must also add this to the equation.

As explained, I set up your nginx method for offloading file sourcing.

> Another explanation of the issue is swapping. You mention your server
> has 2GiB of RAM. Are you sure your 4 mongrel processes, after some
> time, still fit in physical RAM (along with the other things running
> on the server)?
> Maybe your server is constantly swapping.

I'm actually doing fine on memory, not dipping into swap. I've watched
i/o to see if I could identify either a swap or disk problem, but didn't
notice much happening there. The mongrel processes' CPU usage is where
practically all the time is going.

I've been wondering if I have some loop in a manifest or something that
is causing them to just spin.

> So you can do several things to get better performance:
> * reduce the number of nodes that check in at a single time (ie increase
> sleep time)

I've already reduced to once per hour, but I could consider reducing it
more. 

> * reduce the time it takes to compile a catalog: 
>   + which includes not using storeconfigs (or using puppetqd or
> thin_storeconfigs instead). 

I need to use storeconfigs, and as detailed in my original message, I've
tried puppetqd and it didn't do much for me. thin_storeconfigs did help,
and I'm still using it, so this one has already been done too.
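For anyone following along, the relevant puppet.conf bits look roughly
like this (a sketch; the db settings below are placeholders for your own
database, not my actual values):

```
[master]
storeconfigs      = true
thin_storeconfigs = true
dbadapter         = mysql
dbserver          = localhost
dbuser            = puppet
dbpassword        = secret
```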

>   + Check the server is not swapping. 

Not swapping.

>   + Reduce the number of mongrel instances, to artificially reduce the
> concurrency (this is counter-intuitive, I know)

Ok, I'm backing off to two mongrels to see how well that works.

>   + use a "better" ruby interpreter like Ruby Enterprise Edition (for
> several reasons this one has a better GC and a better memory footprint).

I'm pretty sure my problem isn't memory, so I'm not sure if these will
help much.

>   + Cache compiled catalogs in nginx

Doing this.

>   + offload file content serving in nginx

Doing this.

>   + Use passenger instead of mongrel

I tried to switch to passenger, and things were much worse. Actually,
passenger worked fine with 0.25, but when I upgraded I couldn't get it
to function anymore. I actually had to go back to nginx to get things
functioning again.

>> 3. tried to upgrade rails from 2.3.5 (the debian version) to 2.3.10
>> 
>>    I didn't see any appreciable difference here. I ended up going back to
>> 2.3.5 because that was the packaged version.
>
> Since you seem to use Debian, make sure you use either the latest ruby
> lenny backports (or REE) as they fixed an issue with pthreads and CPU
> consumption:
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=579229

I'm using Debian Squeeze, which has the same version you are mentioning
from lenny backports (2.3.5).

>> 5. tried to cache catalogs through adding a http front-end cache and
>> expiring that cache when manifests are updated[1] 
>> 
>>    I'm not sure this works at all.
>
> This should have helped because it would prevent the puppetmaster from
> even being called. You might check your nginx configuration then.

Hmm. According to jamesturnbull, the REST terminus shouldn't allow you
to request any node's catalog, so I'm not sure how this can work at
all... but in case I've got something screwed up in my nginx.conf, I'd
really be happy if you could have a look at it; it's possible that I
misunderstood something from your blog post! Here it is:

user                              www-data;
worker_processes                  2;

error_log                         /var/log/nginx/error.log;
pid                               /var/run/nginx.pid;

events {
  # In a reverse proxy situation, max_clients becomes
  # max_clients = worker_processes * worker_connections/4
  worker_connections              2048;
}

http {
  default_type                    application/octet-stream;

  sendfile                        on;
  tcp_nopush                      on;
  tcp_nodelay                     on;

  large_client_header_buffers     1024      2048k;
  client_max_body_size            150m;
  proxy_buffers                   128     4k;
  
  keepalive_timeout               65;
  
  gzip                            on;
  gzip_min_length                 1000;
  gzip_types                      text/plain;

  ssl                             on;
  ssl_certificate                 /var/lib/puppet/ssl/certs/puppetmaster.pem;
  ssl_certificate_key             /var/lib/puppet/ssl/private_keys/puppetmaster.pem;
  ssl_client_certificate          /var/lib/puppet/ssl/ca/ca_crt.pem;
  ssl_ciphers                     SSLv2:-LOW:-EXPORT:RC4+RSA;
  ssl_session_cache               shared:SSL:8m;
  ssl_session_timeout             5m;
  
  proxy_read_timeout              600;
  upstream puppet_mongrel {
    fair;
    server                        127.0.0.1:18140;
    server                        127.0.0.1:18141;
    server                        127.0.0.1:18142;
    server                        127.0.0.1:18143;
  }
  log_format  noip  '0.0.0.0 - $remote_user [$time_local] '
                      '"$request" $status $body_bytes_sent '
                      '"$http_referer" "$http_user_agent"';

  proxy_cache_path  /var/cache/nginx/cache  levels=1:2  keys_zone=puppetcache:10m;

  server {
    listen                        8140;
    access_log                    /var/log/nginx/access.log noip;
    ssl_verify_client             required;                           

    root                          /etc/puppet;

    # make sure we serve everything
    # as raw
    types { }
    default_type                 application/x-raw;

    # serve static file for the [files] mountpoint
    location /production/file_content/files/ {
        allow                    172.16.0.0/16;
        allow                    10.0.1.0/8;
        allow                    127.0.0.1/8;
        deny                     all;

        alias                    /etc/puppet/files/;
    }

    # serve modules files sections
    location ~ /production/file_content/[^/]+/files/ {
        # it is advisable to have some access rules here
        allow                    172.16.0.0/16;
        allow                    10.0.1.0/8;
        allow                    127.0.0.1/8;
        deny                     all;

        root                     /etc/puppet/modules;

        # rewrite /production/file_content/module/files/file.txt
        # to /module/file.txt
        rewrite                  ^/production/file_content/([^/]+)/files/(.+)$  $1/$2 break;
    }

    # Variables
    # $ssl_cipher          the cipher used for the established SSL connection
    # $ssl_client_serial   the serial number of the client certificate
    # $ssl_client_s_dn     the subject DN of the client certificate
    # $ssl_client_i_dn     the issuer DN of the client certificate
    # $ssl_protocol        the protocol of the established SSL connection

    # catch-all location for the other termini
    location / {
      proxy_pass                 http://puppet_mongrel;
      proxy_redirect             off;
      proxy_set_header           Host             $host;
      proxy_set_header           X-Real-IP        $remote_addr;
      proxy_set_header           X-Forwarded-For  $proxy_add_x_forwarded_for;
      proxy_set_header           X-Client-Verify  SUCCESS;
      proxy_set_header           X-SSL-Subject    $ssl_client_s_dn;
      proxy_set_header           X-SSL-Issuer     $ssl_client_i_dn;
      proxy_buffer_size          16k;
      proxy_buffers              8 32k;
      proxy_busy_buffers_size    64k;
      proxy_temp_file_write_size 64k;
      proxy_read_timeout         540;
    }

    # we handle catalogs differently
    # because we want to cache them
    location /production/catalog {
        proxy_pass               http://puppet_mongrel;
        proxy_redirect           off;

        # it is a good thing to actually restrict who
        # can ask for a catalog (especially for cached
        # catalogs)
        allow                    172.16.0.0/16;
        allow                    10.0.1.0/8;
        allow                    127.0.0.1/8;
        deny                     all;

        # where to cache contents
        proxy_cache              puppetcache;

        # we cache content by catalog host
        # we could also use $args to take into account request
        # facts, but those change too often (ie uptime or memory)
        # to be really useful
        proxy_cache_key          $uri;

        # define how long to cache responses

        # normal catalogs will be cached 2 weeks
        proxy_cache_valid        200 302 301 2w;

        # errors are not cached long
        proxy_cache_valid        500 403 1m;

        # the rest is cached a little bit
        proxy_cache_valid        any 30m;
    }
  }

  server {
    listen                       8141;
    ssl_verify_client            off;
    root                         /var/empty;
    access_log                   /var/log/nginx/access.log noip;

    location / {
      proxy_pass                 http://puppet_mongrel;
      proxy_redirect             off;
      proxy_set_header           Host             $host;
      proxy_set_header           X-Real-IP        $remote_addr;
      proxy_set_header           X-Forwarded-For  $proxy_add_x_forwarded_for;
      proxy_set_header           X-Client-Verify  FAILURE;
      proxy_set_header           X-SSL-Subject    $ssl_client_s_dn;
      proxy_set_header           X-SSL-Issuer     $ssl_client_i_dn;
    }
  }
}
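One way to see whether the catalog cache is actually being hit (this is
an addition, not in the config above, and it needs an nginx recent
enough to have $upstream_cache_status, 0.8.3+): expose the cache status
as a response header inside the catalog location, then check for
HIT/MISS on responses:

```
location /production/catalog {
    # ... existing proxy_cache directives ...
    add_header  X-Cache-Status  $upstream_cache_status;
}
```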


>> 7. set --http_compression
>>    
>>    I'm not sure if this actually hurts the master or not (because it has
>>    to now occupy the CPU compressing catalogs?)
>
> This is a client option, and you need the collaboration of nginx for it
> to work. This will certainly add more burden on your master CPU, because
> nginx now has to gzip everything you're sending.

Yeah, I have gzip compression turned on in nginx, but I don't really
need it, and my master could use the break.

>> 8. tried to follow the introspection technique[2] 
>> 
>>    this wasn't so easy to do, I had to operate really fast, because if I
>>    was too slow the thread would exit, or it would get hung up on:
>> 
>> [Thread 0xb6194b70 (LWP 25770) exited]
>> [New Thread 0xb6194b70 (LWP 25806)]
>
> When you attach gdb, how many threads are running?

I'm not sure; how can I determine that? I just had the existing 4
mongrel processes.
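(Presumably something like gdb's thread listing would show this once
attached, though I haven't verified it against these mongrel processes:

```
(gdb) info threads
```
)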


>> (gdb) eval "total = \[\[ObjectSpace\]\].each_object(Array)\{\|x\| puts 
>> '---'; puts x.inspect \}; puts \\"---\\nTotal Arrays: \#{total}\\""
>> Invalid character '\' in expression.

The above seems to be a problem with the expression on the wiki page;
does anyone know what it should be so gdb doesn't choke on it?

>> I'm available on IRC to try more advanced debugging, just ping me
>> (hacim). I'd really like things to function again!
>
> I'll ping you, but I'm just really busy for the next couple of
> days :(

Thanks for any help or ideas, I'm out of ideas myself so anything helps!

micah

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to 
puppet-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/puppet-users?hl=en.
