On Tue, 2011-01-25 at 17:11 -0500, Micah Anderson wrote:
> Brice Figureau <brice-pup...@daysofwonder.com> writes:
>
> > On 15/12/10 19:27, Ashley Penney wrote:
> >> This issue is definitely a problem. I have a support ticket in with
> >> Puppet Labs about the same thing. My CPU remains at 100% almost
> >> constantly and it slows things down significantly. If you strace it
> >> you can see that very little appears to be going on. This is
> >> absolutely not normal behavior. Even when I had 1 client checking in
> >> I had all cores fully used.
> >
> > I do agree that it's not the correct behavior. I suggest you strace
> > it, or use any other ruby introspection technique, to find what part
> > of the master is taking the CPU.
>
> I'm having a similar problem with 2.6.3. At this point, I can't get
> reliable puppet runs, and I'm not sure what to do.
>
> What seems to happen is that things work fine at the beginning. Catalog
> compiles peg the CPU for the puppet process doing them and take
> anywhere from 20 to 75 seconds. Then things get drastically worse after
> 4 compiles (note: I have four mongrels too -- coincidence?): catalog
> compiles shoot up to 115, 165, 209, 268, 273, 341, 418, 546, 692, 774,
> 822, then 1149 seconds... then things are really hosed. Sometimes hosts
> will fail outright and complain about weird things, like:
>
> Jan 25 14:04:34 puppetmaster puppet-master[30294]: Host is missing
> hostname and/or domain: gull.example.com
> Jan 25 14:04:55 puppetmaster puppet-master[30294]: Failed to parse
> template site-apt/local.list: Could not find value for 'lsbdistcodename'
> at /etc/puppet/modules/site-apt/manifests/init.pp:4 on node
> gull.example.com
>
> All four of my mongrels are constantly pegged at 40-50% of the CPU
> each, occupying all available CPUs. They never settle down.
> I've got 74 nodes checking in now. That doesn't seem like many, but
> perhaps I've reached a tipping point with my puppetmaster (it's a dual
> 1GHz machine with 2GiB of RAM)?
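A back-of-the-envelope check of the numbers in this thread (74 nodes, worst-case 75s compiles, 2 CPUs, the default 30-minute run interval) already makes the tipping point visible. A rough model in Ruby; the numbers are taken from this thread, and it deliberately ignores file serving, splay, and swapping:

```ruby
# Back-of-envelope capacity check for a puppetmaster, using the figures
# quoted in this thread. All numbers are illustrative.
nodes           = 74        # agents checking in per round
compile_seconds = 75        # worst-case catalog compile time
concurrency     = 2         # effective parallelism (2 CPUs, green threads)
run_interval    = 30 * 60   # default agent sleep time (seconds)

total_work = nodes * compile_seconds        # 5550 s of compile work
wall_time  = total_work / concurrency.to_f  # ~2775 s per round

if wall_time > run_interval
  puts "overloaded: #{wall_time.round}s of compiles per #{run_interval}s round"
else
  puts "ok: #{wall_time.round}s of compiles fits in a #{run_interval}s round"
end
# => overloaded: 2775s of compiles per 1800s round
```

Any result above the run interval means the work queue only ever grows, which matches the steadily climbing compile times reported above.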
The puppetmaster is mostly CPU bound. Since you have only 2 CPUs, you
shouldn't try to achieve a concurrency of 4 (which your mongrels are
trying to do); otherwise more than one request will be accepted by a
given mongrel process, and the threads will contend for the CPU. The bad
news is that the Ruby MRI uses green threads, so the second thread will
only run when the first one sleeps, does I/O, or relinquishes the CPU
voluntarily. In other words, it will only run once the first thread has
finished its compilation.

Now, you have 74 nodes with a worst-case compilation time of 75s (which
is a lot). That translates to 74 * 75 = 5550s of compilation time. With
a concurrency of 2, that's still 2775s of compilation time per round of
<insert here your default sleep time>. With the default 30min sleep time
and assuming perfect scheduling, that is longer than the round itself,
which means you will never have finished compiling catalogs by the time
the first node asks for a new one. And that's compilation alone: if your
manifests use file sourcing, you must add that to the equation too.

Another possible explanation is swapping. You mention your server has
2GiB of RAM. Are you sure your 4 mongrel processes still fit in physical
RAM after running for a while (along with everything else on the
server)? Maybe your server is constantly swapping.

There are several things you can do to get better performance:

* reduce the number of nodes that check in at any given time (i.e.
  increase the sleep time)
* reduce the time it takes to compile a catalog:
  + don't use storeconfigs (or use puppetqd or thin_storeconfigs
    instead)
  + check that the server is not swapping
  + reduce the number of mongrel instances, to artificially reduce the
    concurrency (counter-intuitive, I know)
  + use a "better" ruby interpreter such as Ruby Enterprise Edition
    (among other things, it has a better GC and a smaller memory
    footprint)
  + cache compiled catalogs in nginx
  + offload file content serving to nginx
  + use passenger instead of mongrel

Note: you can use puppet-load (in the 2.6 source distribution) to
simulate concurrent nodes asking for catalogs. This is really helpful
for sizing a puppetmaster and measuring the real concurrency a given
stack/hardware can sustain.

> I've tried a large number of different things to attempt to work
> around this:
>
> 0. reduced my node check-in times to once an hour (and splayed
>    randomly)
>
> 1. turned on puppetqd/stomp queuing
>
>    This didn't seem to make a difference; it's off now.
>
> 2. turned on thin stored configs
>
>    This sort of helped a little, but not enough.
>
> 3. tried to upgrade rails from 2.3.5 (the debian version) to 2.3.10
>
>    I didn't see any appreciable difference here. I ended up going back
>    to 2.3.5 because that was the packaged version.

Since you seem to use Debian, make sure you use either the latest ruby
lenny backports (or REE), as they fixed an issue with pthreads and CPU
consumption:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=579229

> 4. tried to offload file content via nginx[1]
>
>    This maybe helped a little, but it's clear that the problem isn't
>    the fileserving; it seems to be something in the catalog
>    compilation.

Actually, offloading only helps when the puppet agent needs the file
content, which happens only when a file changes or when the node doesn't
have it yet. In practice this helps only for new nodes.

> 5. tried to cache catalogs by adding an http front-end cache and
>    expiring that cache when manifests are updated[1]
>
>    I'm not sure this works at all.

This should have helped, because it prevents the puppetmaster from even
being called. You might want to check your nginx configuration.

> 6. set 'fair' queuing in my nginx.conf[3]
>
>    This seemed to help for a few days, but then things got bad again.
>
> 7.
>    set --http_compression
>
>    I'm not sure if this actually hurts the master or not (because it
>    now has to occupy the CPU compressing catalogs?)

This is a client option, and it needs the collaboration of nginx to
work. It will certainly add more burden on your master's CPU, because
nginx now has to gzip everything you're sending.

> 8. tried to follow the introspection technique[2]
>
>    This wasn't so easy to do; I had to operate really fast, because if
>    I was too slow the thread would exit, or it would get hung up on:
>
>    [Thread 0xb6194b70 (LWP 25770) exited]
>    [New Thread 0xb6194b70 (LWP 25806)]

When you attach gdb, how many threads are running?

> Eventually I did manage to get somewhere:
>
> 0xb74f1b16 in memcpy () from /lib/i686/cmov/libc.so.6
> (gdb) session-ruby
> (gdb) redirect_stdout
> $1 = 2
> (gdb)
> $2 = 2
> (gdb) eval "caller"
> $3 = 3
> (gdb) rb_object_counts
> Cannot get thread event message: debugger service failed
> An error occurred while in a function called from GDB.
> Evaluation of the expression containing the function
> (rb_eval_string_protect) will be abandoned.
> When the function is done executing, GDB will silently stop.
> (gdb) eval "total = ObjectSpace.each_object(Array){|x| puts '---'; puts x.inspect}; puts \"---\nTotal Arrays: #{total}\""
> Invalid character '\' in expression.
>
> ... then nothing.
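As an aside, the shell-escaping fight in that gdb session can be avoided by keeping the introspection code in a plain Ruby snippet rather than a gdb one-liner. A minimal sketch (the helper name and the choice of SIGUSR2 are my own, not part of the referenced technique[2]): it counts live objects per class, which is what the failed `eval` was after, and traps SIGUSR2 to dump the interrupted backtrace, which gives a crude sampling profile if you send the signal a few times while the master is pegged.

```ruby
# debug_introspect.rb -- a sketch of in-process introspection helpers.

# Count live objects per class and print the `top` largest groups.
def dump_object_counts(io = $stderr, top = 20)
  counts = Hash.new(0)
  ObjectSpace.each_object { |o| counts[o.class.to_s] += 1 }
  counts.sort_by { |_, n| -n }.first(top).each do |klass, n|
    io.puts format('%8d %s', n, klass)
  end
end

# Dump the interrupted call stack whenever the process receives SIGUSR2.
# Repeatedly `kill -USR2 <pid>` while the CPU is pegged and look for the
# frames that keep recurring.
Signal.trap('USR2') do
  $stderr.puts "--- backtrace at #{Time.now} ---"
  $stderr.puts caller.join("\n")
end
```

This sidesteps gdb's expression parser entirely, at the cost of having to get the snippet loaded into the master process in the first place.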
>
> In the tail:
>
> root@puppetmaster:/tmp# tail -f ruby-debug.28724
> 207 Puppet::Util::LoadedFile
> ["/usr/lib/ruby/1.8/active_record/base.rb:2746:in `attributes='",
> "/usr/lib/ruby/1.8/active_record/base.rb:2742:in `each'",
> "/usr/lib/ruby/1.8/active_record/base.rb:2742:in `attributes='",
> "/usr/lib/ruby/1.8/active_record/base.rb:2438:in `initialize'",
> "/usr/lib/ruby/1.8/active_record/reflection.rb:162:in `new'",
> "/usr/lib/ruby/1.8/active_record/reflection.rb:162:in `build_association'",
> "/usr/lib/ruby/1.8/active_record/associations/association_collection.rb:423:in `build_record'",
> "/usr/lib/ruby/1.8/active_record/associations/association_collection.rb:102:in `build'",
> "/usr/lib/ruby/1.8/puppet/rails/host.rb:145:in `merge_facts'",
> "/usr/lib/ruby/1.8/puppet/rails/host.rb:144:in `each'",
> "/usr/lib/ruby/1.8/puppet/rails/host.rb:144:in `merge_facts'",
> "/usr/lib/ruby/1.8/puppet/rails/host.rb:140:in `each'",
> "/usr/lib/ruby/1.8/puppet/rails/host.rb:140:in `merge_facts'",
> "/usr/lib/ruby/1.8/puppet/indirector/facts/active_record.rb:32:in `save'",
> "/usr/lib/ruby/1.8/puppet/indirector/indirection.rb:256:in `save'",
> "/usr/lib/ruby/1.8/puppet/node/facts.rb:15:in `save'",
> "/usr/lib/ruby/1.8/puppet/indirector.rb:64:in `save'",
> "/usr/lib/ruby/1.8/puppet/indirector/catalog/compiler.rb:25:in `extract_facts_from_request'",
> "/usr/lib/ruby/1.8/puppet/indirector/catalog/compiler.rb:30:in `find'",
> "/usr/lib/ruby/1.8/puppet/indirector/indirection.rb:193:in `find'",
> "/usr/lib/ruby/1.8/puppet/indirector.rb:50:in `find'",
> "/usr/lib/ruby/1.8/puppet/network/http/handler.rb:101:in `do_find'",
> "/usr/lib/ruby/1.8/puppet/network/http/handler.rb:68:in `send'",
> "/usr/lib/ruby/1.8/puppet/network/http/handler.rb:68:in `process'",
> "/usr/lib/ruby/1.8/mongrel.rb:159:in `process_client'",
> "/usr/lib/ruby/1.8/mongrel.rb:158:in `each'",
> "/usr/lib/ruby/1.8/mongrel.rb:158:in `process_client'",
> "/usr/lib/ruby/1.8/mongrel.rb:285:in `run'",
> "/usr/lib/ruby/1.8/mongrel.rb:285:in `initialize'",
> "/usr/lib/ruby/1.8/mongrel.rb:285:in `new'",
> "/usr/lib/ruby/1.8/mongrel.rb:285:in `run'",
> "/usr/lib/ruby/1.8/mongrel.rb:268:in `initialize'",
> "/usr/lib/ruby/1.8/mongrel.rb:268:in `new'",
> "/usr/lib/ruby/1.8/mongrel.rb:268:in `run'",
> "/usr/lib/ruby/1.8/puppet/network/http/mongrel.rb:22:in `listen'",
> "/usr/lib/ruby/1.8/puppet/network/server.rb:127:in `listen'",
> "/usr/lib/ruby/1.8/puppet/network/server.rb:142:in `start'",
> "/usr/lib/ruby/1.8/puppet/daemon.rb:124:in `start'",
> "/usr/lib/ruby/1.8/puppet/application/master.rb:114:in `main'",
> "/usr/lib/ruby/1.8/puppet/application/master.rb:46:in `run_command'",
> "/usr/lib/ruby/1.8/puppet/application.rb:287:in `run'",
> "/usr/lib/ruby/1.8/puppet/application.rb:393:in `exit_on_fail'",
> "/usr/lib/ruby/1.8/puppet/application.rb:287:in `run'",
> "/usr/lib/ruby/1.8/puppet/util/command_line.rb:55:in `execute'",
> "/usr/bin/puppet:4"]
>
> 190 Puppet::Parser::AST::CaseStatement
> 181 ZAML::Label
> 170 Puppet::Parser::AST::Default
> 152 ActiveRecord::DynamicFinderMatch
> 152 ActiveRecord::DynamicScopeMatch
> 150 ActiveSupport::OrderedHash
> 148 OptionParser::Switch::RequiredArgument
> 138 YAML::Syck::Node
> 125 Range
> 124 Puppet::Parser::AST::IfStatement
> 117 ActiveRecord::Errors
> 115 Puppet::Provider::Confine::Exists
> 109 Puppet::Parser::AST::Selector
> 108 UnboundMethod
> 107 File::Stat
> 99 Puppet::Parameter::Value
> 90 Bignum
> 86 OptionParser::Switch::NoArgument
> 85 Puppet::Util::Settings::Setting
> 80 Puppet::Indirector::Request
> 75 Puppet::Parser::AST::ComparisonOperator
> 74 Puppet::Parser::Lexer::Token
> 73 Puppet::Parser::AST::ResourceOverride
> 70 ActiveRecord::ConnectionAdapters::MysqlColumn
> 66 Sync
> 65 StringIO
> 64 Binding
> 62 ActiveSupport::Callbacks::Callback
> 61 Puppet::Util::Settings::FileSetting
> 58 Puppet::Provider::ConfineCollection
> 56 Mysql::Result
> 52 Puppet::Module
> 47 Puppet::Network::AuthStore::Declaration
> 46 IPAddr
> 39 Puppet::Util::Settings::BooleanSetting
> 38 Thread
> 36 Puppet::Util::Autoload
> 35 Mysql
> 35 ActiveRecord::ConnectionAdapters::MysqlAdapter
> 34 Puppet::Parser::AST::Not
> 28 Puppet::Type::MetaParamLoglevel
> 28 Puppet::Type::File
> 28 Puppet::Type::File::ParameterPurge
> 28 Puppet::Type::File::ParameterLinks
> 28 Puppet::Type::File::Ensure
> 28 Puppet::Type::File::ParameterBackup
> 28 Puppet::Type::File::ParameterReplace
> 28 Puppet::Type::File::ParameterProvider
> 28 Puppet::Type::File::ParameterPath
> 28 Puppet::Type::File::ProviderPosix
> 28 Puppet::Type::File::ParameterChecksum

This only shows which objects are alive at a given moment. What would be
more interesting is where the CPU time is spent (i.e. getting a stack
trace would be helpful, though that is not easy).

> but then it seemed to stop logging entirely...
>
> I'm available on IRC to try more advanced debugging, just ping me
> (hacim). I'd really like things to function again!

I'll ping you, but I'm really busy for the next couple of days :(
--
Brice Figureau
My Blog: http://www.masterzen.fr/

--
You received this message because you are subscribed to the Google
Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to
puppet-users+unsubscr...@googlegroups.com.
For more options, visit this group at
http://groups.google.com/group/puppet-users?hl=en.