> > > We're at about 100 hosts, but closer to 1500 services - maybe
> > > we have exceeded what storeconfigs can do then.
>
> hmm.. so yeah, you've hit the same kind of very bad scaling from the
> native nagios config resources that I've experienced. Seeing how bad it
> becomes with that number of services is now convincing me that I want to
> change method.
I'm just not convinced yet that the issue is on the stored-config side. The config retrieval/catalog build takes 2-3 minutes (long, I agree), but that's nothing compared to the other 17 minutes the client spends busy, with no output in debug mode and 100% usage on the one CPU core puppet is using. What is it doing then?

> > If that is the case, is there a recommended alternative that isn't
> > manually maintaining config files?
>
> One alternative would be to use file templates, combined with
> concatenated_file resources (from David Schmidt's 'puppet-common'
> module). That way, for every host and service definition (and other
> nagios config items), you can export a file whose contents will be
> verified by md5 sum. Every file that you export to the nagios server
> should notify a concatenation exec that binds everything together.
>
> The good thing with this method is that you can manage the module
> directory (where the different config file excerpts are stored) with
> 'purge => true', so that only exported resources are present in the
> final nagios configuration (something that native types don't handle
> very well -- or actually handle very badly).

We actually wrote our own nagios module rather than using the built-in ones. We export one cfg file per host for the host config, plus one subdirectory per host containing all of that host's checks (one file per check), and we can use purge on that directory to clean up removed hosts/services. I know puppet runs an md5 sum on each of the files, but since the md5sum binary can checksum all of /etc (invoked once per file via find -exec) in less than 3 seconds, it seems like something else is going on. I assume puppet uses a Ruby md5 method; I haven't tested it, but I can't believe the difference over a binary invocation is that significant.

> > It seems like most of the processing time is spent client side
> > and I haven't been able to figure out why.
> > Even doing an md5sum on all of the files from the CLI takes less
> > than 2 seconds.
>
> I haven't traced the thing, but from what I could understand, most of
> the time is spent resolving relationships between exported nagios
> resources and ensuring that all the exported resources are unique. To
> verify this, you could set up postgres to log SQL requests and check
> what gets requested during one run.

This is going to be another issue as this scales, but right now, if I could get puppet runs under 5 minutes on the nagios server, I would be happy enough to move on and come back to it later. I should probably do this anyway, though, so I understand how the queries are structured and can see whether there is any way to add more dependency information to the data we feed the DB to make the queries more efficient.

I appreciate the thoughts and feedback. Let me know if there is any other way to get more debug info that might help figure out what is going on, or if there is a doNothing() method somewhere deep in the puppet magic.

-- 
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
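For anyone following along, the per-host exported-file-plus-purge approach discussed above can be sketched roughly like this. This is only an illustrative sketch, not code from either of our modules; the paths, tag name, and template name are all made up:

```puppet
# On each monitored host: export one cfg file per host.
# (Path, tag, and template are hypothetical examples.)
@@file { "/etc/nagios/conf.d/hosts/${::fqdn}.cfg":
  content => template('nagios/host.cfg.erb'),
  tag     => 'nagios_host',
}

# On the nagios server: collect everything exported with that tag...
File <<| tag == 'nagios_host' |>>

# ...and purge anything in the directory that was not exported, so
# decommissioned hosts disappear from the nagios config automatically.
file { '/etc/nagios/conf.d/hosts':
  ensure  => directory,
  recurse => true,
  purge   => true,
  notify  => Service['nagios'],
}
```

The key point is the purge/recurse combination on the collecting directory, which is what the native nagios types handle badly.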
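And for the suggestion of logging the SQL: these are the standard PostgreSQL settings for temporarily capturing every statement, which should show what gets requested during one puppet run. Remember to revert them afterwards, since full statement logging is very noisy:

```
# postgresql.conf -- temporary settings to capture all queries
log_statement = 'all'              # log every SQL statement
log_min_duration_statement = 0     # also log each statement's duration
# reload with: SELECT pg_reload_conf();  (or pg_ctl reload)
```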
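On the Ruby-vs-binary md5 question: the suspicion that Ruby checksumming is not the bottleneck is easy to sanity-check. Here is a small, purely illustrative benchmark (the directory default is a placeholder; point it at your nagios config tree) that times Ruby's stdlib Digest::MD5 over a directory, which is comparable to what a `find -exec md5sum` run does:

```ruby
# Rough benchmark sketch: time Ruby's Digest::MD5 over a directory tree.
require 'digest'
require 'find'
require 'benchmark'

dir = ARGV[0] || '.'  # placeholder; use e.g. /etc/nagios in practice

count = 0
elapsed = Benchmark.realtime do
  Find.find(dir) do |path|
    # Skip directories we cannot descend into.
    Find.prune if File.directory?(path) && !File.readable?(path)
    next unless File.file?(path) && File.readable?(path)
    begin
      Digest::MD5.file(path).hexdigest
      count += 1
    rescue SystemCallError
      next # file vanished or became unreadable mid-run; skip it
    end
  end
end

puts format('md5 of %d files under %s: %.2fs', count, dir, elapsed)
```

If this comes back in a second or two for ~1500 small files, which is what I would expect, it would support the idea that the client's 17 busy minutes are being spent somewhere other than checksumming.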