>
> > We're at about the 100 hosts, but have closer to 1500 services - maybe
> > we have exceeded what storeconfigs can do then.
>
> hmm.. so yeah, you've hit the same kind of very bad scaling with the
> nagios native config resources that I've experienced. Seeing how bad it
> becomes with that number of services convinces me that I want to
> change methods.


I'm just not convinced yet that the issue is on the stored config side.  The
config retrieval/catalog build takes 2-3 minutes (long, I agree), but that's
nothing compared to the other 17 minutes the client spends busy with no
output even in debug mode, with puppet pegging one CPU core at 100%.  What
is it doing during that time?


> >  If that is the case, is
> > there a recommended alternative that isn't manually maintaining config
> > files?
>
> One alternative would be to use file templates, combined with
> concatenated_file resources (from David Schmidt's 'puppet-common' module).
> That way, for every host and service definition (and other nagios config
> items), you can export a file and its contents will be verified by md5
> sum. Every file that you export to the nagios server should notify a
> concatenation exec that binds everything together.
>
> The good thing with this method is that you can manage the module
> directory (where the different config file excerpts are stored) with
> 'purge => true' so that only exported resources are present in the final
> nagios configuration (something that native types don't handle very well
> -- or actually handle very badly).
>
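For reference, here is a rough sketch of the export/collect/purge pattern
being suggested (all names and paths are hypothetical; the concatenation
step is reduced to a validate-and-reload exec for brevity):

```puppet
# On every monitored host: export a config fragment for this host.
# Puppet checksums the content, so changes propagate automatically.
@@file { "nagios-host-${fqdn}.cfg":
  path    => "/etc/nagios/conf.d/hosts/${fqdn}.cfg",
  content => template('nagios/host.cfg.erb'),
  tag     => 'nagios',
  notify  => Exec['rebuild-nagios-config'],
}

# On the nagios server: collect all exported fragments ...
File <<| tag == 'nagios' |>>

# ... and purge the fragment directory so that only currently
# exported resources survive (the part native types handle badly).
file { '/etc/nagios/conf.d/hosts':
  ensure  => directory,
  purge   => true,
  recurse => true,
}

exec { 'rebuild-nagios-config':
  command     => '/usr/sbin/nagios -v /etc/nagios/nagios.cfg && /etc/init.d/nagios reload',
  refreshonly => true,
}
```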

We actually wrote our own nagios module rather than using the built-in
types.  We export one cfg file per host for the host config, plus one
subdirectory per host containing all of that host's checks (one file per
check), and we can use purge on it as well to clean up removed hosts and
services.  I know puppet does an md5sum on each of the files, but since the
md5sum binary can hash all of /etc (with the binary invoked once per file
using find -exec) in less than 3 seconds, it seems like something else is
going on.  I assume puppet is using a ruby md5 method instead; I haven't
tested it, but I can't believe the difference over a binary invocation is
that significant.
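That assumption is easy to sanity-check with a few lines of standalone Ruby
using Digest::MD5, which is presumably what puppet's checksumming goes
through (the directory path is just an example):

```ruby
require 'digest/md5'
require 'benchmark'

# Hash every readable file under a directory with Ruby's Digest::MD5
# and report how long it took, analogous to puppet checksumming its
# managed files on each run.
def md5_tree(dir)
  files = Dir.glob(File.join(dir, '**', '*')).select do |f|
    File.file?(f) && File.readable?(f)
  end
  sums = nil
  elapsed = Benchmark.realtime do
    sums = files.map { |f| Digest::MD5.file(f).hexdigest }
  end
  [sums.length, elapsed]
end

count, secs = md5_tree('/etc')
puts "hashed #{count} files in #{'%.2f' % secs}s"
```

If that finishes in a couple of seconds as well, the Ruby md5 code is off
the hook and the time is going somewhere else in the run.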


>
> > It seems like most of the processing time is spent client side
> > and I haven't been able to figure out why.  Even doing an md5sum on all
> > of the files from the CLI takes less than 2 seconds.
>
> I haven't traced the thing, but from what I could understand, most of
> the time is spent resolving relationships between exported nagios
> resources and ensuring that all the exported resources are unique. To
> verify this, you could set up postgres to log SQL requests and check out
> what gets requested during one run.


This is going to be another issue as this scales, but right now, if I could
get puppet runs under 5 minutes on the nagios server, I would be happy
enough to move on and come back to it later.  I should probably do this
anyway, though, so I understand how the queries are structured and can see
whether adding more dependency information to the data we feed the DB would
make the queries more efficient.
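For the record, capturing the queries as suggested should only take a
couple of postgresql.conf settings (values here are illustrative; they do
add logging overhead, so best done on a test box or reverted afterwards):

```conf
# postgresql.conf -- log every statement with its duration, then
# watch the server log during a single puppet run on the nagios host.
log_statement = 'all'            # log all SQL statements
log_min_duration_statement = 0   # also log each statement's duration (ms)
log_line_prefix = '%t [%p] '     # timestamp and pid, for correlating
```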

I appreciate the thoughts and feedback.  Let me know if there is any other
way to get more debug info that might help figure out what is going on, or
if there is a doNothing() method buried somewhere deep in the puppet magic.

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to 
puppet-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/puppet-users?hl=en.
