On Sep 26, 2013, at 8:20 AM, Matthew Arguin <matthewarg...@gmail.com> wrote:

> So my reasoning behind the initial question/post is largely due to being 
> unfamiliar with PuppetDB, I would say. We export a lot of resources in our 
> Puppet deployment due to the Nagios checks. In poking around on the group, 
> I came across this post: 
> https://groups.google.com/forum/#!topic/puppet-users/z1kjqwko1iA
> 
> I was especially interested in the comment posted by windowsrefund at the 
> bottom; it seems like he is saying that I could reduce the amount of 
> duplication of exported resources, but I am not entirely sure.
> 
> Basic questions: Is it "bad" to have resource duplication? Is it "good" to 
> have catalog duplication? Should I just forget about the 20000 default on 
> the query param, or should I be aiming to tune my Puppet deployment to work 
> towards that? (Currently set to 50000 to stop the issue.)

A few definitions that may help (I should really add this to the FAQ!):

A resource is considered "duplicated" if it exists, identically, on more than 
one system. More specifically: if a resource with the same type, title, 
parameters, and other metadata exists on more than one node in PuppetDB, then 
that resource is considered duplicated. So a resource duplication rate of, say, 
40% means that 60% of your resources exist on only one system. I like to think 
of this as the "snowflake quotient": it's a measure of how many of your 
resources are unique and beautiful snowflakes.
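
To make that definition concrete, here's a rough sketch in Python of how such a rate could be computed. The resource fields and the hashing scheme are illustrative only, not PuppetDB's actual schema:

```python
import hashlib
import json

def resource_key(resource):
    """Fingerprint of a resource: the same type, title, and parameters
    always hash to the same key, so identical resources collide."""
    canonical = json.dumps(resource, sort_keys=True)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()

def duplication_rate(catalogs):
    """catalogs maps node name -> list of resource dicts. Returns the
    fraction of distinct resources that appear on more than one node."""
    nodes_per_resource = {}
    for node, resources in catalogs.items():
        for resource in resources:
            nodes_per_resource.setdefault(resource_key(resource), set()).add(node)
    shared = sum(1 for nodes in nodes_per_resource.values() if len(nodes) > 1)
    return shared / len(nodes_per_resource)

catalogs = {
    "web1": [{"type": "Package", "title": "ntp", "ensure": "present"},
             {"type": "File", "title": "/etc/motd", "content": "web1"}],
    "web2": [{"type": "Package", "title": "ntp", "ensure": "present"},
             {"type": "File", "title": "/etc/motd", "content": "web2"}],
}
# The ntp package is identical on both nodes; each motd file is unique,
# so 1 of 3 distinct resources is duplicated.
print(duplication_rate(catalogs))  # -> 0.3333333333333333
```

The two motd files here are the "snowflakes": same type and title, but different content, so they hash differently.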

A catalog is considered "duplicated" if it's identical to the previous catalog 
that PuppetDB has stored for that node. So if you have a node foo.com, run 
Puppet on it twice, and the catalog hasn't changed for that system (you haven't 
made a config change affecting it between runs), then the second catalog is 
considered a duplicate.

Internally, PuppetDB uses both of these concepts to improve performance. If a 
new catalog is exactly the same as the previously stored one for a node, then 
there's no need to use up IO to store it again. Similarly, if a catalog 
contains 90% of the same resources that already exist on other nodes, PuppetDB 
doesn't need to store those resources again either; instead, it can store 
pointers to the already-existing data in the database.
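
A toy sketch of that pointer-sharing idea in Python (the class and field names here are invented for illustration; PuppetDB's real storage is a relational schema):

```python
import hashlib
import json

class DedupStore:
    """Toy content-addressed store illustrating catalog skipping and
    resource pointer sharing. Not PuppetDB's actual implementation."""

    def __init__(self):
        self.resources = {}  # resource hash -> resource, stored only once
        self.catalogs = {}   # node -> (catalog hash, list of resource hashes)

    @staticmethod
    def _digest(obj):
        return hashlib.sha1(json.dumps(obj, sort_keys=True).encode()).hexdigest()

    def store_catalog(self, node, resources):
        catalog_hash = self._digest(resources)
        previous = self.catalogs.get(node)
        if previous and previous[0] == catalog_hash:
            # Identical to the last catalog for this node: skip the IO entirely.
            return "skipped"
        pointers = []
        for resource in resources:
            h = self._digest(resource)
            self.resources.setdefault(h, resource)  # write only if unseen
            pointers.append(h)
        self.catalogs[node] = (catalog_hash, pointers)
        return "stored"

store = DedupStore()
ntp = {"type": "Package", "title": "ntp", "ensure": "present"}
store.store_catalog("web1", [ntp])
store.store_catalog("web2", [ntp])         # shares the existing ntp row
print(len(store.resources))                # -> 1
print(store.store_catalog("web2", [ntp]))  # -> skipped
```

With this layout, a resource duplicated across a thousand nodes costs one stored row plus a thousand cheap pointers, which is why high duplication rates are fast.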

Now, are the numbers you posted good or bad? In the field, we overwhelmingly 
see resource duplication and catalog duplication in the 85-95% range, so I'd 
say that your low resource duplication rate is atypical. It may indicate that 
you aren't leveraging abstractions in your Puppet code, or it could be that you 
really, truly have a large number of unique resources. One thing I can 
definitely say, though, is that the higher your resource duplication rate, the 
faster PuppetDB will run.

Now, regarding the max query results: I'd set that to whatever works for you. 
If you're doing queries that return a huge number of results, then feel free to 
bump that setting up. The only caveat, as mentioned before, is that you need to 
make sure you give PuppetDB enough heap to actually deal with a result set of 
that size.

Lastly, as Ken Barber indicated, we've already merged code that eliminates the 
need for that setting. We now stream resource query results to the client 
on the fly, rather than batching everything up in memory first. This results in 
much lower memory usage, and greatly reduces the time before the client gets 
the first result. So... problem solved? :)
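
The batching-versus-streaming difference can be sketched generically in Python (an illustration of the idea only; the real implementation inside PuppetDB is Clojure):

```python
def transform(row):
    # Stand-in for turning a database row into a query result.
    return {"resource": row}

def batched_results(rows):
    """Old behavior: realize the entire result set in memory, then return it.
    Memory use grows with the number of results, hence the heap tuning."""
    return [transform(row) for row in rows]

def streamed_results(rows):
    """New behavior: yield each result as it is read, so the client sees the
    first row before the last one has even been fetched."""
    for row in rows:
        yield transform(row)

# The client can start consuming immediately, without waiting for a
# million-row list to be built:
first = next(streamed_results(iter(range(1_000_000))))
print(first)  # -> {'resource': 0}
```

The streaming version holds only one row at a time, which is why both peak memory and time-to-first-result drop.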

deepak

> 
> If I did not mention it previously, heap is currently set to 1G, and looking 
> at the spark line, I seem to be maxing out right now at about 500MB.
> 
> 
> On Thu, Sep 26, 2013 at 3:33 AM, David Schmitt <da...@dasz.at> wrote:
> On 26.09.2013 05:17, Christopher Wood wrote:
> On Wed, Sep 25, 2013 at 02:25:50PM +0100, Ken Barber wrote:
> 
> (SNIP)
> 
> http://puppetdb1.vm:8080/dashboard/index.html. Since Puppet doesn't
> put a limit on the number of resources per node, it's hard to say if your
> case is a problem somewhere. It does however sound exceptional but not
> unlikely (I've seen some nodes with 10k resources apiece, for
> example).
> 
> Now I'm curious about
> 
> who these people are
> 
> Me, for example.
> 
> 
> why they need 10,000 resources per host
> 
> Such numbers are easy to reach when every service exports a nagios check into 
> a central server.
> 
> 
> how they keep track of everything
> 
> High modularity. See below.
> 
> 
> how long an agent run takes
> 
> Ages. The biggest node I know takes around 44 minutes to run.
> 
> 
> and how much cpu/ram an agent run takes
> 
> Too much.
> 
> 
> and how they troubleshoot the massive debug output
> 
> Since these 10k+ resources are 99% the same, there is not much to 
> troubleshoot.
> 
> 
> Regards, David
> 
> 
> -- 
> You received this message because you are subscribed to a topic in the Google 
> Groups "Puppet Users" group.
> To unsubscribe from this topic, visit 
> https://groups.google.com/d/topic/puppet-users/D1KyxpUB4UU/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to 
> puppet-users+unsubscr...@googlegroups.com.
> To post to this group, send email to puppet-users@googlegroups.com.
> Visit this group at http://groups.google.com/group/puppet-users.
> For more options, visit https://groups.google.com/groups/opt_out.
> 
> 
