On Wednesday, November 14, 2012 11:37:23 AM UTC-6, Matt wrote:
>
> I was working on this a bit w/ grim_radical/nlewis on Friday afternoon,
> and wanted to loop back; figured at least starting the conversation here
> again was a good idea.
>
> We have started having problems with one query slowing our puppetdb way,
> way down, and seemingly blocking other queries/node runs.
>
> We are an HPC shop, and have about 2K nodes. Of that, about 1500 do the
> following:
>
>     @@sshkey { "${hostname}":
>       host_aliases => ["$fqdn", "$ipaddress"],
>       type         => "rsa",
>       key          => $sshrsakey,
>       ensure       => present,
>     }
>
>     Sshkey <<| type == "rsa" |>> { ensure => present }
>
> That's it. Nothing really crazy/special in there.
>
> This ends up as:
>
>     LOG: duration: 5690.773 ms execute <unnamed>: select results.* from
>     (SELECT certname_catalogs.certname, catalog_resources.resource,
>     catalog_resources.type, catalog_resources.title, catalog_resources.tags,
>     catalog_resources.exported, catalog_resources.sourcefile,
>     catalog_resources.sourceline, rp.name, rp.value FROM catalog_resources
>     JOIN certname_catalogs USING(catalog) LEFT OUTER JOIN resource_params rp
>     USING(resource) INNER JOIN certnames ON certname_catalogs.certname =
>     certnames.name WHERE (catalog_resources.type = $1) AND
>     (catalog_resources.exported = $2) AND (certnames.deactivated IS NULL) AND
>     (NOT ((certname_catalogs.certname = $3))) AND (catalog_resources.resource
>     IN (SELECT rp.resource FROM resource_params rp WHERE rp.name = $4 AND
>     rp.value = $5))) results LIMIT 50001
>     DETAIL: parameters: $1 = 'Sshkey', $2 = 't', $3 = 'hero4209', $4 =
>     'type', $5 = '"rsa"'
>
> Would adding an index on this be an option? (I'm not a huge postgres guru;
> maybe I'm using the wrong terms.)
>
> As soon as we commented out the collection, like:
>
>     #Sshkey <<| type == "rsa" |>> { ensure => present }
>
> things all go back to normal, and nodes run nice and quickly. With that in
> there, nodes would hang running and start timing out. Our 2K nodes are on a
> 2 hour run interval.
>
> Any help/thoughts? I'm in irc as sjoeboo as well.
>
First, make sure you are using thin_storeconfigs. After making the switch (if it is a switch), it may take some time for all nodes' changes to propagate to the DB, but the result should be far fewer rows in your DB. That could speed you up far more than any indexing. Also, I presume that you are already using the PostgreSQL back end instead of the built-in one, but if not then you should switch now. The built-in back end is simply not up to the task of handling so many nodes efficiently.

The query itself looks like it could use some optimization, but that's out of your hands unless you want to hack on Puppet itself. I don't know which columns may be indexed already, and I didn't find any documentation of the schema at PL or in puppetlabs' GitHub (what's up with that, PL?). It might indeed be the case that adding indexes on one or more key columns would help you out, but you really ought to tackle this in a systematic manner. Specifically, use a query analyzer (I presume Postgres has one) to identify the expensive parts of that query, and consider adding indices that will improve those parts (e.g. indices on columns of long tables that serve as join columns or WHERE criteria). Lather, rinse, repeat until it's good enough or you can't do any better.

Be aware also that time saved in the query will be partially offset by time consumed in maintaining each additional index you create. If you choose strategically then you could conceivably see a dramatic overall gain, but don't go overboard.

John

--
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To view this discussion on the web visit https://groups.google.com/d/msg/puppet-users/-/e_-i0KSXnIUJ.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
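[Editor's note: the analyze-then-index workflow described above can be sketched as follows. This is a hedged example, not a tested fix: it assumes psql access to the puppetdb database, substitutes the logged $1..$5 parameters as literals, and the index names and column choices are illustrative guesses based on the WHERE/JOIN clauses in the logged query, not a documented PuppetDB schema. Postgres's query analyzer is EXPLAIN ANALYZE.]

```sql
-- 1. See where the time goes. This is the logged query with the
--    DETAIL parameters substituted in, so EXPLAIN ANALYZE can run it:
EXPLAIN ANALYZE
SELECT results.* FROM
  (SELECT certname_catalogs.certname, catalog_resources.resource,
          catalog_resources.type, catalog_resources.title,
          catalog_resources.tags, catalog_resources.exported,
          catalog_resources.sourcefile, catalog_resources.sourceline,
          rp.name, rp.value
     FROM catalog_resources
     JOIN certname_catalogs USING (catalog)
     LEFT OUTER JOIN resource_params rp USING (resource)
     INNER JOIN certnames ON certname_catalogs.certname = certnames.name
    WHERE catalog_resources.type = 'Sshkey'
      AND catalog_resources.exported = 't'
      AND certnames.deactivated IS NULL
      AND NOT (certname_catalogs.certname = 'hero4209')
      AND catalog_resources.resource IN
          (SELECT rp2.resource FROM resource_params rp2
            WHERE rp2.name = 'type' AND rp2.value = '"rsa"')) results
LIMIT 50001;

-- 2. If the plan shows a sequential scan on resource_params for the
--    subselect, an index on its lookup columns is one candidate
--    (name and existence here are illustrative, not prescriptive):
CREATE INDEX idx_resource_params_name_value
    ON resource_params (name, value);

-- 3. Likewise for the outer filter on catalog_resources:
CREATE INDEX idx_catalog_resources_type_exported
    ON catalog_resources (type, exported);

-- Re-run the EXPLAIN ANALYZE after each change, and drop indexes that
-- don't help: each one adds write overhead every time a catalog is stored.
```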