On Tuesday, January 15, 2013 3:19:29 PM UTC-6, Ken Barber wrote:
>
> So that looks like its taking ages ... some questions for you:
>
> * What version of Postgresql are you running and on what distro &
> version exactly?

RHEL 6.3
Enterprise DB ppas-9.1.4
> * Are you using the distros packages or someone elses?

Provided by Enterprise DB.

> * Is autovacuum switched on at all (check your postgresql.conf)?

Yes. We also ran a manual VACUUM ANALYZE with no noticeable change in performance.

> * Do you know the last time the database was vacuum'd and/or analyzed?
> Try this link for a way to check this:
> http://heatware.net/databases/how-when-postgres-tables-auto-vacuum-analyze/

puppet=# select relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze from pg_stat_user_tables;

 relname                  | last_vacuum                   | last_autovacuum               | last_analyze                  | last_autoanalyze
--------------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------
 catalog_resources        |                               | 2013-01-12 15:16:10.188342+00 | 2013-01-09 23:26:34.068919+00 | 2013-01-15 18:27:11.166982+00
 certname_facts_metadata  |                               | 2013-01-15 19:22:13.932096+00 | 2012-11-06 22:02:25.469113+00 | 2013-01-15 19:22:15.308925+00
 certnames                | 2013-01-03 22:22:11.416002+00 | 2013-01-15 21:17:47.046022+00 | 2013-01-03 22:22:11.574618+00 | 2013-01-15 21:17:48.14814+00
 resource_params          |                               | 2013-01-05 08:04:04.856417+00 | 2012-11-06 21:25:40.15877+00  | 2013-01-15 15:56:52.589316+00
 classes                  |                               | 2013-01-12 09:29:53.118022+00 | 2012-11-06 22:02:40.547686+00 | 2013-01-15 16:24:04.909758+00
 catalogs                 |                               | 2013-01-15 21:18:15.342477+00 | 2012-11-06 22:02:25.182041+00 | 2013-01-15 21:18:16.493626+00
 certname_facts           |                               | 2013-01-15 21:17:39.514162+00 | 2013-01-09 18:00:16.10573+00  | 2013-01-15 21:25:08.780617+00
 tags                     |                               | 2013-01-15 21:44:16.074229+00 | 2012-11-06 22:02:24.75951+00  | 2013-01-15 21:48:19.751198+00
 edges                    |                               | 2013-01-12 11:54:54.182207+00 | 2012-11-06 21:33:24.162324+00 | 2013-01-15 17:27:19.061491+00
 certname_catalogs        |                               | 2013-01-15 21:18:54.227264+00 | 2013-01-09 23:26:38.684536+00 | 2013-01-15 21:18:55.369032+00
 schema_migrations        |                               |                               | 2012-11-06 22:02:40.637773+00 |
(11 rows)
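A related check worth running (a sketch; n_live_tup and n_dead_tup are standard columns of the same pg_stat_user_tables view queried above, they are not shown in the thread) is whether dead tuples are accumulating faster than autovacuum reclaims them:

-- Sketch, not from the thread: rank tables by dead tuples to see whether
-- autovacuum is keeping up with the delete-heavy GC workload.
SELECT relname, n_live_tup, n_dead_tup,
       round(100.0 * n_dead_tup / greatest(n_live_tup, 1), 1) AS dead_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

A persistently high dead_pct on the big tables (catalog_resources, resource_params) would point to bloat, which makes the full-table GC deletes touch far more pages than the live row counts suggest.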
> * Can you get me some box metrics, memory size, cpu size/number, disk
> spindle speeds etc. of your postgresql box? So I can get an idea of
> the throughput available to you.

It is a VMware VM. The VM has the following properties:

  8 CPUs
  16 GB of memory
  Attached to an enterprise-class SAN

The database is on dev253-12.

SAR disk output:

09:40:01 PM        DEV       tps   rd_sec/s   wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
09:50:01 PM     dev8-0      2.28      33.73      21.09     24.03      0.01      2.23      1.38      0.32
09:50:01 PM    dev8-16      2.24       0.23    1501.34    670.68      0.03     11.20      1.50      0.34
09:50:01 PM    dev8-32      0.57       0.01       7.28     12.68      0.00      3.42      3.11      0.18
09:50:01 PM    dev8-48     10.07       0.53    1591.25    158.02      0.02      1.74      1.64      1.66
09:50:01 PM    dev8-64    354.77   16164.46    2231.28     51.85      2.80      7.89      1.81     64.30
09:50:01 PM   dev253-0      0.78      33.70       0.62     43.93      0.00      3.91      2.47      0.19
09:50:01 PM   dev253-1      0.00       0.00       0.00      0.00      0.00      0.00      0.00      0.00
09:50:01 PM   dev253-2      0.00       0.00       0.00      0.00      0.00      0.00      0.00      0.00
09:50:01 PM   dev253-3      1.67       0.05      13.28      8.01      0.00      1.21      0.38      0.06
09:50:01 PM   dev253-4      0.38       0.00       3.06      8.00      0.00      1.64      0.72      0.03
09:50:01 PM   dev253-5      0.43       0.00       3.45      8.00      0.00      1.28      0.66      0.03
09:50:01 PM   dev253-6      0.00       0.00       0.00      0.00      0.00      0.00      0.00      0.00
09:50:01 PM   dev253-7      0.09       0.04       0.68      8.00      0.00      0.72      0.45      0.00
09:50:01 PM   dev253-8      0.00       0.00       0.00      0.00      0.00      0.00      0.00      0.00
09:50:01 PM   dev253-9      0.00       0.00       0.00      0.00      0.00      0.00      0.00      0.00
09:50:01 PM  dev253-10    187.70       0.23    1501.34      8.00      2.62     13.95      0.02      0.34
09:50:01 PM  dev253-11    198.97       0.53    1591.25      8.00      0.53      2.67      0.08      1.66
09:50:01 PM  dev253-12    534.17   16164.46    2231.28     34.44      4.24      7.93      1.20     64.32
09:50:01 PM  dev253-13      0.91       0.01       7.28      8.00      0.00      5.24      1.96      0.18
Average:        dev8-0      3.21      73.51      24.19     30.43      0.02      7.48      4.18      1.34
Average:       dev8-16      3.36       1.14    2568.18    763.92      0.08     22.78      1.09      0.37
Average:       dev8-32      0.66       0.97       7.42     12.74      0.00      1.81      1.55      0.10
Average:       dev8-48      9.95       1.60    1373.40    138.23      0.01      1.51      1.42      1.41
Average:       dev8-64    140.65    6834.41    2298.08     64.93      1.19      8.49      1.82     25.66
Average:      dev253-0      1.34      45.41       1.08     34.72      0.02     15.92      5.52      0.74
Average:      dev253-1      0.03       0.20       0.07      8.00      0.00      2.18      0.67      0.00
Average:      dev253-2      0.04       0.23       0.08      8.00      0.00     10.40      2.11      0.01
Average:      dev253-3      1.89       9.17      13.84     12.20      0.01      3.85      1.67      0.31
Average:      dev253-4      0.46       0.21       3.43      8.00      0.00      2.26      0.91      0.04
Average:      dev253-5      0.81      16.84       4.94     26.88      0.01      7.93      2.83      0.23
Average:      dev253-6      0.02       0.19       0.00      8.00      0.00      0.34      0.32      0.00
Average:      dev253-7      0.12       0.23       0.74      8.10      0.00      1.41      0.99      0.01
Average:      dev253-8      0.02       0.19       0.00      8.00      0.00      0.33      0.32      0.00
Average:      dev253-9      0.04       0.36       0.00      8.00      0.00      0.52      0.52      0.00
Average:     dev253-10    321.11       0.74    2568.18      8.00      9.20     28.65      0.01      0.36
Average:     dev253-11    171.74       1.20    1373.40      8.00      0.43      2.53      0.08      1.42
Average:     dev253-12    339.64    6834.01    2298.08     26.89      3.01      8.84      0.76     25.66
Average:     dev253-13      0.98       0.57       7.42      8.19      0.00      2.26      1.01      0.10

> * Maybe any information about the postgresql tuning items -
> shared_buffers is probably a good start, or have you left the tuning
> alone?

shared_buffers = 4096MB
work_mem = 8MB
maintenance_work_mem = 32MB
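The effective values can also be read back from the running server (a sketch; pg_settings is the standard PostgreSQL settings view and is not something shown in the thread):

-- Sketch, not from the thread: confirm the settings the server is actually using,
-- since postgresql.conf edits only take effect after a reload or restart.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem', 'autovacuum');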
> You probably don't need debug with log-slow-statements, I wasn't aware
> that was an option but I think that gives a good amount of data for
> this problem.
>
> On Tue, Jan 15, 2013 at 8:24 PM, Chuck <css...@gmail.com> wrote:
> > I will work on setting up a puppetdb server I can put into debug mode. For
> > now here is a "normal" GC run log output and my configuration files.
> >
> > Thanks,
> >
> > 2013-01-15 19:20:57,767 INFO  [clojure-agent-send-off-pool-12]
> > [cli.services] Starting database garbage collection
> > 2013-01-15 19:26:40,383 WARN  [clojure-agent-send-off-pool-12]
> > [jdbc.internal] Query slower than 10s threshold: actual execution time:
> > 342.6148 seconds; Query: DELETE FROM catalogs WHERE NOT EXISTS (SELECT *
> > FROM certname_catalogs cc WHERE cc.catalog=catalogs.hash); Query Params:
> > 2013-01-15 19:33:29,304 WARN  [clojure-agent-send-off-pool-12]
> > [jdbc.internal] Query slower than 10s threshold: actual execution time:
> > 408.9210 seconds; Query: DELETE FROM resource_params WHERE NOT EXISTS
> > (SELECT * FROM catalog_resources cr WHERE
> > cr.resource=resource_params.resource); Query Params:
> > 2013-01-15 19:33:29,308 INFO  [clojure-agent-send-off-pool-12]
> > [cli.services] Finished database garbage collection
> > 2013-01-15 19:33:29,308 INFO  [clojure-agent-send-off-pool-12]
> > [cli.services] Starting sweep of stale nodes (1 day threshold)
> > 2013-01-15 19:33:29,390 INFO  [clojure-agent-send-off-pool-12]
> > [cli.services] Finished sweep of stale nodes (1 day threshold)
> >
> > ##########################################################
> > #
> > # config.ini
> > #
> >
> > # See README.md for more thorough explanations of each section and
> > # option.
> >
> > [global]
> > # Store mq/db data in a custom directory
> > vardir = /var/lib/puppetdb
> > # Use an external log4j config file
> > logging-config = /etc/puppetdb/conf.d/../log4j.properties
> >
> > # Maximum number of results that a resource query may return
> > resource-query-limit = 20000
> >
> > [command-processing]
> > # How many command-processing threads to use, defaults to (CPUs / 2)
> > threads = 10
> >
> > ##########################################################
> > #
> > # database.ini
> > #
> >
> > [database]
> > # For the embedded DB: org.hsqldb.jdbcDriver
> > # For PostgreSQL: org.postgresql.Driver
> > # Defaults to embedded DB
> > classname = org.postgresql.Driver
> >
> > # For the embedded DB: hsqldb
> > # For PostgreSQL: postgresql
> > # Defaults to embedded DB
> > subprotocol = postgresql
> >
> > # For the embedded DB: file:/path/to/database;hsqldb.tx=mvcc;sql.syntax_pgs=true
> > # For PostgreSQL: //host:port/databaseName
> > # Defaults to embedded DB located in <vardir>/db
> > subname = //PUPPETDB:5432/puppet
> >
> > # Connect as a specific user
> > username = PUPPETDBUSER
> >
> > # Use a specific password
> > password = PASSWORDPUPPETDB
> >
> > # How often (in minutes) to compact the database
> > # gc-interval = 60
> >
> > # Number of seconds before any SQL query is considered 'slow'; offending
> > # queries will not be interrupted, but will be logged at the WARN log level.
> > log-slow-statements = 10
> >
> > # How old (in days) to deactivate nodes that have not reported
> > node-ttl-days = 1
> >
> > # How often to send a keepalive
> > conn-keep-alive = 4
> >
> > # How long to keep connections open
> > conn-max-age = 30
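The two slow DELETEs in the GC log above are the obvious place to start digging. One way to look at their plans without actually removing rows (a sketch; the DELETE is reused exactly as it appears in the log, everything else is standard SQL) is to run them under EXPLAIN ANALYZE inside a transaction that is rolled back:

-- Sketch, not from the thread: re-run the logged GC DELETE under EXPLAIN ANALYZE.
-- Wrapping it in BEGIN/ROLLBACK keeps the plan and timing output but undoes the delete.
BEGIN;
EXPLAIN ANALYZE
DELETE FROM catalogs
WHERE NOT EXISTS (SELECT * FROM certname_catalogs cc WHERE cc.catalog = catalogs.hash);
ROLLBACK;

If the plan is dominated by large sequential scans, comparing it against the indexes that exist on certname_catalogs (and on catalog_resources for the resource_params variant) would be a reasonable next step.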
> >
> > On Tuesday, January 15, 2013 1:27:58 PM UTC-6, Ken Barber wrote:
> >>
> >> Hey Chuck,
> >>
> >> I've had a chat with my colleagues and they raised some concerns about
> >> your gc performance. I wouldn't mind drilling into that problem if
> >> thats okay with you?
> >>
> >> For starters - what are your settings looking like? In particular I'm
> >> interested in node-ttl-seconds and report-ttl-seconds.
> >>
> >> FYI there are a few actions that occur when gc kicks in:
> >>
> >> * sweeping stale nodes
> >> * sweeping stale reports
> >> * removing any unassociated data (catalogs or params)
> >>
> >> It would be useful to find out which of these are taking up the most
> >> amount of your time.
> >>
> >> I wouldn't mind a dump of your logs when in a specific debug mode if
> >> thats at all possible, this should give me a rough idea of the time
> >> the sweeper spends doing its various deletes so I can at least get a
> >> focus on your issue. In particular the bits between 'Starting database
> >> garbage collection' and 'Finished database garbage collection'.
> >>
> >> The easiest way to get some debugging is to modify your
> >> log4j.properties (usually in /etc/puppetdb) and add the line:
> >>
> >> log4j.logger.com.jolbox.bonecp=DEBUG
> >>
> >> This will log *ALL* SQL statements, so it might be heavy - I'm not
> >> sure if you can isolate one of your nodes for this purpose or somehow
> >> mitigate this risk ... we only need a heavy GC run, it probably
> >> doesn't need to be actively receiving requests and such ...
> >>
> >> ken.
> >>
> >> On Tue, Jan 15, 2013 at 4:12 PM, Chuck <css...@gmail.com> wrote:
> >> >
> >> > On Tuesday, January 15, 2013 9:55:48 AM UTC-6, Ken Barber wrote:
> >> >>
> >> >> > Well I currently have 7,000 nodes in my PuppetDB instance and 8
> >> >> > puppetdb servers in various geographic locations. The garbage
> >> >> > collect seems to be a pretty intensive operation on the Postgres
> >> >> > DB server in the current design.
> >> >>
> >> >> Thats a shame to hear. What's the impact for you? I see from your
> >> >> screenshot its averaging 16 seconds, is that what you are seeing?
> >> >
> >> > This is normally what we see 8 times of course.
> >> >
> >> >> > We have also been having an issue that started at the beginning of
> >> >> > 2013 that we have not been able to narrow down at this time (we
> >> >> > were on puppetdb 1.0.1 for a long time with no issues. Updated to
> >> >> > 1.0.5 last week with no improvement)
> >> >>
> >> >> This issue, is it related to the gc problem or something else? Can you
> >> >> elaborate?
> >> >
> >> > We are still trying to determine what is happening. We just see a huge
> >> > spike in the number of rows being deleted.
> >> >
> >> >> > I was planning on having my puppet manifest, for the puppetdb
> >> >> > servers, elect one or two master puppetdb servers by querying for
> >> >> > active puppetdb servers. So I would be dynamically electing the
> >> >> > master(s) based on an algorithm.
> >> >>
> >> >> Sounds good to me. Do you want to raise a feature request for this?
> >> >
> >> > Yeah I can do that.
> >> >
> >> >> http://projects.puppetlabs.com/projects/puppetdb
> >> >>
> >> >> Just more or less cover what you've said about wanting to disable by
> >> >> setting to zero. I've already more or less got a patch that can
> >> >> disable it when zero (its really just a conditional clause in the
> >> >> right place), but I haven't done any thorough testing or gotten review
> >> >> on it yet.
> >> >>
> >> >> ken.