On Tuesday, January 15, 2013 3:19:29 PM UTC-6, Ken Barber wrote:
>
> So that looks like its taking ages ... some questions for you: 
>
> * What version of Postgresql are you running and on what distro & 
> version exactly? 
>
RHEL 6.3
EnterpriseDB PPAS 9.1.4 (the ppas-9.1.4 packages)

> * Are you using the distro's packages or someone else's? 
>
Provided by EnterpriseDB.

> * Is autovacuum switched on at all (check your postgresql.conf)? 
>
Yes. We also ran a manual VACUUM ANALYZE, with no noticeable change in 
performance. 
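
For reference, the manual pass was run from psql along these lines (illustrative only; the per-table variant just appends a table name such as catalog_resources):

puppet=# VACUUM ANALYZE;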

> * Do you know the last time the database was vacuum'd and/or analyzed? 
> Try this link for a way to check this: 
> http://heatware.net/databases/how-when-postgres-tables-auto-vacuum-analyze/ 
>
puppet=# select relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze from pg_stat_user_tables;
         relname         |          last_vacuum          |        last_autovacuum        |         last_analyze          |       last_autoanalyze        
-------------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------
 catalog_resources       |                               | 2013-01-12 15:16:10.188342+00 | 2013-01-09 23:26:34.068919+00 | 2013-01-15 18:27:11.166982+00
 certname_facts_metadata |                               | 2013-01-15 19:22:13.932096+00 | 2012-11-06 22:02:25.469113+00 | 2013-01-15 19:22:15.308925+00
 certnames               | 2013-01-03 22:22:11.416002+00 | 2013-01-15 21:17:47.046022+00 | 2013-01-03 22:22:11.574618+00 | 2013-01-15 21:17:48.14814+00 
 resource_params         |                               | 2013-01-05 08:04:04.856417+00 | 2012-11-06 21:25:40.15877+00  | 2013-01-15 15:56:52.589316+00
 classes                 |                               | 2013-01-12 09:29:53.118022+00 | 2012-11-06 22:02:40.547686+00 | 2013-01-15 16:24:04.909758+00
 catalogs                |                               | 2013-01-15 21:18:15.342477+00 | 2012-11-06 22:02:25.182041+00 | 2013-01-15 21:18:16.493626+00
 certname_facts          |                               | 2013-01-15 21:17:39.514162+00 | 2013-01-09 18:00:16.10573+00  | 2013-01-15 21:25:08.780617+00
 tags                    |                               | 2013-01-15 21:44:16.074229+00 | 2012-11-06 22:02:24.75951+00  | 2013-01-15 21:48:19.751198+00
 edges                   |                               | 2013-01-12 11:54:54.182207+00 | 2012-11-06 21:33:24.162324+00 | 2013-01-15 17:27:19.061491+00
 certname_catalogs       |                               | 2013-01-15 21:18:54.227264+00 | 2013-01-09 23:26:38.684536+00 | 2013-01-15 21:18:55.369032+00
 schema_migrations       |                               |                               | 2012-11-06 22:02:40.637773+00 | 
(11 rows)
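
If it would help, I can also pull the dead-tuple counts to see how much bloat the GC deletes leave behind; something like this (these are the standard pg_stat_user_tables columns):

puppet=# select relname, n_live_tup, n_dead_tup from pg_stat_user_tables order by n_dead_tup desc;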
 

> * Can you get me some box metrics, memory size, cpu size/number, disk 
> spindle speeds etc. of your postgresql box? So I can get an idea of 
> the throughput available to you. 
>
It is a VMware VM with the following properties:
8 CPUs
16 GB of memory
Attached to an enterprise-class SAN

The database is on dev253-12 (see the SAR output below).

SAR DISK output
09:40:01 PM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
09:50:01 PM    dev8-0      2.28     33.73     21.09     24.03      0.01      2.23      1.38      0.32
09:50:01 PM   dev8-16      2.24      0.23   1501.34    670.68      0.03     11.20      1.50      0.34
09:50:01 PM   dev8-32      0.57      0.01      7.28     12.68      0.00      3.42      3.11      0.18
09:50:01 PM   dev8-48     10.07      0.53   1591.25    158.02      0.02      1.74      1.64      1.66
09:50:01 PM   dev8-64    354.77  16164.46   2231.28     51.85      2.80      7.89      1.81     64.30
09:50:01 PM  dev253-0      0.78     33.70      0.62     43.93      0.00      3.91      2.47      0.19
09:50:01 PM  dev253-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
09:50:01 PM  dev253-2      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
09:50:01 PM  dev253-3      1.67      0.05     13.28      8.01      0.00      1.21      0.38      0.06
09:50:01 PM  dev253-4      0.38      0.00      3.06      8.00      0.00      1.64      0.72      0.03
09:50:01 PM  dev253-5      0.43      0.00      3.45      8.00      0.00      1.28      0.66      0.03
09:50:01 PM  dev253-6      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
09:50:01 PM  dev253-7      0.09      0.04      0.68      8.00      0.00      0.72      0.45      0.00
09:50:01 PM  dev253-8      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
09:50:01 PM  dev253-9      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
09:50:01 PM dev253-10    187.70      0.23   1501.34      8.00      2.62     13.95      0.02      0.34
09:50:01 PM dev253-11    198.97      0.53   1591.25      8.00      0.53      2.67      0.08      1.66
09:50:01 PM dev253-12    534.17  16164.46   2231.28     34.44      4.24      7.93      1.20     64.32
09:50:01 PM dev253-13      0.91      0.01      7.28      8.00      0.00      5.24      1.96      0.18
Average:       dev8-0      3.21     73.51     24.19     30.43      0.02      7.48      4.18      1.34
Average:      dev8-16      3.36      1.14   2568.18    763.92      0.08     22.78      1.09      0.37
Average:      dev8-32      0.66      0.97      7.42     12.74      0.00      1.81      1.55      0.10
Average:      dev8-48      9.95      1.60   1373.40    138.23      0.01      1.51      1.42      1.41
Average:      dev8-64    140.65   6834.41   2298.08     64.93      1.19      8.49      1.82     25.66
Average:     dev253-0      1.34     45.41      1.08     34.72      0.02     15.92      5.52      0.74
Average:     dev253-1      0.03      0.20      0.07      8.00      0.00      2.18      0.67      0.00
Average:     dev253-2      0.04      0.23      0.08      8.00      0.00     10.40      2.11      0.01
Average:     dev253-3      1.89      9.17     13.84     12.20      0.01      3.85      1.67      0.31
Average:     dev253-4      0.46      0.21      3.43      8.00      0.00      2.26      0.91      0.04
Average:     dev253-5      0.81     16.84      4.94     26.88      0.01      7.93      2.83      0.23
Average:     dev253-6      0.02      0.19      0.00      8.00      0.00      0.34      0.32      0.00
Average:     dev253-7      0.12      0.23      0.74      8.10      0.00      1.41      0.99      0.01
Average:     dev253-8      0.02      0.19      0.00      8.00      0.00      0.33      0.32      0.00
Average:     dev253-9      0.04      0.36      0.00      8.00      0.00      0.52      0.52      0.00
Average:    dev253-10    321.11      0.74   2568.18      8.00      9.20     28.65      0.01      0.36
Average:    dev253-11    171.74      1.20   1373.40      8.00      0.43      2.53      0.08      1.42
Average:    dev253-12    339.64   6834.01   2298.08     26.89      3.01      8.84      0.76     25.66
Average:    dev253-13      0.98      0.57      7.42      8.19      0.00      2.26      1.01      0.10
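
(The above was pulled from the sysstat history; roughly the invocation was something like the line below, though the exact saDD file name depends on the day and is a guess here.)

sar -d -f /var/log/sa/sa15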

> * Maybe any information about the postgresql tuning items - 
> shared_buffers is probably a good start, or have you left the tuning 
> alone? 
>
shared_buffers = 4096MB
work_mem = 8MB 
maintenance_work_mem = 32MB 
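
If tuning experiments would be useful, these are the kinds of postgresql.conf values we could try next on this 16 GB box. The numbers are only a starting point based on general PostgreSQL guidance for the big anti-join DELETEs and vacuum work, not anything we have measured yet:

shared_buffers = 4096MB           # unchanged, roughly 25% of RAM
work_mem = 64MB                   # more room for the hash/sort in the NOT EXISTS deletes
maintenance_work_mem = 512MB      # speeds up VACUUM and index maintenance
effective_cache_size = 10GB       # planner hint for the OS page cache on a 16 GB box
checkpoint_segments = 32          # spread checkpoint I/O during heavy delete churn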

>
> You probably don't need debug with log-slow-statements, I wasn't aware 
> that was an option but I think that gives a good amount of data for 
> this problem. 
>
> On Tue, Jan 15, 2013 at 8:24 PM, Chuck <css...@gmail.com> wrote: 
> > I will work on setting up a puppetdb server I can put into debug mode. 
> > For now here is a "normal" GC run log output and my configuration files. 
> > 
> > Thanks, 
> > 
> > 
> > 2013-01-15 19:20:57,767 INFO  [clojure-agent-send-off-pool-12] 
> > [cli.services] Starting database garbage collection 
> > 2013-01-15 19:26:40,383 WARN  [clojure-agent-send-off-pool-12] 
> > [jdbc.internal] Query slower than 10s threshold:  actual execution time: 
> > 342.6148 seconds; Query: DELETE FROM catalogs WHERE NOT EXISTS (SELECT * 
> > FROM certname_catalogs cc WHERE cc.catalog=catalogs.hash); Query Params: 
> > 2013-01-15 19:33:29,304 WARN  [clojure-agent-send-off-pool-12] 
> > [jdbc.internal] Query slower than 10s threshold:  actual execution time: 
> > 408.9210 seconds; Query: DELETE FROM resource_params WHERE NOT EXISTS 
> > (SELECT * FROM catalog_resources cr WHERE 
> > cr.resource=resource_params.resource); Query Params: 
> > 2013-01-15 19:33:29,308 INFO  [clojure-agent-send-off-pool-12] 
> > [cli.services] Finished database garbage collection 
> > 2013-01-15 19:33:29,308 INFO  [clojure-agent-send-off-pool-12] 
> > [cli.services] Starting sweep of stale nodes (1 day threshold) 
> > 2013-01-15 19:33:29,390 INFO  [clojure-agent-send-off-pool-12] 
> > [cli.services] Finished sweep of stale nodes (1 day threshold) 
> > 
> > ########################################################## 
> > # 
> > #   config.ini 
> > # 
> > 
> > # See README.md for more thorough explanations of each section and 
> > # option. 
> > 
> > [global] 
> > # Store mq/db data in a custom directory 
> > vardir = /var/lib/puppetdb 
> > # Use an external log4j config file 
> > logging-config = /etc/puppetdb/conf.d/../log4j.properties 
> > 
> > # Maximum number of results that a resource query may return 
> > resource-query-limit = 20000 
> > 
> > [command-processing] 
> > # How many command-processing threads to use, defaults to (CPUs / 2) 
> > threads = 10 
> > 
> > ########################################################## 
> > # 
> > # database.ini 
> > # 
> > 
> > [database] 
> > # For the embedded DB: org.hsqldb.jdbcDriver 
> > # For PostgreSQL: org.postgresql.Driver 
> > # Defaults to embedded DB 
> > classname = org.postgresql.Driver 
> > 
> > # For the embedded DB: hsqldb 
> > # For PostgreSQL: postgresql 
> > # Defaults to embedded DB 
> > subprotocol = postgresql 
> > 
> > # For the embedded DB: 
> > file:/path/to/database;hsqldb.tx=mvcc;sql.syntax_pgs=true 
> > # For PostgreSQL: //host:port/databaseName 
> > # Defaults to embedded DB located in <vardir>/db 
> > subname = //PUPPETDB:5432/puppet 
> > 
> > # Connect as a specific user 
> > username = PUPPETDBUSER 
> > 
> > # Use a specific password 
> > password = PASSWORDPUPPETDB 
> > 
> > # How often (in minutes) to compact the database 
> > # gc-interval = 60 
> > 
> > # Number of seconds before any SQL query is considered 'slow'; offending 
> > # queries will not be interrupted, but will be logged at the WARN log 
> > # level. 
> > log-slow-statements = 10 
> > 
> > # How old (in days) to deactivate nodes that have not reported 
> > node-ttl-days = 1 
> > 
> > # How often to send a keepalive 
> > conn-keep-alive = 4 
> > 
> > # How long to keep connections open 
> > conn-max-age = 30 
> > 
> > 
> > 
> > On Tuesday, January 15, 2013 1:27:58 PM UTC-6, Ken Barber wrote: 
> >> 
> >> Hey Chuck, 
> >> 
> >> I've had a chat with my colleagues and they raised some concerns about 
> >> your gc performance. I wouldn't mind drilling into that problem if 
> >> thats okay with you? 
> >> 
> >> For starters - what are your settings looking like? In particular I'm 
> >> interested in node-ttl-seconds and report-ttl-seconds. 
> >> 
> >> FYI there are a few actions that occur when gc kicks in: 
> >> 
> >> * sweeping stale nodes 
> >> * sweeping stale reports 
> >> * removing any unassociated data (catalogs or params) 
> >> 
> >> It would be useful to find out which of these are taking up the most 
> >> amount of your time. 
> >> 
> >> I wouldn't mind a dump of your logs when in a specific debug mode if 
> >> thats at all possible, this should give me a rough idea of the time 
> >> the sweeper spends doing its various deletes so I can at least get a 
> >> focus on your issue. In particular the bits between 'Starting database 
> >> garbage collection' and 'Finished database garbage collection'. 
> >> 
> >> The easiest way to get some debugging is to modify your 
> >> log4j.properties (usually in /etc/puppetdb) and add the line: 
> >> 
> >> log4j.logger.com.jolbox.bonecp=DEBUG 
> >> 
> >> This will log *ALL* SQL statements, so it might be heavy - I'm not 
> >> sure if you can isolate one of your nodes for this purpose or somehow 
> >> mitigate this risk ... we only need a heavy GC run, it probably 
> >> doesn't need to be actively receiving requests and such ... 
> >> 
> >> ken. 
> >> 
> >> On Tue, Jan 15, 2013 at 4:12 PM, Chuck <css...@gmail.com> wrote: 
> >> > 
> >> > 
> >> > On Tuesday, January 15, 2013 9:55:48 AM UTC-6, Ken Barber wrote: 
> >> >> 
> >> >> > Well I currently have 7,000 nodes in my PuppetDB instance and 8 
> >> >> > puppetdb servers in various geographic locations. The garbage 
> >> >> > collect seems to be a pretty intensive operation on the Postgres 
> >> >> > DB server in the current design. 
> >> >> 
> >> >> Thats a shame to hear. What's the impact for you? I see from your 
> >> >> screenshot its averaging 16 seconds, is that what you are seeing? 
> >> > 
> >> > 
> >> > This is normally what we see, 8 times of course (once per puppetdb server). 
> >> > 
> >> >> 
> >> >> > We have also been having an issue that started at the beginning of 
> >> >> > 2013 that we have not been able to narrow down at this time (we were 
> >> >> > on puppetdb 1.0.1 for a long time with no issues. Updated to 1.0.5 
> >> >> > last week with no improvement). 
> >> >> 
> >> >> This issue, is it related to the gc problem or something else? Can 
> >> >> you elaborate? 
> >> >> 
> >> > We are still trying to determine what is happening. We just see a 
> >> > huge spike in the number of rows being deleted. 
> >> > 
> >> >> 
> >> >> > I was planning on having my puppet manifest, for the puppetdb 
> >> >> > servers, elect one or two master puppetdb servers by querying for 
> >> >> > active puppetdb servers. So I would be dynamically electing the 
> >> >> > master(s) based on an algorithm. 
> >> >> 
> >> >> Sounds good to me. Do you want to raise a feature request for this? 
> >> > 
> >> > 
> >> > Yeah I can do that. 
> >> > 
> >> >> 
> >> >> http://projects.puppetlabs.com/projects/puppetdb 
> >> >> 
> >> >> Just more or less cover what you've said about wanting to disable by 
> >> >> setting to zero. I've already more or less got a patch that can 
> >> >> disable it when zero (its really just a conditional clause in the 
> >> >> right place), but I haven't done any thorough testing or gotten 
> >> >> review on it yet. 
> >> >> 
> >> >> ken. 
> >> > 
> > 
>
