> I'm using Puppet with PuppetDB running on the same machine, but with the
> Postgresql database on an external server. Several times a day, PuppetDB
> seems to crash with errors like the one below. Nodes are then unable to
> check in, although Puppet will restart its own PuppetDB service on its
> 30-minute runs so the problem sort of fixes itself after a bit.

Are you doing this to work around this issue, or for other reasons? It just
seems a bit weird to do during normal operation.

> I don't really know much about the internals of PuppetDB, so can anyone shed
> any light on the possible cause of these crashes, and what I can do to
> mitigate them?

I'd be interested to see what is happening in your postgresql.log when this
occurs. Connections must be closing for some specific reason, and I'd hope
that the postgresql instance can shed some light on this.

You're not trying to connect to the database through a firewall,
load-balancer or some other device, are you? Even if the device is "in the
way" but meant to be passive, I've still seen issues (especially with F5s or
Check Point firewalls that were badly configured).

> # puppetdb.log
>
> 2014-04-28 09:46:39,535 ERROR [clojure-agent-send-off-pool-15]
> [http.resources] Error streaming response
> org.postgresql.util.PSQLException: This connection has been closed.
>     at org.postgresql.jdbc2.AbstractJdbc2Connection.checkClosed(AbstractJdbc2Connection.java:822)
>     at org.postgresql.jdbc2.AbstractJdbc2Connection.setAutoCommit(AbstractJdbc2Connection.java:769)
>     at com.jolbox.bonecp.ConnectionHandle.setAutoCommit(ConnectionHandle.java:1063)
>     at clojure.java.jdbc.internal$transaction_STAR_.invoke(internal.clj:222)
>     at com.puppetlabs.jdbc$with_transacted_connection_fn$fn__2278.invoke(jdbc.clj:228)
>     at clojure.java.jdbc.internal$with_connection_STAR_.invoke(internal.clj:186)
>     at com.puppetlabs.jdbc$with_transacted_connection_fn.invoke(jdbc.clj:225)
>     at com.puppetlabs.puppetdb.http.resources$produce_body$fn__7017$fn__7020.invoke(resources.clj:36)
>     at ring.util.io$piped_input_stream$fn__2512.invoke(io.clj:22)
>     at clojure.core$binding_conveyor_fn$fn__4107.invoke(core.clj:1836)
>     at clojure.lang.AFn.call(AFn.java:18)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)

Not much detail from the JDBC postgresql client really. It looks like a
connection that we retrieved from the pool was already closed. This doesn't
seem to have happened during an actual transaction, because I can see that
the attempt to disable autoCommit triggered the exception, and that is
something we normally do very early in a transaction.

This is consistent with database connections being timed out, often caused
by network devices or some other timeout policy. There is a setting here
where we can adjust the keep-alive interval:

http://docs.puppetlabs.com/puppetdb/master/configure.html#conn-keep-alive

We default it to 45 minutes for 1.6.3 (you are running the latest PuppetDB,
right?) ... but some firewalls and devices can have a shorter TTL, like 30
minutes ... I've even seen as little as 5 minutes in aggressively configured
devices. For kicks you might want to set this relatively low for a period of
time to see if you get any improvement.

This might not be the actual problem, however. There are other timeouts that
might cause this, but it's the most obscure fault to track down, so it's
probably worth testing early.

Again, I'd be curious to see what the Postgresql server says about this. We
might need to bump up the logging to get anything we can use, though.

Is this the only exception thrown, btw? Anything else? Can you show us the
rest of the log _around_ the exception?

Can you show us the results of a 'show all' on the database as well? Like
I've done here:

[ken@kb puppetdb]# psql puppetdb
psql (9.3.4)
Type "help" for help.

puppetdb=# show all;
 name                    | setting    | description
-------------------------+------------+--------------------------------------------------------------------------------------------
 allow_system_table_mods | off        | Allows modifications of the structure of system tables.
 application_name        | psql       | Sets the application name to be reported in statistics and logs.
 archive_command         | (disabled) | Sets the shell command that will be called to archive a WAL file.
 archive_mode            | off        | Allows archiving of WAL files using archive_command.
 archive_timeout         | 0          | Forces a switch to the next xlog file if a new file has not been started within N seconds.
 array_nulls             | on         | Enable input of NULL elements in arrays.
 authentication_timeout  | 1min       | Sets the maximum allowed time to complete client authentication.
...

Also if you can provide everything in your /etc/puppetdb/conf.d/database.ini
(obviously hide your private details first) that might be helpful.

ken.
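
P.S. To make the keep-alive test suggested above a bit more concrete, here is
a minimal sketch of what the change might look like in
/etc/puppetdb/conf.d/database.ini. It assumes the value is given in minutes,
as described in the configuration docs linked above. The 5 is purely an
illustrative test value, not a recommendation, and the rest of your
[database] section stays exactly as it is.

```ini
[database]
# Existing connection settings (subname, username, password, and so on)
# are left untouched and are not shown here.

# Minutes a pooled connection may sit idle before a test query is sent
# to keep it alive. 5 is an assumed value for a short experiment only;
# the 1.6.3 default mentioned above is 45.
conn-keep-alive = 5
```

PuppetDB would need a restart to pick up the change. If the "connection has
been closed" errors stop with a low value like this, that would point at an
idle timeout somewhere between PuppetDB and the Postgresql server.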