>>> I don't really know much about the internals of PuppetDB, so can anyone
>>> shed
>>> any light on the possible cause of these crashes, and what I can do to
>>> mitigate them?
>>
>> I'd be interested to see what is happening in your postgresql.log when
>> this occurs. Connections must be closing for some specific reason, and
>> I'd hope that the postgresql instance can shed some light on this.
>
> I don't run the Postgresql server so access to the logs might take some time
> (the guy who used to run it has now left...)

That is disconcerting. I think we need to see those logs, however, since
the client in this case is blind to the issue; all it sees is a connection
that has been closed underneath it for some reason.

>> You're not trying to connect to the database through a firewall,
>> load-balancer or some other device are you? Even if the device is "in
>> the way" but meant to be passive, I've still seen issues (especially
>> with F5's or checkpoint firewalls that were badly configured).
>
>
> The Puppetmaster and the Postgresql server are on different subnets but are
> not firewalled from each other. No crazy load-balancing here (yet).

Well, still, I don't know your setup, and unless you run the devices in
the path yourself you can never be entirely sure.

>>> # puppetdb.log
>>>
>>> 2014-04-28 09:46:39,535 ERROR [clojure-agent-send-off-pool-15]
>>> [http.resources] Error streaming response
>>> org.postgresql.util.PSQLException: This connection has been closed.
>>>         at
>>> org.postgresql.jdbc2.AbstractJdbc2Connection.checkClosed(AbstractJdbc2Connection.java:822)
>>>         at
>>> org.postgresql.jdbc2.AbstractJdbc2Connection.setAutoCommit(AbstractJdbc2Connection.java:769)
>>>         at
>>> com.jolbox.bonecp.ConnectionHandle.setAutoCommit(ConnectionHandle.java:1063)
>>>         at
>>> clojure.java.jdbc.internal$transaction_STAR_.invoke(internal.clj:222)
>>>         at
>>> com.puppetlabs.jdbc$with_transacted_connection_fn$fn__2278.invoke(jdbc.clj:228)
>>>         at
>>> clojure.java.jdbc.internal$with_connection_STAR_.invoke(internal.clj:186)
>>>         at
>>> com.puppetlabs.jdbc$with_transacted_connection_fn.invoke(jdbc.clj:225)
>>>         at
>>> com.puppetlabs.puppetdb.http.resources$produce_body$fn__7017$fn__7020.invoke(resources.clj:36)
>>>         at ring.util.io$piped_input_stream$fn__2512.invoke(io.clj:22)
>>>         at
>>> clojure.core$binding_conveyor_fn$fn__4107.invoke(core.clj:1836)
>>>         at clojure.lang.AFn.call(AFn.java:18)
>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>         at java.lang.Thread.run(Thread.java:744)
>>
>> Not much detail from the JDBC postgresql client really, looks like a
>> connection that we have retrieved from the pool was already closed.
>> This didn't seem to happen during an actual transaction because I can
>> see the attempt to disable autoCommit has triggered the exception,
>> which is something we normally do very early in a transaction.
>>
>> This is symptomatic of database connections being timed out, often
>> caused by network devices or some other timeout policy.
>
> As I mentioned, the Puppetmaster/PuppetDB and the Postgresql server are on
> different subnets. However, access between the two servers happens several
> times a minute as there are so many nodes, so a timeout due to inactivity
> seems very unlikely.

Well, while I believe the connection pooling software should be
round-robining all of its available connections, there is a possibility
that an older, stale connection was retrieved. I guess the fact that this
is restarting every 30 minutes makes this unlikely.

Regardless, I'd recommend adjusting the keepalive setting to something
aggressive, like 1 minute, and seeing what happens. The very fact that they
live on different subnets means the traffic is passing through a router of
some kind, and even routers can do strange things to packets.

At the very least it will rule out connection timeouts.
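
For reference, that would look something like the following - a hedged
sketch only, assuming the stock INI-style database config and that
conn-keep-alive (expressed in minutes) is the relevant knob in your
version, so check the docs for your release:

    # e.g. /etc/puppetdb/conf.d/database.ini (path may differ on your install)
    [database]
    # send a keepalive on pooled connections every minute instead of the default
    conn-keep-alive = 1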

>> Is this the only exception thrown btw? Anything else? Can you show us
>> the rest of the log _around_ the exception?
>
> Attached today's puppetdb.log. There are several instances of the same error
> but other than that, everything seems normal.

Yeah, nothing obviously new to learn from this :-(. At least not yet anyway.

We need to see the other end and what it is doing to connections. This
means checking the postgresql logs and perhaps bumping the log levels
to see if it's the PG end dropping the connection (and maybe we can get
a reason ...).
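
If you can get at postgresql.conf, something along these lines should be
enough to record every session start/end with a timestamp and, on
disconnect, the session duration. Just a sketch using standard PostgreSQL
settings - a config reload should be sufficient to pick them up:

    log_connections = on
    log_disconnections = on
    log_min_messages = info
    log_line_prefix = '%m [%p] %u@%d '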

Another idea is to remove the network completely from the equation and
try running a local PG instance on the puppetdb server for a period of
time. This would prove/disprove quite a lot and is probably a good
next step if the PG logging surfaces no new information.
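
Switching over for the test should mostly be a matter of pointing the
database config at the local instance, something like the following
(again a sketch - I'm assuming the default port and a local database
also named puppetdb):

    [database]
    subprotocol = postgresql
    subname = //localhost:5432/puppetdb
    username = puppetdb
    password = <whatever you set locally>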

After that I would usually drop to doing a proper network trace using
tcpdump and its ilk, on both the pg and puppetdb ends, to see which end
is causing these connections to drop. This is pretty hard-core stuff,
but it would certainly determine what is _actually_ happening
between the two systems.
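
Something like this on each end (assuming PG is on its default port of
5432; substitute the other host's address) would capture the traffic for
later inspection in wireshark or similar:

    tcpdump -i any -s 0 -w pdb-pg.pcap host <other-host> and port 5432

Whichever capture shows the first FIN or RST arriving tells you which
side (or which device in between) is tearing the connection down.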

ken.
