The timeout happens in your SocketRead. This is configurable via the Connector/J *socketTimeout* property (the default is forever): http://dev.mysql.com/doc/connector-j/en/connector-j-reference-configuration-properties.html
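For example, a minimal sketch of setting it on the JDBC URL; the host, schema, credentials, and the 30-second value are only placeholders:

    // Sketch: socketTimeout (milliseconds) caps how long a single socket
    // read may block; 0 (the Connector/J default) means wait forever.
    // Host, database, credentials and the 30s value are placeholders.
    import java.sql.Connection;
    import java.sql.DriverManager;

    public class SocketTimeoutExample {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:mysql://db-host:3306/mydb"
                       + "?connectTimeout=10000"   // fail fast on connect
                       + "&socketTimeout=30000";   // fail fast on a dead read
            try (Connection conn = DriverManager.getConnection(url, "user", "pass")) {
                // A statement that blocks longer than 30s in a socket read now
                // throws a SQLException instead of hanging for ~15 minutes.
            }
        }
    }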
What appears to be happening is that somewhere a reset packet is not being sent from the server to the JDBC driver, so setting a timeout may alleviate this. Note that *socketTimeout* is not the same as a query timeout; it is simply a timeout between packets, so if you have long-running queries it may affect them.

On Wed, Jan 14, 2015 at 11:44 AM, Christopher Schultz <
ch...@christopherschultz.net> wrote:

> Darren,
>
> On 1/13/15 11:32 PM, Darren Davis wrote:
> > On Tue, Jan 13, 2015 at 8:39 PM, Christopher Schultz <
> > ch...@christopherschultz.net> wrote:
> >
> > Darren,
> >
> > (Sorry... just had to remove that monstrous stack trace...)
> >
> > On 1/13/15 5:04 PM, Darren Davis wrote:
> >>>> Hi Christopher. Yes, we've tried a show processlist and can
> >>>> find no evidence of the validation query running on MySQL.
> >
> > Strange. Maybe you are waiting for the db server's buffer to flush
> > or something like that.
> >
> >> I think this is because the client thinks it still has an open
> >> connection; the client netstat command shows an open connection
> >> over port 3306, at least for a few minutes after it's killed by
> >> the load balancer. The server loses its connection in netstat
> >> immediately.
> >
> >>>> We also just tried an experiment outside of Tomcat completely,
> >>>> by connecting to a downed web server host and manually opening
> >>>> a mysql client connection to the database server and executing
> >>>> a single command.
> >>>>
> >>>> We left that client window idle for an hour and 5 minutes, then
> >>>> attempted to execute a simple select count(*) against a tiny
> >>>> table. The client attempted to execute the query, and a netstat
> >>>> on that box showed an open connection between the two servers
> >>>> using port 3306. We also checked the process list during this
> >>>> time and could not find any queries at all from the server in
> >>>> question.
> >>>>
> >>>> At about the 15-minute mark, the client finally came back with
> >>>> this message: "ERROR 2013 (HY000): Lost connection to MySQL
> >>>> server during query."
> >
> > Was this with the MySQL command-line client? What query did you
> > issue ("SELECT 1")?
> >
> >> Yes, it was just the command-line client, and we issued a select
> >> count(*) from a table with a couple of rows in it.
> >
> >>>> Attempting to execute the command a second time (using the up
> >>>> arrow) re-established the connection, and it ran perfectly in a
> >>>> few milliseconds.
> >
> > That's interesting. I've never experienced anything like that with
> > MySQL, but we use a VLAN between our application and database
> > servers with no hardware firewall, so we don't have any connection
> > timeout problems. Also, when connections are dropped due to
> > inactivity, they re-connect without any problems.
> >
> >>>> I checked the MySQL configuration and it is set to the default
> >>>> values for keeping connections/interactive connections open
> >>>> (for 8 hours), so it seems that maybe the Cisco firewall
> >>>> between the two servers is terminating connections out from
> >>>> under us, but in a way that the O/S cannot detect.
> >
> > What if you set that idle connection timeout to something like 5
> > minutes? Can you reproduce this issue more quickly? Can you look
> > at the fw configuration to see if you can change the idle timeout
> > /down/ to something more testable?
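As an aside, the idle-connection experiment described above can also be reproduced from Java, which makes it easier to see how socketTimeout changes the failure mode. This is only a rough sketch; the URL, credentials, table name, and the 65-minute idle period are placeholders:

    // Rough sketch: hold a connection idle past the firewall's idle limit,
    // then issue a trivial query. With socketTimeout unset, the read can
    // hang for many minutes; with it set, a stale connection fails quickly.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.concurrent.TimeUnit;

    public class IdleConnectionProbe {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:mysql://db-host:3306/mydb?socketTimeout=30000";
            try (Connection conn = DriverManager.getConnection(url, "user", "pass")) {
                runQuery(conn);               // works while the link is fresh
                TimeUnit.MINUTES.sleep(65);   // idle past a 60-minute firewall cutoff
                runQuery(conn);               // either succeeds, or fails within ~30s
            }
        }

        private static void runQuery(Connection conn) throws Exception {
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM tiny_table")) {
                rs.next();
                System.out.println("count = " + rs.getLong(1));
            }
        }
    }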
> >> As part of our move to the new versions of Tomcat/Java, we are in a
> >> new cloud environment which features a different type of firewall.
> >> The provider confirmed to us late today that it is configured to
> >> kill "idle" TCP connections after an hour, which is something our
> >> old firewall didn't do.
> >
> >> Because we sometimes have low traffic during this time of the year,
> >> especially on the weekends, what we think is happening is that one
> >> or more of the minimum 10 connections is going unused for more than
> >> an hour. Since we didn't have any connection testing while idle
> >> turned on, they were being killed by the firewall out from under
> >> the pool, and depending on how soon they were used after that, we
> >> would run into the 15-minute delay before they were deemed lost and
> >> replaced with a new connection.
>
> This should be entirely possible. That's the point of the
> connection-validation operation (whether done by an actual query or
> not). The question is why the connection is being dropped in a way
> that is thwarting the connection-validation at all. It may come down
> to some kind of OS-level setting, or a slightly different
> configuration on the firewall.
>
> It seems that removing the firewall's idle-connection policy would be
> an easy way to try to get around this issue, at least temporarily.
>
> >>>> I've also fired up the YourKit profiler on this box and am
> >>>> seeing other threads which have had to wait in the same
> >>>> SocketInputStream.read code, all three started a few seconds
> >>>> apart. It just wasn't detected as a deadlock, because it took
> >>>> place outside of any synchronized methods.
> >
> > What makes you think it's deadlock? Deadlock is a very specific
> > thing. Just because many threads are waiting in
> > SocketInputStream.read doesn't mean there are any threading issues
> > at all. I suspect that each SocketInputStream is distinct and only
> > in use by a single thread. The threads are blocked on I/O, right?
> > So they aren't waiting on a monitor. The best you could do would be
> > to find the native file descriptor for each socket and determine
> > that they are different from each other. I would be very surprised
> > if they are the same, used across threads. If you *are* using
> > Connection objects across threads, you should be very careful.
> > Connection objects ought to be threadsafe (I think), but use of
> > Statement and ResultSet objects across threads is a terrible idea.
> >
> >> We have a couple of synchronized methods in two of our services
> >> which hold locks in order to update a centralized record. We
> >> realize this is a bad design and are already working on
> >> refactoring this code to remove this need.
>
> Yes: this won't work if you have more than one instance of the
> application running.
>
> >> We've had a few instances where the 15-minute wait has happened
> >> inside the synchronized block of code, meaning that several other
> >> threads are also having to wait for the 15 minutes to end before
> >> they get their turn inside that block of code. Our profiler
> >> (YourKit) detects these blocked threads as deadlocks and colors
> >> them red in the thread graph.
>
> In those cases, the sync block would not have occurred in the
> SocketInputStream, though... it would have occurred in your code
> somewhere. YourKit displays threads waiting on a monitor in red, but
> waiting on a monitor does not itself indicate deadlock.
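If it helps to separate the two cases, the JVM itself can report true deadlocks. A small sketch using the standard java.lang.management API (the output handling is only illustrative):

    // Sketch: ask the JVM whether any threads are actually deadlocked.
    // Threads blocked in SocketInputStream.read are RUNNABLE (native I/O)
    // and will never show up here; threads merely waiting on a monitor
    // only show up if they form a cycle.
    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class DeadlockCheck {
        public static void main(String[] args) {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            long[] ids = mx.findDeadlockedThreads();   // null if no deadlock
            if (ids == null) {
                System.out.println("No deadlocked threads.");
                return;
            }
            for (ThreadInfo info : mx.getThreadInfo(ids, true, true)) {
                System.out.println(info);
            }
        }
    }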
> Deadlock occurs when two (or more) threads are holding locks against
> each other such that none of the threads can ever proceed. If you use
> "synchronized" then this is easy to do, because "synchronized" doesn't
> allow your code to attempt to acquire a lock and give up after a
> certain period of time. If you want that kind of behavior, you need to
> look at using a different kind of lock (like java.util.concurrent.Lock).
>
> >>>> It seems that sometime around the hour mark, connections get
> >>>> dropped, so we're thinking that either adding idle checking or
> >>>> dropping old connections may help us avoid this, although we are
> >>>> a little concerned by the various alleged Connector/J socket-read
> >>>> issues as a possible problem.
> >
> > I don't think you should blame Connector/J at this point. They may
> > have ClassLoader pinning issues (don't get me started), but the
> > driver is fairly robust and mature.
> >
> >>>> We're running an older 5.1.18 version of the Connector/J driver,
> >>>> but aren't sure if moving to the latest .34 release would change
> >>>> anything.
> >
> > We are also still using 5.1.18 and have never had any of these
> > kinds of issues. I would highly suspect the network environment.
> > See what you can find out by tinkering with the firewall and db
> > idle policies. You may find that the pipe across the network gets
> > into a state where the client is sure the connection is still
> > valid, but it's simply never going to return any data. In that
> > case, you'll need to figure out how to have that connection fail
> > faster.
> >
> > Do you have a read-timeout set on your driver?
> >
> >> We have re-configured the driver by setting maxAge to 360000,
> >> testWhileIdle to true, validationQueryTimeout to 5, and
> >> logValidationErrors to true. We restarted the Tomcat service 5 1/2
> >> hours ago, and so far haven't seen this error return. We are also
> >> hopeful that even if it did return, the validationQueryTimeout
> >> change would prevent it from holding things up for 15 minutes.
> >
> >> We really appreciate all of the feedback from you and others
> >> today, since we've focused most of our dev team on trying to
> >> understand and troubleshoot this issue for the past few days.
>
> Good luck. This kind of thing can drive you insane.
>
> -chris
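To illustrate the point above about locks that can give up: a minimal sketch of the java.util.concurrent alternative to synchronized, using a ReentrantLock with a timed tryLock. The 30-second timeout and the updateCentralRecord() method are placeholders, not anything from the actual application:

    // Sketch: unlike "synchronized", a ReentrantLock lets a thread give up
    // after a bounded wait instead of queuing behind a 15-minute socket hang.
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.locks.ReentrantLock;

    public class CentralRecordUpdater {
        private final ReentrantLock lock = new ReentrantLock();

        public boolean tryUpdate() throws InterruptedException {
            if (!lock.tryLock(30, TimeUnit.SECONDS)) {
                return false;            // couldn't get the lock; fail fast instead of piling up
            }
            try {
                updateCentralRecord();   // the work the synchronized block used to guard
                return true;
            } finally {
                lock.unlock();
            }
        }

        private void updateCentralRecord() {
            // placeholder for the real update
        }
    }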
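And for the pool settings mentioned above (maxAge, testWhileIdle, validationQueryTimeout, logValidationErrors), a rough sketch of setting them programmatically on the Tomcat JDBC pool. This assumes the tomcat-jdbc pool is in use and that these attribute names exist in your version; the values and JDBC URL are placeholders, so treat it as a starting point rather than a drop-in configuration:

    // Rough sketch: Tomcat JDBC pool configured programmatically with the
    // settings discussed in the thread. The same attributes can also be set
    // on a <Resource> element in context.xml.
    import java.sql.Connection;
    import org.apache.tomcat.jdbc.pool.DataSource;
    import org.apache.tomcat.jdbc.pool.PoolProperties;

    public class PoolSetup {
        public static DataSource createPool() {
            PoolProperties p = new PoolProperties();
            p.setUrl("jdbc:mysql://db-host:3306/mydb?socketTimeout=30000");
            p.setDriverClassName("com.mysql.jdbc.Driver");
            p.setUsername("user");
            p.setPassword("pass");
            p.setValidationQuery("SELECT 1");
            p.setValidationQueryTimeout(5);        // seconds; don't wait 15 minutes on a dead link
            p.setTestOnBorrow(true);               // validate before handing a connection out
            p.setTestWhileIdle(true);              // validate idle connections in the background
            p.setTimeBetweenEvictionRunsMillis(30000);
            p.setMaxAge(3600000);                  // ms; retire connections before the firewall does
            p.setLogValidationErrors(true);
            p.setMinIdle(10);
            DataSource ds = new DataSource();
            ds.setPoolProperties(p);
            return ds;
        }

        public static void main(String[] args) throws Exception {
            try (Connection c = createPool().getConnection()) {
                System.out.println("Got a validated connection: " + c);
            }
        }
    }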