The timeout happens in your SocketRead. This is configurable via the Connector/J *socketTimeout* property (the default is forever): http://dev.mysql.com/doc/connector-j/en/connector-j-reference-configuration-properties.html
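For example, a minimal sketch of setting it on the JDBC URL; the host, schema, credentials, and the 30-second value are only placeholders:

    // Sketch: socketTimeout (milliseconds) caps how long a single socket
    // read may block; 0 (the Connector/J default) means wait forever.
    // Host, database, credentials and the 30s value are placeholders.
    import java.sql.Connection;
    import java.sql.DriverManager;

    public class SocketTimeoutExample {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:mysql://db-host:3306/mydb"
                       + "?connectTimeout=10000"   // fail fast on connect
                       + "&socketTimeout=30000";   // fail fast on a dead read
            try (Connection conn = DriverManager.getConnection(url, "user", "pass")) {
                // A statement that blocks longer than 30s in a socket read now
                // throws a SQLException instead of hanging for ~15 minutes.
            }
        }
    }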
What appears to be happening is that somewhere a reset packet is not being sent from the server to the JDBC driver, so setting a timeout may alleviate this. Note that *socketTimeout* is not the same as a query timeout; it is simply a timeout between packets, so if you have long-running queries it may affect them.

On Wed, Jan 14, 2015 at 11:44 AM, Christopher Schultz <
ch...@christopherschultz.net> wrote:

> Darren,
>
> On 1/13/15 11:32 PM, Darren Davis wrote:
> > On Tue, Jan 13, 2015 at 8:39 PM, Christopher Schultz <
> > ch...@christopherschultz.net> wrote:
> >
> > Darren,
> >
> > (Sorry... just had to remove that monstrous stack trace...)
> >
> > On 1/13/15 5:04 PM, Darren Davis wrote:
> >>>> Hi Christopher. Yes, we've tried a show processlist and can
> >>>> find no evidence of the validation query running on MySQL.
> >
> > Strange. Maybe you are waiting for the db server's buffer to flush
> > or something like that.
> >
> >> I think this is because the client thinks it still has an open
> >> connection; the client netstat command shows an open connection
> >> over port 3306, at least for a few minutes after it's killed by
> >> the load balancer. The server loses its connection in netstat
> >> immediately.
> >
> >>>> We also just tried an experiment outside of Tomcat completely,
> >>>> by connecting to a downed web server host and manually opening
> >>>> a mysql client connection to the database server and executing
> >>>> a single command.
> >>>>
> >>>> We left that client window idle for an hour and 5 minutes, then
> >>>> attempted to execute a simple select count(*) against a tiny
> >>>> table. The client attempted to execute the query, and a netstat
> >>>> on that box showed an open connection between the two servers
> >>>> using port 3306. We also checked the process list during this
> >>>> time and could not find any queries at all from the server in
> >>>> question.
> >>>>
> >>>> At about the 15-minute mark, the client finally came back with
> >>>> this message: "ERROR 2013 (HY000): Lost connection to MySQL
> >>>> server during query."
> >
> > Was this with the MySQL command-line client? What query did you
> > issue ("SELECT 1")?
> >
> >> Yes, it was just the command-line client, and we issued a select
> >> count(*) from a table with a couple of rows in it.
> >
> >>>> Attempting to execute the command a second time (using the up
> >>>> arrow) re-established the connection, and it ran perfectly in a
> >>>> few milliseconds.
> >
> > That's interesting. I've never experienced anything like that with
> > MySQL, but we use a VLAN between our application and database
> > servers with no hardware firewall, so we don't have any connection
> > timeout problems. Also, when connections are dropped due to
> > inactivity, they re-connect without any problems.
> >
> >>>> I checked the MySQL configuration and it is set to the default
> >>>> values for keeping connections/interactive connections open
> >>>> (for 8 hours), so it seems that maybe the Cisco firewall
> >>>> between the two servers is terminating connections out from
> >>>> under us, but in a way that the O/S cannot detect.
> >
> > What if you set that idle connection timeout to something like 5
> > minutes? Can you reproduce this issue more quickly? Can you look
> > at the fw configuration to see if you can change the idle timeout
> > /down/ to something more testable?
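As an aside, the idle-connection experiment described above can also be reproduced from Java, which makes it easier to see how socketTimeout changes the failure mode. This is only a rough sketch; the URL, credentials, table name, and the 65-minute idle period are placeholders:

    // Rough sketch: hold a connection idle past the firewall's idle limit,
    // then issue a trivial query. With socketTimeout unset, the read can
    // hang for many minutes; with it set, a stale connection fails quickly.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.concurrent.TimeUnit;

    public class IdleConnectionProbe {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:mysql://db-host:3306/mydb?socketTimeout=30000";
            try (Connection conn = DriverManager.getConnection(url, "user", "pass")) {
                runQuery(conn);               // works while the link is fresh
                TimeUnit.MINUTES.sleep(65);   // idle past a 60-minute firewall cutoff
                runQuery(conn);               // either succeeds, or fails within ~30s
            }
        }

        private static void runQuery(Connection conn) throws Exception {
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM tiny_table")) {
                rs.next();
                System.out.println("count = " + rs.getLong(1));
            }
        }
    }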
> >> As part of our move to the new versions of Tomcat/Java, we are in a
> >> new cloud environment which features a different type of firewall.
> >> The provider confirmed to us late today that it is configured to
> >> kill "idle" TCP connections after an hour, which is something our
> >> old firewall didn't do.
> >
> >> Because we sometimes have low traffic during this time of the year,
> >> especially on the weekends, what we think is happening is that one
> >> or more of the minimum 10 connections is going unused for more than
> >> an hour. Since we didn't have any connection testing while idle
> >> turned on, they were being killed by the firewall out from under
> >> the pool, and depending on how soon they were used after that, we
> >> would run into the 15-minute delay before they were deemed lost and
> >> replaced with a new connection.
>
> This should be entirely possible. That's the point of the
> connection-validation operation (whether done by an actual query or
> not). The question is why the connection is being dropped in a way
> that is thwarting the connection-validation at all. It may come down
> to some kind of OS-level setting, or a slightly different
> configuration on the firewall.
>
> It seems that removing the firewall's idle-connection policy would be
> an easy way to try to get around this issue, at least temporarily.
>
> >>>> I've also fired up the YourKit profiler on this box and am
> >>>> seeing other threads which have had to wait in the same
> >>>> SocketInputStream.read code, all three started a few seconds
> >>>> apart. It just wasn't detected as a deadlock, because it took
> >>>> place outside of any synchronized methods.
> >
> > What makes you think it's deadlock? Deadlock is a very specific
> > thing. Just because many threads are waiting in
> > SocketInputStream.read doesn't mean there are any threading issues
> > at all. I suspect that each SocketInputStream is distinct and only
> > in use by a single thread. The threads are blocked on I/O, right?
> > So they aren't waiting on a monitor. The best you could do would be
> > to find the native file descriptor for each socket and determine
> > that they are different from each other. I would be very surprised
> > if they are the same, used across threads. If you *are* using
> > Connection objects across threads, you should be very careful.
> > Connection objects ought to be threadsafe (I think), but use of
> > Statement and ResultSet objects across threads is a terrible idea.
> >
> >> We have a couple of synchronized methods in two of our services
> >> which hold locks in order to update a centralized record. We
> >> realize this is a bad design and are already working on
> >> refactoring this code to remove this need.
>
> Yes: this won't work if you have more than one instance of the
> application running.
>
> >> We've had a few instances where the 15-minute wait has happened
> >> inside the synchronized block of code, meaning that several other
> >> threads are also having to wait for the 15 minutes to end before
> >> they get their turn inside that block of code. Our profiler
> >> (YourKit) detects these blocked threads as deadlocks and colors
> >> them red in the thread graph.
>
> In those cases, the sync block would not have occurred in the
> SocketInputStream, though... it would have occurred in your code
> somewhere. YourKit displays threads waiting on a monitor in red, but
> waiting on a monitor does not itself indicate deadlock.
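If it helps to separate the two cases, the JVM itself can report true deadlocks. A small sketch using the standard java.lang.management API (the output handling is only illustrative):

    // Sketch: ask the JVM whether any threads are actually deadlocked.
    // Threads blocked in SocketInputStream.read are RUNNABLE (native I/O)
    // and will never show up here; threads merely waiting on a monitor
    // only show up if they form a cycle.
    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class DeadlockCheck {
        public static void main(String[] args) {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            long[] ids = mx.findDeadlockedThreads();   // null if no deadlock
            if (ids == null) {
                System.out.println("No deadlocked threads.");
                return;
            }
            for (ThreadInfo info : mx.getThreadInfo(ids, true, true)) {
                System.out.println(info);
            }
        }
    }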
> Deadlock occurs when two (or more) threads are holding locks against
> each other such that none of the threads can ever proceed. If you use
> "synchronized" then this is easy to do, because "synchronized" doesn't
> allow your code to attempt to acquire a lock and give up after a
> certain period of time. If you want that kind of behavior, you need to
> look at using a different kind of lock (like java.util.concurrent.Lock).
>
> >>>> It seems that sometime around the hour mark, connections get
> >>>> dropped, so we're thinking that either adding idle checking or
> >>>> dropping old connections may help us avoid this, although we are
> >>>> a little concerned by the various alleged Connector/J socket-read
> >>>> issues as a possible problem.
> >
> > I don't think you should blame Connector/J at this point. They may
> > have ClassLoader pinning issues (don't get me started), but the
> > driver is fairly robust and mature.
> >
> >>>> We're running an older 5.1.18 version of the Connector/J driver,
> >>>> but aren't sure if moving to the latest .34 release would change
> >>>> anything.
> >
> > We are also still using 5.1.18 and have never had any of these
> > kinds of issues. I would highly suspect the network environment.
> > See what you can find out by tinkering with the firewall and db
> > idle policies. You may find that the pipe across the network gets
> > into a state where the client is sure the connection is still
> > valid, but it's simply never going to return any data. In that
> > case, you'll need to figure out how to have that connection fail
> > faster.
> >
> > Do you have a read-timeout set on your driver?
> >
> >> We have re-configured the driver by setting maxAge to 360000,
> >> testWhileIdle to true, validationQueryTimeout to 5, and
> >> logValidationErrors to true. We restarted the Tomcat service 5 1/2
> >> hours ago, and so far haven't seen this error return. We are also
> >> hopeful that even if it did return, the validationQueryTimeout
> >> change would prevent it from holding things up for 15 minutes.
> >
> >> We really appreciate all of the feedback from you and others
> >> today, since we've focused most of our dev team on trying to
> >> understand and troubleshoot this issue for the past few days.
>
> Good luck. This kind of thing can drive you insane.
>
> -chris
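To illustrate the point above about locks that can give up: a minimal sketch of the java.util.concurrent alternative to synchronized, using a ReentrantLock with a timed tryLock. The 30-second timeout and the updateCentralRecord() method are placeholders, not anything from the actual application:

    // Sketch: unlike "synchronized", a ReentrantLock lets a thread give up
    // after a bounded wait instead of queuing behind a 15-minute socket hang.
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.locks.ReentrantLock;

    public class CentralRecordUpdater {
        private final ReentrantLock lock = new ReentrantLock();

        public boolean tryUpdate() throws InterruptedException {
            if (!lock.tryLock(30, TimeUnit.SECONDS)) {
                return false;            // couldn't get the lock; fail fast instead of piling up
            }
            try {
                updateCentralRecord();   // the work the synchronized block used to guard
                return true;
            } finally {
                lock.unlock();
            }
        }

        private void updateCentralRecord() {
            // placeholder for the real update
        }
    }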
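And for the pool settings mentioned above (maxAge, testWhileIdle, validationQueryTimeout, logValidationErrors), a rough sketch of setting them programmatically on the Tomcat JDBC pool. This assumes the tomcat-jdbc pool is in use and that these attribute names exist in your version; the values and JDBC URL are placeholders, so treat it as a starting point rather than a drop-in configuration:

    // Rough sketch: Tomcat JDBC pool configured programmatically with the
    // settings discussed in the thread. The same attributes can also be set
    // on a <Resource> element in context.xml.
    import java.sql.Connection;
    import org.apache.tomcat.jdbc.pool.DataSource;
    import org.apache.tomcat.jdbc.pool.PoolProperties;

    public class PoolSetup {
        public static DataSource createPool() {
            PoolProperties p = new PoolProperties();
            p.setUrl("jdbc:mysql://db-host:3306/mydb?socketTimeout=30000");
            p.setDriverClassName("com.mysql.jdbc.Driver");
            p.setUsername("user");
            p.setPassword("pass");
            p.setValidationQuery("SELECT 1");
            p.setValidationQueryTimeout(5);        // seconds; don't wait 15 minutes on a dead link
            p.setTestOnBorrow(true);               // validate before handing a connection out
            p.setTestWhileIdle(true);              // validate idle connections in the background
            p.setTimeBetweenEvictionRunsMillis(30000);
            p.setMaxAge(3600000);                  // ms; retire connections before the firewall does
            p.setLogValidationErrors(true);
            p.setMinIdle(10);
            DataSource ds = new DataSource();
            ds.setPoolProperties(p);
            return ds;
        }

        public static void main(String[] args) throws Exception {
            try (Connection c = createPool().getConnection()) {
                System.out.println("Got a validated connection: " + c);
            }
        }
    }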