"Our sites still functions normally with no cpu spikes during this build up until around 60,000 connections, but then the server refuses further connections and a manual Tomcat restart is required."
Yes - the connection limit is a 16-bit count minus some reserved ports, so at around that point you have run out of port numbers (the port field in a TCP connection is a 16-bit value) and the system becomes unresponsive. Running netstat -na when this happens will show you the state of every connection, and that is helpful debugging information.

Filip

On Thu, Jun 19, 2014 at 2:44 PM, André Warnier <a...@ice-sa.com> wrote:
> Konstantin Kolinko wrote:
>> 2014-06-19 17:10 GMT+04:00 Lars Engholm Johansen <lar...@gmail.com>:
>>> I will try to force a GC next time I am at the console, about to restart a
>>> Tomcat where one of the http-nio-80-ClientPoller-x threads has died and
>>> the connection count is exploding.
>>>
>>> But I do not see this as a solution - can you somehow deduce why this
>>> thread died from the outcome of a GC?
>>
>> Nobody said that the thread died because of GC.
>>
>> The GC that Andre suggested was to get rid of some of the CLOSE_WAIT
>> connections in the netstat output, in case those are owned by abandoned
>> and not properly closed I/O objects that are still present in JVM memory.
>
> Exactly, thanks Konstantin for clarifying.
>
> I was going by the following in the original post:
>
> "Our sites still function normally with no CPU spikes during this build-up
> until around 60,000 connections, but then the server refuses further
> connections and a manual Tomcat restart is required."
>
> CLOSE_WAIT is a normal state for a TCP connection, but it should not
> normally last long. It basically indicates that the other side has closed
> the connection, and that this side should now do the same. As long as it
> doesn't, the connection remains in the CLOSE_WAIT state: it is "half-closed",
> but not entirely, and until it is fully closed the OS cannot get rid of it.
> For a more precise explanation, Google for "TCP CLOSE_WAIT state".
>
> I have noticed in the past, with some Linux versions, that when the number
> of such CLOSE_WAIT connections goes above a certain level (several hundred),
> the TCP/IP stack can become totally unresponsive and not accept any new
> connections at all, on any port.
> In my case, this was due to the following kind of scenario:
> some class Xconnection instantiates an object, and upon creation this
> object opens a TCP connection to something. The object is then used as an
> "alias" for that connection. Time passes, and finally the object goes out
> of scope (e.g. the reference to it is set to null), and one may believe
> that the underlying connection gets closed as a side effect. But it
> doesn't - not as long as the object has not actually been garbage-collected,
> which is what triggers the object's destruction and the closing of the
> underlying connection.
> Forcing a GC is one way to provoke this (restarting Tomcat is another, but
> more drastic).
>
> If a forced GC gets rid of your many CLOSE_WAIT connections and makes your
> Tomcat operative again, that would be a sign that something similar to the
> above is occurring, and then you would need to look in your application for
> the oversight (e.g. the class should have a "close" method that closes the
> underlying connection, and it should be invoked before letting the object
> go out of scope).
>
> The insidious part is that everything may look fine for a long time (apart
> from an occasional long list of CLOSE_WAIT connections). A GC will happen
> from time to time (*), which will get rid of these connections, and since
> CLOSE_WAIT connections do not consume many resources, you'll never notice.
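To make the scenario described above concrete, here is a minimal Java sketch of the leak and of the usual fix; the class and method names are hypothetical and not taken from the application under discussion:

    import java.io.IOException;
    import java.net.Socket;

    // Leak pattern: the wrapper opens a socket but offers no way to close it.
    // Dropping the reference does NOT close the connection; the socket stays
    // open (and, once the peer closes its end, sits in CLOSE_WAIT) until the
    // JVM happens to garbage-collect the wrapper and the socket's own cleanup
    // releases the file descriptor.
    class LeakyConnection {
        private final Socket socket;

        LeakyConnection(String host, int port) throws IOException {
            this.socket = new Socket(host, port);
        }

        void send(byte[] data) throws IOException {
            socket.getOutputStream().write(data);
        }
    }

    // Fix: expose an explicit close() (here via AutoCloseable) and call it
    // deterministically, e.g. with try-with-resources, so the connection is
    // fully closed as soon as the code is done with it.
    class ManagedConnection implements AutoCloseable {
        private final Socket socket;

        ManagedConnection(String host, int port) throws IOException {
            this.socket = new Socket(host, port);
        }

        void send(byte[] data) throws IOException {
            socket.getOutputStream().write(data);
        }

        @Override
        public void close() throws IOException {
            socket.close();
        }
    }

    class ConnectionDemo {
        public static void main(String[] args) throws IOException {
            try (ManagedConnection c = new ManagedConnection("example.org", 80)) {
                c.send("HEAD / HTTP/1.0\r\n\r\n".getBytes());
            } // closed here, nothing left behind in CLOSE_WAIT
        }
    }

A forced GC "fixing" the symptom is exactly what distinguishes the first variant from the second: collecting the unreachable LeakyConnection instances is the only thing that ever closes their sockets.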
> Until at some point the number of these CLOSE_WAIT connections gets to the
> point where the OS can't swallow any more of them, and then you have a big
> problem.
>
> That sounds a bit like your case, doesn't it?
>
> (*) And this is the "insidious squared" part: the smaller the heap, the
> more often a GC will happen, so the sooner these CLOSE_WAIT connections
> will disappear. Conversely, by increasing the heap size you leave more
> time between GCs, and make the problem more likely to happen.
>
> I believe that the rest below may be either a consequence or a red herring,
> and I would first eliminate the above as a cause.
>
>>> And could an Exception/Error in Tomcat thread http-nio-80-ClientPoller-0
>>> or http-nio-80-ClientPoller-1 make the thread die with no stack trace in
>>> the Tomcat logs?
>>
>> A critical error (java.lang.ThreadDeath, java.lang.VirtualMachineError)
>> will cause the death of a thread.
>>
>> A subtype of the latter is java.lang.OutOfMemoryError.
>>
>> As of now, such errors are passed through and are not logged by Tomcat,
>> but are logged by java.lang.ThreadGroup.uncaughtException(), which prints
>> them to System.err (catalina.out).
>>
>> Best regards,
>> Konstantin Kolinko
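Konstantin's last point is easy to reproduce outside of Tomcat. The short, self-contained sketch below (not Tomcat code; the thread name is made up) shows that an Error thrown out of a thread's run() method kills only that thread, and that unless an uncaught-exception handler is installed, the only trace of the death is whatever the default handler writes to System.err - which for Tomcat ends up in catalina.out:

    public class UncaughtErrorDemo {
        public static void main(String[] args) throws InterruptedException {
            // Route uncaught throwables to your own logging instead of relying
            // on the default handler's System.err output.
            Thread.setDefaultUncaughtExceptionHandler((t, e) ->
                    System.out.println("thread " + t.getName() + " died with: " + e));

            Thread worker = new Thread(() -> {
                // Simulate the critical error that silently kills a poller thread.
                throw new OutOfMemoryError("simulated");
            }, "demo-poller");

            worker.start();
            worker.join();

            // The JVM and the other threads keep running; only "demo-poller" is gone.
            System.out.println("main thread is still alive");
        }
    }

If the ClientPoller threads really are dying, searching catalina.out for the output of ThreadGroup.uncaughtException() is the quickest way to find out what killed them.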