[ https://issues.jenkins-ci.org/browse/JENKINS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=160958#comment-160958 ]
Uwe Stuehler edited comment on JENKINS-9882 at 3/29/12 3:09 PM:
----------------------------------------------------------------

It seems to me that this bug might not have been solved completely, or it is a different but similar issue we're seeing. After about a day of normal operation, we see connections via HTTP and AJP being accept()ed, and the thread accepting these connections then goes back to poll()... and then _nothing_ happens with the new connection, not a single read() or anything. No new threads are created either. We're at exactly 200 "RequestHandlerThread" threads in state RUNNABLE. Is 200 a fixed limit on the number of request handler threads?

*Edit*: added stack trace

All RequestHandlerThread threads have this exact same backtrace:

{noformat}
"RequestHandlerThread[#871]" daemon prio=10 tid=0x00007fee8527a800 nid=0xf2c runnable [0x00007fee7d493000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.io.DataInputStream.readFully(DataInputStream.java:178)
        at java.io.DataInputStream.readFully(DataInputStream.java:152)
        at winstone.ajp13.Ajp13IncomingPacket.<init>(Ajp13IncomingPacket.java:60)
        at winstone.ajp13.Ajp13Listener.allocateRequestResponse(Ajp13Listener.java:170)
        at winstone.RequestHandlerThread.run(RequestHandlerThread.java:67)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at winstone.BoundedExecutorService$1.run(BoundedExecutorService.java:77)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
{noformat}
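For context: the dump shows every handler parked in DataInputStream.readFully() via SocketInputStream.socketRead0 while constructing an Ajp13IncomingPacket, i.e. blocked waiting for AJP packet bytes that never arrive. A minimal sketch (hypothetical code, not Winstone's actual listener, and the 30 second value is an arbitrary assumption) of how a read timeout on the accepted socket would bound that wait instead of pinning the handler thread forever:

{code:java}
import java.io.DataInputStream;
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical AJP listener loop, not Winstone's code: shows how a socket
// read timeout bounds the blocking readFully() seen in the thread dump.
public class Ajp13ReadTimeoutSketch {
    public static void main(String[] args) throws IOException {
        ServerSocket server = new ServerSocket(8009); // conventional AJP13 port
        while (true) {
            Socket socket = server.accept();
            // Without a timeout, readFully() blocks in socketRead0 until the
            // peer sends data or closes the connection.
            socket.setSoTimeout(30000); // hypothetical 30 second read timeout
            DataInputStream in = new DataInputStream(socket.getInputStream());
            byte[] header = new byte[4]; // AJP13 packets start with a 4-byte header
            try {
                in.readFully(header);
                // ... parse the packet length from the header and read the body
            } catch (SocketTimeoutException e) {
                // A stalled client now releases the handler instead of pinning it.
            } finally {
                socket.close(); // demo only: close either way
            }
        }
    }
}
{code}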
> Jenkins runs out of file descriptors (winstone problem)
> -------------------------------------------------------
>
>                  Key: JENKINS-9882
>                  URL: https://issues.jenkins-ci.org/browse/JENKINS-9882
>              Project: Jenkins
>           Issue Type: Bug
>           Components: core
>     Affects Versions: current
>          Environment: Debian 5, sun-java-jdk (1.6.0_22)
>                       Jenkins version: 1.414-SNAPSHOT
>             Reporter: Santeri Paavolainen
>
> Running Jenkins with the embedded Winstone server for a long time
> under constant load causes file descriptor and thread leakage.
>
> Environment: Debian 5, sun-java-jdk (1.6.0_22)
> Jenkins version: 1.414-SNAPSHOT
>
> What happens:
>
> After running for about a day, the following appears in the Jenkins log file:
>
> [Winstone 2011/05/27 07:35:03] - WARNING: Request handler pool limit exceeded - waiting for retry
>
> and a bit later (this starts repeating):
>
> [Winstone 2011/05/27 07:43:25] - WARNING: Request handler pool limit exceeded - waiting for retry
> [Winstone 2011/05/27 07:43:26] - ERROR: Request ignored because there were no more request handlers available in the pool
> [Winstone 2011/05/27 07:43:36] - WARNING: Request handler pool limit exceeded - waiting for retry
> [Winstone 2011/05/27 07:43:37] - ERROR: Request ignored because there were no more request handlers available in the pool
>
> Jenkins then stops handling requests successfully - intermittently at
> first, but eventually failing almost all requests.
>
> Using VisualVM I can see that there are a thousand RequestHandlerThread
> threads in wait state, and that over 1200 file descriptors are
> currently in use.
>
> I think the requests start failing because Winstone has this limit:
>
>     private int MAX_REQUEST_HANDLERS_IN_POOL = 1000;
>
> as it doesn't seem to be running out of available fds (apparently 8192
> is the maximum in this setup).
>
> When I restart Jenkins I can watch a slow buildup of threads and used
> file descriptors:
>
> * 10 minutes after restart: 136 live threads, 256 fds used
> * 20 minutes: 150 threads, 271 fds
> * 30 minutes: 161 threads, 280 fds
> * 110 minutes: 255 threads, 376 fds
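The report gathered those two numbers with VisualVM. As a side note, the same counters are also available from inside the Jenkins JVM via the standard management beans; a small hypothetical sketch (not part of the report, and it only measures the JVM it runs in, so it would have to run inside the Jenkins process - the UnixOperatingSystemMXBean cast works on Unix-like JVMs such as the Debian setup described):

{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.ThreadMXBean;

import com.sun.management.UnixOperatingSystemMXBean;

// Hypothetical in-process monitor: logs live thread count and open file
// descriptor count, the two values tracked in the report above.
public class LeakWatcher {
    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        while (true) {
            long fds = (os instanceof UnixOperatingSystemMXBean)
                    ? ((UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount()
                    : -1; // open fd count not exposed on this platform
            System.out.println("live threads=" + threads.getThreadCount()
                    + ", open fds=" + fds);
            Thread.sleep(10 * 60 * 1000L); // sample every 10 minutes, as in the report
        }
    }
}
{code}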
> I've looked at the repository version of Winstone, and there appears to be
> a race condition in the handling of the request handler pool.
>
> When a request is received by ObjectPool.handleRequest, it looks for
> an available request handler in unusedRequestHandlerThreads and
> calls commenceRequestHandling on the available thread.
> commenceRequestHandling in turn does this.notifyAll() to wake up the
> thread. So far so good. However, when the thread has finished
> processing a request, it calls
> this.objectPool.releaseRequestHandler(this) and *then* waits. I think
> there's a race condition here, since the pool caller (CALL) and the
> request handler thread (RH) can interleave like this:
>
> # RH (in RequestHandler.run): this.objectPool.releaseRequestHandler(this)
> # RH (in ObjectPool.releaseRequestHandler): this.unusedRequestHandlerThreads.add(rh)
> # CALL (in ObjectPool.handleRequest): take RH from unusedRequestHandlerThreads
> # CALL (in ObjectPool.handleRequest): rh.commenceRequestHandling(socket, listener);
> # CALL (in RequestHandler.commenceRequestHandling): this.notifyAll()
> # RH (in RequestHandler.run): this.wait()
>
> Since the notify is lost (there are no waiters yet), this.wait() in the last
> step hangs forever. This leaks a file descriptor, because the socket handed
> over for processing is never reclaimed, and threads are effectively lost
> as Winstone then creates more RequestHandlers.
>
> Now, this is of course a Winstone problem, but its development seems
> to be d-e-a-d, at least judging by its bug tracker. As long as this
> problem affects Jenkins, I'd still classify it as a Jenkins problem too.
> I've filed it in the Winstone tracker as well:
> https://sourceforge.net/tracker/?func=detail&aid=3308285&group_id=98922&atid=622497
>
> Workaround: use Tomcat, not the embedded Winstone (that's what I'm doing now).
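The interleaving described in the report is the classic lost-wakeup problem: notifyAll() fires while nobody is waiting yet, and the later wait() sleeps forever. A minimal self-contained sketch of the pattern and the usual fix (hypothetical class and method names, not Winstone's actual ObjectPool/RequestHandler code): guard the wait with shared state checked in a loop, so an "early" notification is never lost.

{code:java}
// Stripped-down illustration of the race described above; the names
// (HandlerSketch, pendingSocket, awaitWork, ...) are hypothetical.
class HandlerSketch {
    private Object pendingSocket; // null means nothing to do yet

    // Broken shape (roughly steps 1-6 above): if the pool thread calls
    // commenceRequestHandling() before the handler reaches wait(), the
    // notification is lost and wait() never returns.
    synchronized void brokenAwaitWork() throws InterruptedException {
        wait();                      // can sleep forever: notify already happened
        process(pendingSocket);
    }

    // Fixed shape: wait() is guarded by the shared state and re-checked in a
    // loop, so a notification that arrives early is not lost.
    synchronized void awaitWork() throws InterruptedException {
        while (pendingSocket == null) {
            wait();
        }
        Object socket = pendingSocket;
        pendingSocket = null;
        process(socket);
    }

    // Called by the pool thread to hand over an accepted connection.
    synchronized void commenceRequestHandling(Object socket) {
        pendingSocket = socket;
        notifyAll();                 // safe: the waiter re-checks pendingSocket
    }

    private void process(Object socket) {
        // ... serve the request, then return this handler to the pool
    }
}
{code}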