On 05/02/2016 09:00, Mark Thomas wrote:
> On 05/02/2016 00:30, Jeroen van Ooststroom wrote:
>> Hello,
>>
>> Using Tomcat 8.0.23 and Tomcat 8.0.30 with Java 1.7.0_25 on CentOS 5.11
>> we are getting a stuck thread issue when our server is on high(er) load.
>> It seems to happen when one of our non-Container Threads invokes the
>> AsyncContext.complete() method while the AsyncStateMachine class’ state
>> class member is set to STARTING instead of STARTED, resulting in the
>> invocation of the Object’s wait() method in the
>> pauseNonContainerThread() method. It never seems to recover from this.
>>
>> I tried a couple of things within our source code trying to get around
>> this:
>>
>> 1. If HttpServletRequest.isAsyncStarted() returns true invoke
>>    AsyncContext.complete(), but if it returns false
> 
> That should not be possible. And I think I have figured out what the
> problem is. Let me explain.
> 
> The expected sequence of processing is:
> 
> 0) Servlet.service() -> entry
> 1) Servlet.service() -> call startAsync()
> 2) Servlet.service() -> start a non-container thread to do some
>    processing
> 3) Servlet.service() -> exits
> 4) non-container thread calls complete()
> 
> Each connection has a dedicated async state machine that tracks the
> current state of the connection.
> 
> At point 0 the state machine is in the state DISPATCHED (a.k.a. not async)
> At point 1 the state machine transitions to STARTING.
> At point 3 the state machine transitions to STARTED.
> At point 4 the state machine transitions to COMPLETING, a container
> thread is assigned which fires the onComplete() event for any listeners
> and then changes state to DISPATCHED.
> 
> The point of the waits is to ensure that step 4) always occurs after
> step 3). If a non-container thread is paused, the async post-processing
> code for container threads will unpause the thread once it completes.
> 
> If a non-container thread calls complete() at an inappropriate time then
> an IllegalStateException should be thrown.
> 
> The problem is that the non-container thread is paused before the check
> for the illegal state is made. Therefore, in the case where the
> non-container thread is making a call at an inappropriate time, it is
> paused and will never be unpaused. We need to check the state is valid
> before pausing the thread.
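For reference, the sequence and the pause described above can be modelled roughly as follows. This is a minimal sketch under stated assumptions, not Tomcat's AsyncStateMachine. The class SketchAsyncStateMachine and its methods asyncStart(), asyncPostProcess(), asyncComplete() and completed() are hypothetical. Only the four states named above are modelled, and pausing uses plain Object.wait()/notifyAll(). The sketch validates the state before pausing. A complete() call made in DISPATCHED therefore fails immediately, while a call made in STARTING waits until the container thread's post-processing moves the state to STARTED.

```java
// Hypothetical, heavily simplified model of the per-connection async state
// machine described above. It is NOT Tomcat's AsyncStateMachine.
class SketchAsyncStateMachine {
    enum State { DISPATCHED, STARTING, STARTED, COMPLETING }

    private State state = State.DISPATCHED;

    synchronized State getState() {
        return state;
    }

    // Point 1: Servlet.service() calls startAsync() on a container thread.
    synchronized void asyncStart() {
        if (state != State.DISPATCHED) {
            throw new IllegalStateException("startAsync() not valid in state " + state);
        }
        state = State.STARTING;
    }

    // Point 3: the container thread has exited Servlet.service(). Move to
    // STARTED and wake any non-container thread paused in asyncComplete().
    synchronized void asyncPostProcess() {
        if (state == State.STARTING) {
            state = State.STARTED;
        }
        notifyAll();
    }

    // Point 4: a non-container thread calls complete().
    synchronized void asyncComplete() throws InterruptedException {
        // Validate BEFORE pausing, so a call at an inappropriate time fails
        // fast instead of pausing a thread that nothing will ever wake.
        if (state != State.STARTING && state != State.STARTED) {
            throw new IllegalStateException("complete() not valid in state " + state);
        }
        // Pause until the container thread's post-processing has run.
        while (state == State.STARTING) {
            wait();
        }
        // Re-validate after waking, in case the state moved elsewhere.
        if (state != State.STARTED) {
            throw new IllegalStateException("complete() not valid in state " + state);
        }
        state = State.COMPLETING;
    }

    // Called on a container thread after onComplete() has been fired for
    // any listeners (not modelled here); returns the connection to DISPATCHED.
    synchronized void completed() {
        if (state != State.COMPLETING) {
            throw new IllegalStateException("onComplete not valid in state " + state);
        }
        state = State.DISPATCHED;
    }
}
```

In this model, asyncComplete() returns only after asyncPostProcess() has run. If the container thread never reaches that point, for example because it never leaves Servlet.service(), the caller waits forever. That matches the reported symptom.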

Drat. The state is already checked before the thread is paused; I
missed that when reviewing the code.

That narrows the possible causes down to:
- The container thread never exits the Servlet.service() method. A
  thread dump will confirm or eliminate that possibility.
- The container thread hits an error that prevents it from unpausing
  the non-container thread. I've been over the code several times and,
  short of an OOME, ThreadDeath or similar error, I don't see how this
  could happen.
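Running jstack <pid> against the Tomcat process is usually the quickest way to get a thread dump. If an in-process dump is more convenient, the following is a minimal sketch using the standard java.lang.management API. The class ThreadDumpHelper and its methods dumpThreads() and threadsInMethod() are hypothetical. With this helper, threadsInMethod("service") would list threads currently inside a service() frame, and threadsInMethod("pauseNonContainerThread") would list the non-container threads that are paused.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for taking an in-process thread dump with the standard
// java.lang.management API.
class ThreadDumpHelper {

    // Dump of every live thread: name, state and full stack.
    // ThreadInfo.toString() truncates long stacks, so frames are printed here.
    static String dumpThreads() {
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(true, true)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    // Names of threads whose current stack contains a frame for the given
    // method name, e.g. "service" or "pauseNonContainerThread".
    static List<String> threadsInMethod(String methodName) {
        List<String> names = new ArrayList<String>();
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            for (StackTraceElement frame : info.getStackTrace()) {
                if (frame.getMethodName().equals(methodName)) {
                    names.add(info.getThreadName());
                    break;
                }
            }
        }
        return names;
    }
}
```

Matching on the method name alone is deliberately loose. Checking frame.getClassName() as well would avoid false positives from unrelated methods that share the same name.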

It would be worth checking the logs for any errors reported before the
point where this problem occurs.

I'm going to review the code again to see if I can see any other
potential problems.

One option is to refactor the pause method so that it periodically
checks whether the thread should still be paused. It might also be worth
adding some debug logging to get a better idea of what is going on when
this issue occurs.
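Below is a minimal sketch of what a periodically re-checking pause might look like. It is written to stay Java 7 compatible, and the names PeriodicPause, pause() and PauseCondition are hypothetical, not Tomcat API. Logging is assumed to use java.util.logging at FINE level. Instead of a single unbounded wait(), the thread waits in bounded slices and re-evaluates the condition after each one, for example "is the async state still STARTING?". It then logs how long it has been paused. A thread whose unpause notification is lost still leaves once the condition becomes false. The log lines also show how long threads stay paused when the issue occurs.

```java
import java.util.concurrent.TimeUnit;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical condition the paused thread re-evaluates on every wake-up.
interface PauseCondition {
    boolean stillPaused();
}

class PeriodicPause {
    private static final Logger LOG = Logger.getLogger(PeriodicPause.class.getName());

    // Waits on lock until condition.stillPaused() is false, waking at least
    // every checkIntervalMillis to re-check it even if no notify() arrives.
    // Returns how many times the thread woke up and was still paused.
    static int pause(Object lock, PauseCondition condition, long checkIntervalMillis)
            throws InterruptedException {
        long start = System.nanoTime();
        int stillPausedWakeups = 0;
        synchronized (lock) {
            while (condition.stillPaused()) {
                lock.wait(checkIntervalMillis);
                if (condition.stillPaused()) {
                    stillPausedWakeups++;
                    if (LOG.isLoggable(Level.FINE)) {
                        long waitedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
                        LOG.fine(Thread.currentThread().getName()
                                + " still paused after " + waitedMs + " ms");
                    }
                }
            }
        }
        return stillPausedWakeups;
    }
}
```

The check interval is a trade-off. A shorter interval recovers from a missed notification sooner, but it wakes paused threads more often.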

Mark


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
