A brief look through "svn log http://svn.apache.org/repos/asf/tomcat/trunk/java/org/apache/catalina/ha/session/DeltaRequest.java" turns up this:

------------------------------------------------------------------------
r618823 | fhanik | 2008-02-06 07:29:56 +0800 (Wed, 06 Feb 2008) | 3 lines
Remove synchronization on the DeltaRequest object, and let the object that
manages the delta request (session/manager) handle the locking properly,
using the session lock. There is a case with a non-sticky load balancer
where using synchronized and a lock (essentially two locks) can end up in
a deadlock.
------------------------------------------------------------------------

This is the only commit whose comments seem to indicate anything related to
my issue. Given that 6.0.14 was released on 14 Aug 2007
(http://www.mail-archive.com/annou...@apache.org/msg00386.html), it may be
applicable. I'd just like to know your opinion: is it likely that this is
the issue I'm facing?

Thanks!
Wong

On Wed, Aug 26, 2009 at 8:48 AM, CS Wong <lilw...@gmail.com> wrote:
> Thanks, Filip.
> I'm running 6.0.14 right now. Would you have any idea whether any changes
> in the code since then would have fixed something like this? I can try to
> push for an upgrade to 6.0.20, but the app owners would probably want to
> know whether it would be fixed for sure, since they have to go through a
> rather troublesome round of testing which takes up quite a bit of time.
> It helps if they know that the problem won't reoccur once this has been
> done.
>
> Thanks,
> Wong
>
> On Tue, Aug 25, 2009 at 11:35 PM, Filip Hanik - Dev Lists <devli...@hanik.com> wrote:
>> I've taken a look at the code.
>> The fix for this is easy, but it doesn't explain why it happens. This is
>> a concurrency issue, but if you're not running the latest Tomcat
>> version, then it could already have been fixed.
>>
>> best
>> Filip
>>
>> On 08/25/2009 01:55 AM, CS Wong wrote:
>>> Hi Michael,
>>> The logs are the bit that went haywire. The applications at this point
>>> still work, but often there's not enough time to troubleshoot much else.
>>> The logs can increase by 5-6GB in a matter of an hour or so, and hence
>>> we often just kill the service in panic (a normal shutdown.sh doesn't
>>> respond any more at this point; we have to kill -9 it) and delete the
>>> logs before the entire server goes kaboom. This time, I managed to tail
>>> out some of the logs, from which I pasted an extract (the same
>>> repeating pattern of errors):
>>>
>>> Aug 25, 2009 11:44:02 AM org.apache.catalina.ha.session.DeltaRequest reset
>>> SEVERE: Unable to remove element
>>> java.util.NoSuchElementException
>>>     at java.util.LinkedList.remove(LinkedList.java:788)
>>>     at java.util.LinkedList.removeFirst(LinkedList.java:134)
>>>     at org.apache.catalina.ha.session.DeltaRequest.reset(DeltaRequest.java:201)
>>>     at org.apache.catalina.ha.session.DeltaRequest.execute(DeltaRequest.java:195)
>>>     at org.apache.catalina.ha.session.DeltaManager.handleSESSION_DELTA(DeltaManager.java:1364)
>>>     at org.apache.catalina.ha.session.DeltaManager.messageReceived(DeltaManager.java:1320)
>>>     at org.apache.catalina.ha.session.DeltaManager.messageDataReceived(DeltaManager.java:1083)
>>>     at org.apache.catalina.ha.session.ClusterSessionListener.messageReceived(ClusterSessionListener.java:87)
>>>     at org.apache.catalina.ha.tcp.SimpleTcpCluster.messageReceived(SimpleTcpCluster.java:916)
>>>     at org.apache.catalina.ha.tcp.SimpleTcpCluster.messageReceived(SimpleTcpCluster.java:897)
>>>     at org.apache.catalina.tribes.group.GroupChannel.messageReceived(GroupChannel.java:264)
>>>     at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:79)
>>>     at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.messageReceived(TcpFailureDetector.java:110)
>>>     at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:79)
>>>     at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:79)
>>>     at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:79)
>>>     at org.apache.catalina.tribes.group.ChannelCoordinator.messageReceived(ChannelCoordinator.java:241)
>>>     at org.apache.catalina.tribes.transport.ReceiverBase.messageDataReceived(ReceiverBase.java:225)
>>>     at org.apache.catalina.tribes.transport.nio.NioReplicationTask.drainChannel(NioReplicationTask.java:188)
>>>     at org.apache.catalina.tribes.transport.nio.NioReplicationTask.run(NioReplicationTask.java:91)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>>>     at java.lang.Thread.run(Thread.java:619)
>>>
>>> Wong
>>>
>>> On Tue, Aug 25, 2009 at 3:36 PM, Michael Ludwig <m...@as-guides.com> wrote:
>>>> CS Wong schrieb:
>>>>> Periodically, I'm getting problems with my Tomcat 6 cluster (2 nodes).
>>>>> One of the nodes would just go haywire
>>>>
>>>> Could you elaborate on what "going haywire" means?
>>>>
>>>> Below, you write:
>>>>
>>>>> [The NoSuchElementException is] the only thing that it shows. The
>>>>> other node in the cluster is still active at this time. There's
>>>>> nothing to do but to restart. The large amount of logs has caused
>>>>> disk space issues more than a couple of times too.
>>>>
>>>> So is that server not active any more? Unresponsive? Hyperactive
>>>> writing to the log file? Looping?
>>>>> and generate a ton of logs repeating the following:
>>>>>
>>>>> Aug 25, 2009 11:44:10 AM org.apache.catalina.ha.session.DeltaRequest reset
>>>>> SEVERE: Unable to remove element
>>>>> java.util.NoSuchElementException
>>>>>     at java.util.LinkedList.remove(LinkedList.java:788)
>>>>>     at java.util.LinkedList.removeFirst(LinkedList.java:134)
>>>>>     at org.apache.catalina.ha.session.DeltaRequest.reset(DeltaRequest.java:201)
>>>>>     at org.apache.catalina.ha.session.DeltaRequest.execute(DeltaRequest.java:195)
>>>>>     at org.apache.catalina.ha.session.DeltaManager.handleSESSION_DELTA(DeltaManager.java:1364)
>>>>>     at org.apache.catalina.ha.session.DeltaManager.messageReceived(DeltaManager.java:1320)
>>>>>     at org.apache.catalina.ha.session.DeltaManager.messageDataReceived(DeltaManager.java:1083)
>>>>>     at org.apache.catalina.ha.session.ClusterSessionListener.messageReceived(ClusterSessionListener.java:87)
>>>>
>>>> I only found this, which seems to have led you here:
>>>>
>>>> http://stackoverflow.com/questions/1326336/
>>>>
>>>> Maybe it is helpful to others who know about Tomcat internals.
>>>>
>>>> --
>>>> Michael Ludwig
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
>>>> For additional commands, e-mail: users-h...@tomcat.apache.org
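For what it's worth, the failure mode in the stack trace can be re-enacted in isolation. This is not Tomcat's actual code — it is a deterministic, single-threaded sketch of the interleaving the commit message implies; the class name and the "setAttribute" action string are made up for illustration. The trace shows DeltaRequest.reset() draining a LinkedList of pending actions via removeFirst(). With the object-level synchronization removed, a second replication thread could empty the list between another thread's emptiness check and its removeFirst() call:

```java
import java.util.LinkedList;
import java.util.NoSuchElementException;

// Hypothetical re-enactment of the suspected race in DeltaRequest.reset(),
// which drains its action list with a size-check/removeFirst loop.
public class DeltaRequestRaceSketch {
    public static void main(String[] args) {
        LinkedList<String> actions = new LinkedList<>();
        actions.add("setAttribute"); // one pending delta action (made-up name)

        // Thread A: passes the emptiness check and is about to remove.
        boolean threadASeesWork = actions.size() > 0; // true

        // Thread B: runs its entire drain loop in the meantime.
        while (actions.size() > 0) {
            actions.removeFirst(); // the list is now empty
        }

        // Thread A: resumes its loop body on the now-empty list.
        if (threadASeesWork) {
            try {
                actions.removeFirst();
            } catch (NoSuchElementException e) {
                // Same exception the cluster logs report from reset().
                System.out.println("SEVERE: Unable to remove element -> " + e);
            }
        }
    }
}
// prints: SEVERE: Unable to remove element -> java.util.NoSuchElementException
```

If this reading of r618823 is right, the remedy is to keep the emptiness check and the removeFirst() under one lock held by the caller (the session lock), so no other thread can drain the list between the two steps — and using a single lock instead of two also avoids the deadlock the commit message mentions.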