OK, I know this seems crazy, but I've looked long and hard and cannot explain this as other than a Tomcat bug. I'm more than willing to dig up extra information where I can, but this is a very rare problem seen in production and not reproducible at will. Also, because this is in production, there is very little I can change, and even if I did change it, it would take a month or more to verify that it is really a fix. So what I'm really looking for is a detailed understanding of the problem so we can detect it when it happens and clean up the after-effects.
What we are seeing is one POST request being handled twice by our web app. It looks as though somehow the request is duplicated by Tomcat. Here's our setup: Apache 2.2.14 using APR 1.3.8 and mod_jk 1.2.28 running on Ubuntu 10.04.3 as front end, load balancing. Tomcat 6.0.29 using built-in AJP/1.3 connector running on Ubuntu 9.04 with 64-bit Sun Java 1.5.0_19. Tomcat is not using APR. Tomcat mod_jk connector has keepAliveTimeout=5000 and connectionTimeout=60000 (60 seconds). Here is a simplified timeline of what we see in various logs. I'm going to leave some fields out and change others to protect confidentiality and make it easier to follow, but I'll try to keep from changing anything important. Timestamps are just minutes and seconds. 51:48 POST request received by Apache over SSL. We see the request logged in Apache's access.log with this timestamp, but the log entry is actually written at 52:08 (or seems to be, since that is the time stamp of the next entry in the access.log, though the entries immediately before this one have timestamps of 52:02). 51:48 log4j application logs show processing of the request by the application in thread TP-Processor18. Processing continues with updates to database and external systems until: 51:55 last application log entry from TP-Processor18 shows normal completion of processing, which should then return from handleRequest(). The database and three external systems the app communicates with all show normal processing. 51:55 log4j application log entries from TP-Processor6 show it beginning to handle the request. Processing continues normally, just as it did in TP-Processor18 until: 52:08 last application log entry from TP-Processor6. Again external systems show normal processing. 52:08 Tomcat access log shows entry for post request stating it was handled by TP-Processor6 in 7283ms 52:08 mod_jk.log: loadbalancer www.site.com 19.400742 53:06 Tomcat access log shows entry for post request stating it was handled by TP-Processor18 in 77440ms How is it possible that Tomcat has 2 threads handling the same request? How can we detect this as opposed to the user legitimately sending the same request twice? Thanks for your help! =Jeremy=