We've been using Tomcat 4.1.30 happily for a couple of years now, but every so often one badly-behaved webapp can make life unhappy for everyone living in the container. (Our Tomcat deployment is part of a suite of applications that run on a small cluster of Linux servers; all of the webapps running inside Tomcat are written and controlled by us. We have around a hundred of these small clusters deployed worldwide, so several hundred servers all told.)
Here's what typically happens:

* webapp A tries to open a database connection to another server in the cluster, but that server is down and packets to it just disappear (alternatively, A runs a badly-written and consequently very s-l-o-w query; either way, it's a database operation that takes a looooong time)

* meanwhile, the thread running that request for A is holding a synchronization lock: yes, we know you're not supposed to hold synchronization locks while doing I/O, but the programmers who wrote this stuff 3-5 years ago didn't know that (there's a minimal sketch of the pattern at the end of this message). We fix these bugs as we find them, but they aren't easy to find and they aren't easy to fix.

* thus, all requests to A back up in a queue, waiting for the original thread to finish its slow I/O and release that synchronization lock. If enough requests for A keep arriving, Tomcat's thread pool is gradually exhausted, until all 75 threads are tied up processing requests for A that are blocked on that one synchronization lock.

* now Tomcat is unable to process requests for webapps B, C, D, ... and our whole application suite is effectively dead. Oops!

Obviously, the right long-term fix is "don't hold synchronization locks while doing database I/O". (It would also help if database connections and queries were always fast, but alas! life just doesn't work that way.) But until all those bugs are found and fixed, this cascading failure is going to keep happening occasionally.

One idea that has occurred to me is to limit the number of threads Tomcat will allocate to any one webapp. Say we could limit webapp A to 25 threads out of Tomcat's pool of 75: users depending on A would still be shut out (all their requests would block), but the failure would not cascade out to the other webapps running in the same container. (The second sketch at the end of this message is roughly what I have in mind if we end up having to do it ourselves.)

So I'm wondering:

* is there an easy way to implement this with Tomcat 4.1? how about 5.5? (we haven't upgraded because we're pretty happy with 4.1 ... but if there's a compelling reason to switch to 5.5, we'll do it)

* are there other good techniques for limiting the damage caused by badly-behaved webapps? I'm sure "holding a synchronization lock while doing database I/O" is only one kind of bad behaviour lurking in our code ... I'd like to reduce the effect webapps in the same container have on each other as much as possible.

Thanks --

Greg