We've been using Tomcat 4.1.30 happily for a couple of years now, but every so often one badly-behaved webapp can make life unhappy for everyone living in the container. (Our Tomcat deployment is part of a suite of applications that run on a small cluster of Linux servers; all of the webapps running inside Tomcat are written and controlled by us. We have around a hundred of these small clusters deployed worldwide, so several hundred servers all told.)
Here's what typically happens:

* webapp A tries to open a database connection to another server in the cluster, but that server is down and packets to it just disappear (alternatively, A runs a badly-written and consequently very s-l-o-w query; either way, it's a database operation that takes a looooong time)

* meanwhile, the thread running that request for A is holding a synchronization lock: yes, we know you're not supposed to hold synchronization locks while doing I/O, but the programmers who wrote this stuff 3-5 years ago didn't know that (there's a minimal sketch of the pattern at the end of this message). We fix these bugs as we find them, but they aren't easy to find and they aren't easy to fix.

* thus, all requests to A back up in a queue, waiting for the original thread to finish its slow I/O and release that synchronization lock. If enough requests for A keep arriving, Tomcat's thread pool is gradually exhausted, until all 75 threads are tied up processing requests for A that are blocked on that one synchronization lock.

* now Tomcat is unable to process requests for webapps B, C, D, ... and our whole application suite is effectively dead. Oops!

Obviously, the right long-term fix is "don't hold synchronization locks while doing database I/O". (It would also help if database connections and queries were always fast, but alas! life just doesn't work that way.) But until all those bugs are found and fixed, this cascading failure is going to keep happening occasionally.

One idea that has occurred to me is to limit the number of threads Tomcat will allocate to any one webapp. Say we could limit webapp A to 25 threads out of Tomcat's pool of 75: users depending on A would still be shut out (all their requests would block), but the failure would not cascade out to the other webapps running in the same container. (The second sketch at the end of this message is roughly what I have in mind if we end up having to do it ourselves.)

So I'm wondering:

* is there an easy way to implement this with Tomcat 4.1? how about 5.5? (we haven't upgraded because we're pretty happy with 4.1 ... but if there's a compelling reason to switch to 5.5, we'll do it)

* are there other good techniques for limiting the damage caused by badly-behaved webapps? I'm sure "holding a synchronization lock while doing database I/O" is only one kind of bad behaviour lurking in our code ... I'd like to reduce the effect webapps in the same container have on each other as much as possible.

Thanks --

Greg