Hi -- we've recently had problems with Tomcat on one server getting overloaded and then refusing further requests. What happens is that more and more request-processing threads get stuck on some slow operation (e.g. a database query, or opening a socket to another server), until eventually Tomcat's thread pool is exhausted and it starts responding with "503 Service temporarily unavailable".
Now, obviously the *real* fix is to get to the root of things and figure out why those requests are slow (or blocked, or whatever). But this keeps happening for various different reasons, and when it does we get angry customers on the line wanting to know what the hell happened to the web server. (We deploy a suite of related web applications on dedicated servers to a hundred or so customers; one day a deadlock will affect customer X, a few weeks later a network outage between servers will affect customer Y, and so on.)

So I want to implement some sort of automatic load monitoring of Tomcat. This should give us a rough idea that things are about to go bad *before* the customer even finds out about it (never mind picks up the phone) -- and it's independent of what the underlying cause is. Ideally, I'd like to know if the number of concurrent requests goes above X for Y minutes, and raise the alarm if so. This is across *all* webapps running in the same container.

I've implemented a vile hack that hits Tomcat's process with SIGQUIT to trigger a thread dump, then parses the thread dump looking for threads that the JVM says are "runnable". E.g. this:

  "Ajp13Processor[8009][33]" daemon prio=1 tid=0x0856a528 nid=0x2263 in Object.wait() [99bc4000..99bc487c]

is presumed to be an idle thread (it's waiting on an org.apache.ajp.tomcat4.Ajp13Processor monitor). But this:

  "Ajp13Processor[8009][28]" daemon prio=1 tid=0x0856c6d8 nid=0x2263 runnable [99dc7000..99dc887c]

is presumed to be processing a request (it's deep in the bowels of our JDBC driver, reading from the database server ... which is what most of our requests seem to spend most of their time doing).

This seems to work, and it gives a rough-and-ready snapshot of how busy Tomcat is at the moment. If I run it every 60 sec for a while, I get output like this:

  /var/log/tomcat/thread-dump-20060925_112753.log: 20/34
  /var/log/tomcat/thread-dump-20060925_112858.log: 17/34
  /var/log/tomcat/thread-dump-20060925_113003.log: 20/34
  /var/log/tomcat/thread-dump-20060925_113109.log: 20/34
  /var/log/tomcat/thread-dump-20060925_113214.log: 18/34
  /var/log/tomcat/thread-dump-20060925_113319.log: 21/34

where the first number is the count of "runnable" Ajp13Processor threads (i.e. concurrent requests) and the second number is the total number of Ajp13Processors. (A rough sketch of the counting step is at the end of this message.)

I have two concerns about this vile hack:

  * well, it *is* a vile hack -- is there a cleaner way to get this information out of Tomcat? (keeping in mind that we're running 4.1.30)

  * just how hard on the JVM is it to get a thread dump? I would probably decrease the frequency to every 10 min in production, but even so it makes me a bit nervous.

Thanks --
Greg
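P.S. To make the counting step more concrete, here's a stripped-down, untested sketch of it in Java (the class name is made up, and it assumes the dump has already been captured to a file by the SIGQUIT trick above -- sending the signal and fishing the dump out of the JVM's output is left out):

  import java.io.BufferedReader;
  import java.io.FileReader;
  import java.io.IOException;

  // Counts busy vs. total Ajp13Processor threads in a saved thread dump.
  // Usage: java DumpCounter /var/log/tomcat/thread-dump-XXXX.log
  public class DumpCounter {
      public static void main(String[] args) throws IOException {
          int total = 0;       // all Ajp13Processor threads in the dump
          int runnable = 0;    // those the JVM reports as "runnable"
          BufferedReader in = new BufferedReader(new FileReader(args[0]));
          try {
              String line;
              while ((line = in.readLine()) != null) {
                  // Thread header lines look like:
                  //   "Ajp13Processor[8009][28]" daemon prio=1 ... runnable [...]
                  if (line.startsWith("\"Ajp13Processor")) {
                      total++;
                      if (line.indexOf(" runnable ") >= 0) {
                          runnable++;   // presumed to be handling a request
                      }
                  }
              }
          } finally {
              in.close();
          }
          // e.g. "/var/log/tomcat/thread-dump-20060925_112753.log: 20/34"
          System.out.println(args[0] + ": " + runnable + "/" + total);
      }
  }

From there it's just a matter of comparing runnable/total against whatever threshold we settle on for the "above X for Y minutes" alarm.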