Hi Tim,

[EMAIL PROTECTED] wrote:
> But now I have a load balancer in place, with one worker. A status is
> shown below after about 6hrs of traffic. As you can see the client
> errors make up about 6% of all incident traffic, though they dwarf
> server errors.
>
> Acc      Err   CE     RE   Wr    Rd     Busy   Max
> 42245    97    2556   0    24M   1.4G   18     60

OK, first of all: we have 42245 requests forwarded since the last restart
of the web server. If that covers 6 hours, it's about 2 requests per
second (mean value, not peak). Your busyness (see also below) is 18. If
your busyness is often that high, it means that your mean response time
should be something like

busyness / request rate = 18 requests / (2 requests per second) = 9 seconds.

2556 client errors (i.e. either the request could not be read completely,
or, more likely, the response could not be returned to the client
completely) is quite a lot (more than 5%). Also: the 97 errors really mean
that 97 times there was a serious problem between the web server and
Tomcat. We don't know, though, how many root causes there were. Maybe
there was a problem only once, and 97 requests ran into it in quick
succession.

> But then just as I'm writing this my site falls over - I get the
> Service Temporarily Unavailable consistently on my pages. I grabbed a
> snapshot of the status worker -

Good, so at least you know that the web server is still able to respond :)

> State    Acc     Err     CE      RE      Wr      Rd      Busy    Max
> ERR/FRC  43181   691     2597    0       24M     1.4G    100     115

OK, so the plugin detected an error ("ERR") when talking to Tomcat.
Details on this error should be in the log file. Since the load balancer
only has one member, it doesn't take that member out of service (which is
what it would do if it had more members); instead it does forced recovery
("FRC"), i.e. it keeps sending requests there, although the previous
requests failed.

We can see that of the 936 requests forwarded since the last snapshot,
691 ran into an error, and of the remaining 245 requests, 41 got a client
error, which is an even higher rate than the 5% we had over the first 6
hours.
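Spelled out, those numbers are just the deltas between your two status
snapshots:

43181 - 42245 = 936 requests forwarded in between
 2597 -  2556 =  41 new client errors
  936 -   691 = 245 requests that did not hit an "Err"
   41 /   245 ~  17% client errors in that interval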

The most important thing, though, is that the busyness went up to 100,
with a max of 115. Busyness is the number of requests currently being
processed by the backend (more precisely: those that have been forwarded
by this web server and not yet fully returned).

The fact that your busyness went up that far usually means that something
in your backend got very slow. Whether that is true, and what exactly is
slow, can best be analyzed using Java thread dumps of the backend process.
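The usual way to get a thread dump is "jstack <pid>" from the JDK against
the Tomcat process, or Ctrl+Break if Tomcat runs in a console window on
Windows. Just to illustrate what a dump contains (a minimal sketch,
nothing specific to your setup), the same information is also available
programmatically via the standard java.lang.management API:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDump {
    public static void main(String[] args) {
        // Ask the JVM for all live threads, including the monitors and
        // synchronizers each thread currently holds or waits on.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            // ThreadInfo.toString() shows the thread name, its state
            // (RUNNABLE, BLOCKED, WAITING, ...) and a stack trace.
            System.out.println(info);
        }
    }
}

In the dump of a hanging backend you will typically see many threads
blocked or waiting at the same spot, and that spot is usually the slow
resource.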

> Clearly there seemed to be some catastrophic occurrence that made the
> Error count rocket and the worker state change. I'm unfamiliar with
> the load balancer - will a state of ERR/FRC be rectified somehow? For
> now I just restarted IIS & Tomcat, which appears to be the only method
> of recovery at present.

To check the theory that the problem lies within the backend and not in
IIS, you could try to find out what happens if you only restart Tomcat.
If it's really the backend, the ISAPI redirector should be able to
forward traffic again without an IIS restart.

To prevent issues from long timeouts against Tomcat while you shut the
backend down, you could first set the backend worker to "Stop" in the
load balancer. The load balancer will then immediately answer all
requests with an error instead of stacking them up in front of the
backend. Then restart Tomcat, and after Tomcat has fully started, set the
worker back to "Active". If your service runs again, it's likely a
backend problem.

Diagnosing without thread dumps might be hard. Have a look at the log files.

> cheers
>
> Tim

Regards,

Rainer
