Hi Tim,
[EMAIL PROTECTED] wrote:
But now I have a load balancer in place, with one worker. A status is
shown below after about 6hrs of traffic. As you can see the client
errors make up about 6% of all incident traffic, though they dwarf
server errors.
Acc    Err  CE    RE  Wr   Rd    Busy  Max
42245  97   2556  0   24M  1.4G  18    60
OK, first of all we have 42245 requests forwarded since the last restart of
the web server. If that covers 6 hours, it's about 2 requests per second
(mean value, not peak). Your busyness (see also below) is 18. If your
busyness is often that high, it means that your mean response time
should be something like
busyness / load = 18 requests / (2 requests per second) = 9 seconds.
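(Just to make that arithmetic explicit, here is a small sketch in Java with the
snapshot values hard-coded; the 6 hour window and the class name are my own
assumptions, not something the status worker gives you.)

  public class BusynessEstimate {
      public static void main(String[] args) {
          long acc = 42245;              // Acc: requests forwarded since restart
          long windowSeconds = 6 * 3600; // observation window, assumed ~6 hours
          int busyness = 18;             // Busy: requests currently in flight

          double load = (double) acc / windowSeconds; // roughly 2 requests per second
          double meanResponse = busyness / load;      // Little's Law: W = L / lambda

          System.out.printf("load = %.2f req/s, mean response time = %.1f s%n",
                  load, meanResponse);
      }
  }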
2556 client errors (i.e. either the request could not be read
completely, or more likely the response could not be returned completely)
is quite a lot, more than 5% of all requests. Also: the 97 errors mean that 97
times there was a serious problem between the web server and Tomcat. We
don't know, though, how many distinct root causes there were. Maybe there was only
one problem, and 97 requests ran into it in quick succession.
But then just as I'm writing this my site falls over - I get the
Service Temporarily Unavailable consistently on my pages. I grabbed a
snapshot of the status worker -
Good, so at least you know that the web server is still able to respond :)
State    Acc    Err  CE    RE  Wr   Rd    Busy  Max
ERR/FRC  43181  691  2597  0   24M  1.4G  100   115
OK, so the plugin detected an error ("ERR") when talking to Tomcat.
Details on this error should be in the log file. Since the load balancer
only has one member, it doesn't take that only member out of service
(which it would do if it had more members); instead it does forced
recovery ("FRC"), i.e. it keeps sending requests there, even though the
previous requests failed.
We can see that of the 936 requests forwarded since the last snapshot,
594 ran into an error (Err went from 97 to 691), and of the remaining
requests another 41 got a client error (CE went from 2556 to 2597), an
even higher rate than the 5% we had over 6 hours.
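(If you want to double-check such deltas yourself, here is a trivial Java
sketch with the counters from the two snapshots hard-coded; the class name is
made up.)

  public class SnapshotDelta {
      public static void main(String[] args) {
          long acc1 = 42245, err1 = 97,  ce1 = 2556; // first snapshot
          long acc2 = 43181, err2 = 691, ce2 = 2597; // snapshot taken after the failure

          long requests     = acc2 - acc1; // 936 requests in between
          long errors       = err2 - err1; // 594 backend errors
          long clientErrors = ce2  - ce1;  // 41 client errors

          System.out.printf("%d requests, %d backend errors, %d client errors%n",
                  requests, errors, clientErrors);
      }
  }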
The most important thing, though, is that the busyness went up to 100
with a max of 115. Busyness is the number of requests which are
currently being processed by the backend (more precisely those that
have been forwarded by this web server and not fully returned yet).
The fact that your busyness went up that far usually means that
something in your backend got very slow. Whether this is true, and what exactly is
slow, can best be analyzed using Java thread dumps of the backend process.
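If getting a console thread dump from the Tomcat service on Windows is
awkward, one workaround is to dump the stacks programmatically from inside the
backend, e.g. called from a small JSP or servlet. A minimal sketch using the
standard Thread.getAllStackTraces() API (class and method names here are made
up for illustration):

  import java.util.Map;

  public class ThreadDumper {
      // Prints the stack of every live thread in the current JVM. To be useful
      // it has to run inside the Tomcat JVM, e.g. called from a JSP or servlet.
      public static void dumpAllThreads() {
          for (Map.Entry<Thread, StackTraceElement[]> e
                  : Thread.getAllStackTraces().entrySet()) {
              Thread t = e.getKey();
              System.out.println("\"" + t.getName() + "\" state=" + t.getState());
              for (StackTraceElement frame : e.getValue()) {
                  System.out.println("    at " + frame);
              }
              System.out.println();
          }
      }
  }

Several dumps taken a few seconds apart will show which threads are stuck
and where.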
Clearly there seemed to be some catastrophic occurrence that made the
Error count rocket and the worker state change. I'm unfamiliar with
the load balancer - will a state of ERR/FRC be rectified somehow? For
now I just restarted IIS & Tomcat, which appears to be the only method
of recovery at present.
To check the theory that the problem lies within the backend and not
IIS, you could try to find out what happens if you only restart
Tomcat. If it's really the backend, the ISAPI redirector should be able
to forward traffic again without an IIS restart.
In order to prevent issues from long TC timeouts when you shut down the
backend, you could first set the status of the backend to "Stop" in the
load balancer. Then the load balancer will immediately answer all
requests with an error and not stack them up in front of the backend.
Then restart Tomcat and, after Tomcat has fully started, set the load balancer
member back to "Active". If your service runs again, then it's likely a
backend problem.
Diagnosing without thread dumps might be hard. Have a look at the log files.
cheers
Tim
Regards,
Rainer