Hi Tim,
[EMAIL PROTECTED] wrote:
OK,
So our website keeps crashing over the past couple of weeks (usual
story on this list eh?)
Not really (although a users list is always focused on problems and not
on the working side of things ...)
We've been running JK isapi plugin v1.2.15 for a fair while, but the
isapi redirector log always contains huge numbers of errors being
thrown (see snippet below). We were getting a complete failure of IIS
to serve traffic, solved only by a restart of IIS and Tomcat.
Very recently we moved up to v1.2.25 in the hope of improving
performance but it seems to have little effect - we're still getting
high numbers of 503 responses sent back (maybe 5+ per minute). We do
however now serve static resources from IIS to reduce the use of the
ISAPI calls where possible.
The error number 2746 is hex for 10054, which is the winsock error
"connection reset by peer". The peer in this case is your IIS client
(browser etc.). Often this is caused by long running requests, where the
user presses the retry/again button. The browser then immediately closes
the connection and uses a new one for the same request. When the response
is sent back later, the closed connection gets detected and logged.
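To make the hex conversion concrete, here is a minimal Python sketch that
decodes the codes from the "WriteClient failed with ..." lines. The function
name and the lookup table are made up for illustration; the two winsock
names are the standard ones for 10053 and 10054.

```python
# Winsock error names for the two codes that appear in the redirector log.
WINSOCK_ERRORS = {
    10053: "WSAECONNABORTED",  # software caused connection abort
    10054: "WSAECONNRESET",    # connection reset by peer
}


def decode_writeclient_code(hex_code):
    """Turn the hex code from a 'WriteClient failed with ...' line
    into (decimal error number, winsock name)."""
    number = int(hex_code, 16)
    return number, WINSOCK_ERRORS.get(number, "unknown")


if __name__ == "__main__":
    for code in ("00002745", "00002746"):
        print(code, decode_writeclient_code(code))
```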
You should configure logging of response durations to find out whether
you've got a problem with long running requests. You can do that using
the JK request logging, or with the Tomcat access log (add format %D to
your pattern, which is the duration in milliseconds).
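As a rough sketch of the access log variant, a valve entry in server.xml
could look like the fragment below. The directory, prefix and suffix values
are placeholders, and whether %D is recognized depends on the Tomcat
version, so please check the valve documentation for the release you run
before relying on it.

```xml
<!-- Sketch only: directory, prefix and suffix are placeholder values. -->
<!-- %D (request duration) may not be available in every Tomcat release. -->
<Valve className="org.apache.catalina.valves.AccessLogValve"
       directory="logs" prefix="access_log." suffix=".txt"
       pattern="%h %l %u %t &quot;%r&quot; %s %b %D"/>
```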
Usually this does *not* mean that restarting IIS or Tomcat will help.
Concerning Tomcat: you should do a couple of thread dumps before
restarting it. That way you can find out whether lots of requests got
stuck inside the container, and if so, what they are actually doing or
waiting for.
Concerning IIS: does the output of "netstat -an" look fine at the point
where you think you need to restart?
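One way to make the netstat check repeatable is to save the output at the
moment of trouble and count the TCP rows per state. The Python sketch below
assumes the usual Windows column layout (Proto, Local Address, Foreign
Address, State); the function name and the optional port filter are made up
for illustration, and what counts as "not fine" still has to be judged by
you.

```python
from collections import Counter


def count_tcp_states(netstat_output, port=None):
    """Count TCP rows per connection state in saved 'netstat -an' output.

    If port is given, only rows whose local address ends in that port
    are counted.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows look like: TCP <local addr> <foreign addr> <STATE>
        if len(fields) < 4 or fields[0].upper() != "TCP":
            continue
        if port is not None and not fields[1].endswith(":%d" % port):
            continue
        counts[fields[3]] += 1
    return counts
```

Calling it once for port 80 and once for port 8009 gives a quick view of
the browser side and the AJP side separately.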
But here's the kicker: - previously this year we were still using a
JK2 isapi_redirector2.dll, and that seemed to be serving comparable
traffic rates with fewer errors (certainly no complete failures). No
hard data to support this yet, just my recollection of serious
outages over the past couple of years.
I think we should make a distinction between the number of log messages
(here JK might simply be more detailed) and serious problems, like the
container no longer responding, or responding too slowly.
AWStats on our log files suggests our incident traffic is ~7 million
pages per month, peaking at lunchtime & early evening at perhaps 3-5
reqs/sec.
That's not a lot of traffic. What are the average response times? Is it
the usual webapp load, or are there very special use cases, like long
running uploads or downloads?
Scaling to multiple tomcats is not an option right now due to 3rd
party license costs in the webapp (it's a CMS system).
The request numbers do not seem to make horizontal scaling an option
that you need to consider yet (unless request handling is very CPU
intensive, or you need a lot of memory, or ...).
Our environment: Java 1.3.1 Tomcat 4.1.18 IIS v5 IIS & Tomcat are
co-located on same server (4GB RAM, win2k o/s)
Ooops. I'm not really sure about the behaviour of 1.3 for thread dumps.
It's fine for 1.4.2, but you should test in a staging or dev system
what happens with 1.3. Consider updating to 4.1.36 and, if possible,
1.4.2_some_recent_patch_level.
Questions:
- Are there obvious worker directives that would help the issue
  further?
- In the list archives I've seen conflicting views on what to set
  connectionTimeout to be in the tomcat and worker config. Some say 0,
  some say 600 secs. Which tends to be more useful?
- All of the 00002745 errors - do they indicate a network problem
  upstream of the server?
- When viewing the jkstatus page, the worker only shows type, host,
  address. I was expecting further data as listed in the legend. Am I
  missing something?
2746: see above. I would not expect any worker setting to help if the
root cause really is long running requests. In that case you would have
to log request durations and do a couple of thread dumps to find out
which requests are running too long, and for which reason.
jkstatus: put a load balancer worker in front of your ajp13 worker and
use the load balancer as the worker you map. The load balancer keeps a
lot of statistics and shows all the detailed information in jkstatus.
Because of its manageability a load balancer is interesting, even if you
have only one backend.
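A minimal sketch of such a setup, based on your existing "website" worker.
The worker names "lb" and "jkstatus" are made up for illustration, and your
existing pool settings would stay on the ajp13 worker as they are:

```properties
# Only the workers you map go into worker.list; "lb" and "jkstatus"
# are hypothetical names.
worker.list=lb,jkstatus

# Existing ajp13 worker, now used only as a member of the balancer
worker.website.type=ajp13
worker.website.host=localhost
worker.website.port=8009

# Load balancer with a single member
worker.lb.type=lb
worker.lb.balance_workers=website

# Status worker that serves the jkstatus page
worker.jkstatus.type=status
```

The URI mappings that currently point to "website" would then point to
"lb" instead.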
isapi log:
[Fri Dec 07 03:35:16 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002745
[Fri Dec 07 03:35:16 2007] [info] jk_ajp_common.c (1384): Connection aborted or network problems
[Fri Dec 07 03:35:16 2007] [info] jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
[Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
[Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
[Fri Dec 07 03:35:17 2007] [info] jk_ajp_common.c (1384): Connection aborted or network problems
[Fri Dec 07 03:35:17 2007] [info] jk_ajp_common.c (1384): Connection aborted or network problems
[Fri Dec 07 03:35:17 2007] [info] jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
[Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
[Fri Dec 07 03:35:17 2007] [info] jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
[Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
[Fri Dec 07 03:35:17 2007] [info] jk_ajp_common.c (1384): Connection aborted or network problems
[Fri Dec 07 03:35:17 2007] [info] jk_ajp_common.c (1384): Connection aborted or network problems
[Fri Dec 07 03:35:17 2007] [info] jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
[Fri Dec 07 03:35:17 2007] [info] jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
[Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
[Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
[Fri Dec 07 03:35:17 2007] [info] jk_ajp_common.c (1384): Connection aborted or network problems
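To separate the sheer number of log messages from real problems, it can
help to count the WriteClient failures per minute and per error code. The
Python sketch below does that for log text in the format shown above. The
function name is made up for illustration, and the pattern is only derived
from this snippet, so other redirector versions may log differently.

```python
import re
from collections import Counter

# Matches entries such as:
# [Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
ENTRY = re.compile(
    r"\[(?P<stamp>[^\]]+)\] \[error\] jk_isapi_plugin\.c \(\d+\): "
    r"WriteClient failed with (?P<code>[0-9A-Fa-f]{8})"
)


def count_writeclient_failures(log_text):
    """Return a Counter keyed by (minute, hex code)."""
    # Collapse line breaks so that entries wrapped by a mail client
    # still match.
    flat = re.sub(r"\s+", " ", log_text)
    counts = Counter()
    for match in ENTRY.finditer(flat):
        # "Fri Dec 07 03:35:17 2007" -> "Fri Dec 07 03:35"
        minute = match.group("stamp")[:16]
        counts[(minute, match.group("code").upper())] += 1
    return counts
```

Comparing these counts with the request durations from the access log
should show whether the resets cluster around slow requests.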
Our Tomcat connector config -
<Connector className="org.apache.coyote.tomcat4.CoyoteConnector"
           redirectPort="8443" bufferSize="2048" port="8009"
           connectionTimeout="300000" scheme="http" enableLookups="false"
           secure="false"
           protocolHandlerClassName="org.apache.jk.server.JkCoyoteHandler"
           debug="0" disableUploadTimeout="false" proxyPort="0"
           maxProcessors="200" minProcessors="2" tcpNoDelay="true"
           acceptCount="20" useURIValidationHack="false">
  <Factory className="org.apache.catalina.net.DefaultServerSocketFactory"/>
</Connector>
worker.properties -
worker.website.type=ajp13
worker.website.host=localhost
worker.website.port=8009
# 200 concurrent users
worker.website.connection_pool_size=200
worker.website.connection_pool_timeout=300
Regards,
Rainer