Hi Rainer,

Thanks for the response. To cover a few points you made -

- Yes, I had a hunch long running requests are a problem; because of our application design, some pages invoked for the first time take a while (we can't cache them all!). Is there an easy way to correlate (apart from timestamp) the errors in the isapi log and the requests made to IIS? I mean, can I get the isapi log to show the URL being processed?
- I've now got the %D option in place on Tomcat to figure out from tomorrow which are the heavy pages.
- Yes, thread dumps on JDK 1.3 & TC4.1.x are tricky - I'm looking at the Tomcat JavaWrapper approach as a way forward. The version of the 3rd party product we have in place is only supported on jdk1.3 (this is a pretty ancient set up!)
- I agree that my incident traffic load is not huge, and should be supportable by the environment in place.
- I'll try a load balancer worker to see if that tells me more info.

If possible I'll have some more information in a day or so from this...

cheers
Tim
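Once %D is in the access log, finding the heavy pages is mostly a matter of ranking URLs by duration. The following is a minimal Python sketch of that idea, not a tested tool. It assumes the access log pattern is `%h %l %u %t "%r" %s %b %D` (duration in milliseconds as the last field); the function name `slowest_urls` and the sample lines are made up for illustration and would need adjusting to the real pattern.

```python
import re
from collections import defaultdict

# Assumed pattern: %h %l %u %t "%r" %s %b %D  (duration in ms is the last field).
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<url>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ (?P<millis>\d+)$'
)


def slowest_urls(lines, top=10):
    """Return (url, max_ms, hits) tuples, slowest observed request first."""
    stats = defaultdict(lambda: [0, 0])  # url -> [max_ms, hits]
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not fit the assumed pattern
        entry = stats[m.group("url")]
        entry[0] = max(entry[0], int(m.group("millis")))
        entry[1] += 1
    ranked = sorted(
        ((url, s[0], s[1]) for url, s in stats.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:top]


# Made-up sample lines, only to show the expected shape of the input.
SAMPLE = [
    '127.0.0.1 - - [10/Dec/2007:12:00:01 +0000] "GET /cms/page1.jsp HTTP/1.1" 200 5123 45210',
    '127.0.0.1 - - [10/Dec/2007:12:00:02 +0000] "GET /cms/home.jsp HTTP/1.1" 200 2048 120',
    '127.0.0.1 - - [10/Dec/2007:12:00:09 +0000] "GET /cms/page1.jsp HTTP/1.1" 200 5123 300',
    'this line does not match and is ignored',
]

if __name__ == "__main__":
    for url, max_ms, hits in slowest_urls(SAMPLE):
        print(f"{max_ms:>8} ms  {hits:>4} hits  {url}")
```

Ranking by the maximum rather than the average is a deliberate choice here: pages that are slow only on first invocation would otherwise be hidden by their later, faster hits.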
________________________________

From: Rainer Jung [mailto:[EMAIL PROTECTED]
Sent: Mon 10/12/2007 12:11
To: Tomcat Users List
Subject: Re: ISAPI JK2 ran better than JK, how can that be?

Hi Tim,

[EMAIL PROTECTED] wrote:
> OK,
>
> So our website keeps crashing over the past couple of weeks (usual
> story on this list eh?)

Not really (although a users list is always focused on problems and not on the working side of things ...)

> We've been running JK isapi plugin v1.2.15 for a fair while, but the
> isapi redirector log always contains huge numbers of errors being
> thrown (see snippet below). We were getting a complete failure of IIS
> to serve traffic, solved only by a restart of IIS and Tomcat.
>
> Very recently we moved up to v1.2.25 in the hope of improving
> performance but it seems to have little effect- we're still getting
> high numbers of 503 responses sent back (maybe 5+ per minute). We do
> however now serve static resources from IIS to reduce the use of the
> ISAPI calls where possible.

The error number 2746 is hex for 10054, which is a "connection reset by peer" Winsock error. The peer in this case is your IIS client (browser etc.). Often this is caused by long running requests, where the users press the retry/again button. Then the browser immediately closes the connection and uses a new one for the same request. When you are sending back the response later, the closed connection gets detected and logged.

You should configure logging of response durations to find out whether you've got a problem with long running requests. You can do that using the JK request logging, or with the Tomcat access log (add format %D to your pattern, which is the duration in milliseconds).

Usually this does *not* mean that restarting IIS or Tomcat will help.

Concerning Tomcat: you should do a couple of thread dumps before restarting it. That way you can find out if lots of requests got stuck inside the container, and if so, what they are actually doing or waiting for.
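The hex-to-decimal step above, and the timestamp-based correlation with the IIS log, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the regular expression is written against the log lines quoted in this thread, the names `decode` and `tally` are hypothetical helpers, and the code table covers just the two Winsock codes that appear in the snippet (10053 and 10054).

```python
import re
from collections import Counter

# Matches redirector lines such as:
# [Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
WRITE_FAIL_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\] \[error\] jk_isapi_plugin\.c \(\d+\): '
    r'WriteClient failed with (?P<code>[0-9A-Fa-f]{8})'
)

# Only the two Winsock codes seen in the quoted log are listed here.
WINSOCK_NAMES = {10053: "WSAECONNABORTED", 10054: "WSAECONNRESET"}


def decode(hex_code):
    """Turn the hex code from the log into (decimal value, Winsock name)."""
    value = int(hex_code, 16)
    return value, WINSOCK_NAMES.get(value, "unknown")


def tally(lines):
    """Count WriteClient failures per (timestamp, decimal error code)."""
    counts = Counter()
    for line in lines:
        m = WRITE_FAIL_RE.search(line)
        if m:
            value, _name = decode(m.group("code"))
            counts[(m.group("ts"), value)] += 1
    return counts


# Lines copied from the isapi log quoted in this thread.
SAMPLE = [
    "[Fri Dec 07 03:35:16 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002745",
    "[Fri Dec 07 03:35:16 2007] [info] jk_ajp_common.c (1384): Connection aborted or network problems",
    "[Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746",
    "[Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746",
]

if __name__ == "__main__":
    for (ts, code), n in sorted(tally(SAMPLE).items()):
        print(ts, code, WINSOCK_NAMES.get(code, "unknown"), n)
```

The per-second counts can then be lined up against the IIS log entries for the same second. IIS W3C logs are usually written in UTC, so the timestamps may need shifting before they match.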

Concerning IIS: does "netstat -an" look fine, once you think you need to restart?

> But here's the kicker: - previously this year we were still using a
> JK2 isapi_redirector2.dll, and that seemed to be serving comparable
> traffic rates with fewer errors (certainly no complete failures). No
> hard data to support this yet, just my recollection of serious
> outages over the past couple of years.

I think we should make a distinction between the number of log messages (here we simply might be more detailed with JK) and serious problems, like the container no longer responding, or responding too slowly.

> AWStats on our log files suggests our incident traffic is ~7 million
> pages per month, peaking at lunchtime & early evening at perhaps 3-5
> reqs/sec.

That's not a lot of traffic. What are average response times? Is it usual webapp load, or very special use cases, like long running uploads or downloads?

> Scaling to multiple tomcats is not an option right now due to 3rd
> party license costs in the webapp (its a CMS system).

The request numbers do not seem to make horizontal scaling an option that you need to consider yet (unless request handling is very CPU intensive, or you need a lot of memory, or ...).

> Our environment: Java 1.3.1 Tomcat 4.1.18 IIS v5 IIS & Tomcat are
> co-located on same server (4GB RAM, win2k o/s)

Ooops. I'm not really sure about the behaviour of 1.3 for thread dumps. It's fine for 1.4.2, but you should test in a staging or dev system what happens with 1.3. Consider updating to 4.1.36 and if possible 1.4.2_some_recent_patch_level.

> Questions:
>
> - Are there obvious worker directives that would help the issue
>   further ?
> - In the list archives I've seen conflicting views on what
>   to set connectionTimeout to be in the tomcat and worker config. Some
>   say 0, some say 600 secs. Which tends to be more useful?
> - All of the 00002745 errors - do they indicate a network problem
>   upstream of the server?
> - When viewing the jkstatus page, the worker only shows type,
>   host, address. I was expecting further data as listed in the legend.
>   Am I missing something?

2746: see above. I would not expect any worker setting to help if the root cause really is long running requests. Then you would really have to log request duration and do a couple of thread dumps, to find out which requests are running too long and for which reason.

jkstatus: add a load balancer worker to your ajp13 worker and use the load balancer as the worker you map. The load balancer does a lot of statistics and shows all the detailed information in jkstatus. Because of its manageability a load balancer is interesting, even if you have only one backend.

> isapi log:
>
> [Fri Dec 07 03:35:16 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002745
> [Fri Dec 07 03:35:16 2007] [info] jk_ajp_common.c (1384): Connection aborted or network problems
> [Fri Dec 07 03:35:16 2007] [info] jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
> [Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
> [Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
> [Fri Dec 07 03:35:17 2007] [info] jk_ajp_common.c (1384): Connection aborted or network problems
> [Fri Dec 07 03:35:17 2007] [info] jk_ajp_common.c (1384): Connection aborted or network problems
> [Fri Dec 07 03:35:17 2007] [info] jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
> [Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
> [Fri Dec 07 03:35:17 2007] [info] jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
> [Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
> [Fri Dec 07 03:35:17 2007] [info] jk_ajp_common.c (1384): Connection aborted or network problems
> [Fri Dec 07 03:35:17 2007] [info] jk_ajp_common.c (1384): Connection aborted or network problems
> [Fri Dec 07 03:35:17 2007] [info] jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
> [Fri Dec 07 03:35:17 2007] [info] jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
> [Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
> [Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
> [Fri Dec 07 03:35:17 2007] [info] jk_ajp_common.c (1384): Connection aborted or network problems
>
> Our Tomcat connector config -
>
> <Connector className="org.apache.coyote.tomcat4.CoyoteConnector"
>     redirectPort="8443" bufferSize="2048" port="8009"
>     connectionTimeout="300000" scheme="http" enableLookups="false"
>     secure="false"
>     protocolHandlerClassName="org.apache.jk.server.JkCoyoteHandler"
>     debug="0" disableUploadTimeout="false" proxyPort="0"
>     maxProcessors="200" minProcessors="2" tcpNoDelay="true"
>     acceptCount="20" useURIValidationHack="false">
>   <Factory className="org.apache.catalina.net.DefaultServerSocketFactory"/>
> </Connector>
>
> worker.properties -
>
> worker.website.type=ajp13
> worker.website.host=localhost
> worker.website.port=8009
> # 200 concurrent users
> worker.website.connection_pool_size=200
> worker.website.connection_pool_timeout=300

Regards,

Rainer

---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]