Thanks, I did try to unsubscribe but I kept getting them. Will try the address below.
Luke Walshe
BT Operate, HGIPCC Technical Specialist
Telephone: +44 (0)1314483482, Email: [EMAIL PROTECTED]

-----Original Message-----
From: Rainer Jung [mailto:[EMAIL PROTECTED]
Sent: 21 February 2008 09:30
To: Tomcat Users List
Subject: Re: mod_jk Problems - - worker went to error state and dont recover

See the footer of any mail on the list:

---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[EMAIL PROTECTED] wrote:
> All
>
> Apologies, this is unrelated. How do I unsubscribe from this mailing
> list? I thought it would be useful and small but it's overwhelming my
> inbox.
>
> Thanks in Advance.
>
> Luke Walshe
> BT Operate, HGIPCC Technical Specialist
> Telephone: +44 (0)1314483482, Email: [EMAIL PROTECTED]
>
> -----Original Message-----
> From: Ahmed Musa [mailto:[EMAIL PROTECTED]
> Sent: 21 February 2008 09:25
> To: Tomcat Users List
> Subject: Re: mod_jk Problems - - worker went to error state and dont recover
>
> Hello Rainer,
> Thanks for your information - the situation is getting clearer now.
> I will read some docs again - following your links - and will make further
> tests, also with the improved logging.
> Thanks a lot for your time
> with best regards
> ahmed
>
> -------- Original Message --------
>> Date: Wed, 20 Feb 2008 18:59:01 +0100
>> From: Rainer Jung <[EMAIL PROTECTED]>
>> To: Tomcat Users List <users@tomcat.apache.org>
>> Subject: Re: mod_jk Problems - - worker went to error state and dont recover
>>
>> Ahmed Musa wrote:
>>> Hello,
>>> Wow - thank you very much Rainer for your very quick and informative answer.
>>> I will go to 1.2.26 and think about some "smoother" values for
>>> reply_timeout and max_reply_timeouts.
>>> I will search for the requests which cause the problems - because I
>>> still log the response time in the way you mentioned - but I am not sure
>>> that the user requests are responsible for the situation.
>>
>> One note: for Apache httpd 2.x %D is microseconds (there is no format
>> for milliseconds), for Tomcat %D is milliseconds. As long as you are
>> searching for the root cause, it might make sense to have both access
>> logs active to check for duration differences.
>>
>>> So one further question - does mod_jk itself check if the backend is
>>> reachable - without user requests?
>>
>> No. Everything only works on top of user requests.
>>
>>> When there are connections to the backend - are they closed after the
>>> response or are they held open for further requests?
>>
>> In general held open. There are parameters on how long they are held
>> open without more requests before they get shut down, and also how many
>> might be kept open even when no requests are coming in. Those are the
>> connection pool parameters, which you will find on
>>
>> http://tomcat.apache.org/connectors-doc/reference/workers.html
>>
>> Tomcat also has a connectionTimeout on the connector, which will shut
>> down a connection from the Tomcat side if it is idle for too long.
>>
>> If you don't want to reuse connections at all, there's also a setting (a
>> JkOption in Apache).
>>
>>> Is it possible that the Checkpoint firewall in between can be
>>> responsible for the connectivity problem?
>>
>> It can cut a connection that's idle for too long. Since you have
>> cping/cpong active via connect_timeout and prepost_timeout, you should
>> get a cping error message if the connection was dropped by the firewall
>> during idle times and mod_jk tries to use it again. The reply timeout in
>> the error log indicates that the backend isn't answering.
>> Of course if
>> it takes *very* long to answer, it might be that the firewall dropped
>> the connection in between, but then the root cause would still be the
>> long response time of the backend.
>>
>>> Another point is the "not recovering" of the worker. Yes, you are right
>>> - in this situation I have many reply_timeouts - but these happen in a
>>> period of time - for example 30 minutes - but the worker is still dead even
>>> when there are no more reply_timeouts. It remains dead.
>>> It was necessary to restart it manually via jkstatus.
>>
>> I assume you are using stickiness, so when a session started on a node,
>> it will stay there. So when a worker is in error for a long time, all
>> new sessions will start on other nodes. If the worker is ready for
>> recovery, it needs a request that doesn't carry a session to get probed
>> with this request.
>>
>> In jkstatus, the status of an error worker should switch to REC when
>> mod_jk decides that it could send a non-sticky request there (to probe),
>> and to PRB during the time this request is on the node, and finally
>> either to OK or back to ERR depending on the result of the request.
>>
>> You can log the number of errors (and accesses) that happened on the
>> node in the httpd access log. If you think that the node simply stays in
>> error for a long time, then the error count (and access count) should
>> stay constant. I would expect that they do not.
>>
>> Have a look at how LogFormat in Apache httpd works, and then add some of
>> those documented in
>>
>> http://tomcat.apache.org/connectors-doc/reference/apache.html
>>
>> like:
>>
>> JK_LB_LAST_NAME
>> JK_LB_LAST_ACCESSED
>> JK_LB_LAST_ERRORS
>> JK_LB_LAST_BUSY
>> JK_LB_LAST_STATE
>>
>> using the syntax %{JK_LB_LAST_STATE}n etc.
>>
>>> Another point is the learning - I read the docs - the info on the
>>> Apache website, I don't find other ones - are there other ones?
>>> - and they are
>>> not going in depth - if you read the spec and watch the logs it is - for me
>>> - very hard to match the things. Also the many possibilities that mod_jk
>>> has to check whether there is a connection to the backend, ... - I understand them
>>> but checking the reality in an error situation is very hard. By matching I
>>> mean "Which part of the communication sequence failed - why - and which
>>> error message does it cause".
>>> But I will try - and study the mailing list too.
>>
>> It's hard for us too (sometimes).
>>
>>> Thank you for your time - tomorrow we will have the new version and will
>>> see what happens.
>>> best
>>> ahmed
>>
>> Regards,
>>
>> Rainer
>>
>>> -------- Original Message --------
>>>> Date: Wed, 20 Feb 2008 15:56:42 +0100
>>>> From: Rainer Jung <[EMAIL PROTECTED]>
>>>> To: Tomcat Users List <users@tomcat.apache.org>
>>>> Subject: Re: mod_jk Problems - - worker went to error state and dont recover
>>>>
>>>> [EMAIL PROTECTED] wrote:
>>>>> See Thread at: http://www.techienuggets.com/Detail?tx=25608 Posted on
>>>>> behalf of a User
>>>>>
>>>>> Hello to all, after long unsuccessful research I hope someone can
>>>>> give me a hint on the following problems.
>>>>>
>>>>> Our Apache-mod_jk-Tomcat infrastructure was running without problems
>>>>> for about one year - then, for the last two months, mod_jk errors have occurred.
>>>>> We upgraded the mod_jk version, made improvements in the
>>>>> worker.properties - the problems changed and became fewer, but sometimes
>>>>> they still appear.
>>>>>
>>>>> It seems that the mod_jk workers lose the connection to their
>>>>> Tomcat backend server - there are messages in the mod_jk log files which
>>>>> point in this direction. Normally this seems not to be a big problem -
>>>>> but under certain conditions (which?) the worker goes to an error state
>>>>> and cannot recover by itself - this must be done manually.
>>>>>
>>>>> Problem 1: The Tomcats are reachable - unknown why the workers think the
>>>>> server is dead?
>>>>> Problem 2: I have no idea why the worker goes to an error state and
>>>>> cannot recover.
>>>>
>>>> 2 is a consequence of 1
>>>>
>>>>> Problem 3: I miss explanations of logged messages - I read the messages -
>>>>> but cannot match them to the situation - when does a worker post these
>>>>> messages?
>>>>
>>>> 1 is a consequence of these messages
>>>>
>>>>> [Wed Feb 20 10:04:01.889 2008] [19237:3086010048] [info] jk_handler::mod_jk.c (2270): Aborting connection for worker=ajp_ggi
>>>>> [Wed Feb 20 10:04:39.799 2008] [19294:3086010048] [error] ajp_get_reply::jk_ajp_common.c (1623): (INETP1011) Timeout with waiting reply from tomcat. Tomcat is down, stopped or network problems (errno=110)
>>>>> [Wed Feb 20 10:04:39.799 2008] [19294:3086010048] [error] ajp_service::jk_ajp_common.c (2034): (INETP1011) receiving reply from tomcat failed with out recovery in send loop attempt=0
>>>>> [Wed Feb 20 10:04:41.799 2008] [19294:3086010048] [error] service::jk_lb_worker.c (1105): unrecoverable error 504, request failed. Tomcat failed in the middle of request, we can't recover to another instance.
>>>>
>>>> The second line tells us that your configured reply_timeout fired.
>>>> You set it to 120000 (2 minutes), so there are requests taking longer
>>>> than 2 minutes on the backend before the first response packet comes
>>>> back from the backend.
>>>>
>>>> With your configuration mod_jk then doesn't wait any longer on the reply
>>>> *and puts the backend into error mode*.
>>>>
>>>> Up until version 1.2.25, if you use a reply-timeout, you need to set it
>>>> to a high number which justifies the reasoning "if it takes that long,
>>>> then something is wrong with the backend".
>>>>
>>>> Reality shows: there is no such number.
>>>> Often there are a few requests
>>>> that take unacceptably long on the backend *although* the backend is
>>>> still working.
>>>>
>>>> So in 1.2.25 we added max_reply_timeouts. With this set in addition to
>>>> reply_timeout, mod_jk will abort waiting for a reply after
>>>> reply_timeout, but allow some timeouts before actually deciding to put
>>>> the backend into error.
>>>>
>>>> Unfortunately the implementation of max_reply_timeouts in 1.2.25 was
>>>> wrong, so you need to go to 1.2.26 to get it working right.
>>>>
>>>> See:
>>>>
>>>> http://issues.apache.org/bugzilla/show_bug.cgi?id=43229
>>>>
>>>> Caution: this does *not* explain why the backends are not automatically
>>>> recovered after a minute of error condition. Maybe you have times where
>>>> you get too many of those reply_timeouts (see log file), and although we
>>>> recover after a minute the backend almost immediately goes back into
>>>> error status.
>>>>
>>>>> -> Which timeout - how does mod_jk think Tomcat is down? Where can I
>>>>> find details on errno=110? ...
>>>>
>>>> reply_timeout, see above and also
>>>>
>>>> http://tomcat.apache.org/connectors-doc/generic_howto/timeouts.html
>>>>
>>>> errno: a standard unix feature. The numbers are platform dependent. I
>>>> would assume in your case
>>>>
>>>> ETIMEDOUT 110 /* Connection timed out */
>>>>
>>>> so no wonder, that's exactly what we expect (and doesn't tell us the
>>>> reason, i.e. what's wrong on the *backend* taking that long for a
>>>> response).
>>>>
>>>>> -> receiving reply from tomcat failed with out recovery in send loop
>>>>> attempt=0 - ? with out recovery in send loop - means?
>>>>
>>>> That your configuration doesn't allow us to send the request to another
>>>> backend. recovery_options 7 include: if mod_jk was able to send the
>>>> request to a backend, do not try to send it to another backend in case
>>>> of an error during the response handling.
>>>> Even if you would allow
>>>> sending to another backend, it would not help with *not* putting the
>>>> worker into error state. More likely you would put all
>>>> workers into error state, because all of them might run into the same
>>>> timeout, one after the other.
>>>>
>>>>> -> unrecoverable error 504 - details on this error?
>>>>
>>>> That's simply how we return the situation back to the client (browser).
>>>>
>>>>> Ok - I turned the logging level to debug - the course of events gets
>>>>> clearer - but also more questions appear - there are socket numbers -
>>>>> which sockets - what are these numbers, e.g. "will be shutting down socket
>>>>> 35 for worker INETP1021" - what are the sockets good for? - how many are
>>>>> there per worker? Can I configure them?
>>>>
>>>> Should not be the problem here. For apache httpd if you do *not*
>>>> configure anything, we automatically choose the number of httpd threads
>>>> as the maximum number of connections. No need to change anything here.
>>>>
>>>>> => Generally - how can I solve such problems - I tried to look into
>>>>> the mod_jk code - searching for error codes, error messages - but cannot
>>>>> find any relevant information - I am studying the log files - but
>>>>> don't find out what really happens.
>>>>
>>>> Post to the list. Improve our docs.
>>>>
>>>> The error message contains the word "timeout" and "reply" and you have a
>>>> "reply_timeout".
>>>>
>>>> Long running requests are a frequent problem. If you want to get rid of
>>>> them, start by adding response times to your httpd and your tomcat
>>>> access log format (%D). Then have a look at which URLs are producing long
>>>> running requests, during what time of day they are happening etc. This
>>>> might give you a clue about the reasons.
>>>>
>>>> And if they are very frequent: do Java Thread Dumps of your backends and
>>>> analyze them.
>>>>
>>>>> So - maybe someone has an idea why the worker thinks that the
>>>>> corresponding Tomcat is dead, and why it will not recover by itself!
>>>>
>>>> Tomcat is dead: from the point of view of mod_jk it simply means: we
>>>> didn't get an answer when we expected one. Details depend on the
>>>> additional log lines (could not connect, reply timeout etc.).
>>>>
>>>>> And I am also searching for tips on how I can help myself - and where to
>>>>> find something about the error codes, messages, ... in mod_jk
>>>>>
>>>>> thanks for your attention
>>>>> Best
>>>>> ahmed musa (writing from vienna)
>>>>>
>>>> Regards,
>>>>
>>>> Rainer
>>>>
>>>>> Current infrastructure
>>>>> We have 3 Apache web servers (2.2.6) - based on CentOS release 4.3
>>>>> / kernel version 2.6.9-34
>>>>> In front of the web servers there are two (two locations) hardware load
>>>>> balancers (but they have no role in this story)
>>>>> The web servers are hosted at our ISP.
>>>>>
>>>>> The web servers balance the requests via mod_jk (version 1.2.25) for
>>>>> approx. 10 webapps to 18 backend Tomcat servers (blade servers - because of
>>>>> underlying application parts the OS is Windows 2003 Server - a long
>>>>> story not worth explaining :-) ). The Tomcat servers get data via
>>>>> requests against DB2 servers/DB2 databases on the mainframe. The
>>>>> Tomcat servers are in-house - and were rebooted nightly because of
>>>>> automated deployment processes.
>>>>>
>>>>> Between the web servers and the Tomcat servers is a Checkpoint firewall.
>>>>> All webapps are deployed on all Tomcats - only mod_jk manages the
>>>>> requests to certain Tomcat instances.
>>>>> (on one blade server there are two identical Tomcat instances
>>>>> running).
>>>>>
>>>>> Versions: Tomcat - 5.5.17_11, JDK 1.5.0_11-b03.
>>>>> The requests against
>>>>> the public website(s) are normal short-lived requests - not many. Most
>>>>> webapps (portals) need a login and have a strong focus on business
>>>>> logic - so the instances are big (many MBs in RAM), the sessions are
>>>>> sticky and the session timeout is 20 minutes. But there are also fewer
>>>>> requests. Monitoring requests from our ISP are added to the user
>>>>> requests.
>>>>> The problems appear at servers/portals which have very few user requests.
>>>>>
>>>>> worker.properties
>>>>> worker.list=ajp_bam,ajp_ggi,ajp_ad,ajp_svp,.......,jkstatus
>>>>>
>>>>> worker.template.type=ajp13
>>>>> worker.template.lbfactor=5
>>>>> worker.template.socket_keepalive=1
>>>>> worker.template.connect_timeout=7000
>>>>> worker.template.prepost_timeout=5000
>>>>> worker.template.reply_timeout=120000
>>>>> worker.template.retries=6
>>>>> worker.template.activation=Active
>>>>> worker.template.recovery_options=7
>>>>>
>>>>> worker.lbtemplate.type=lb
>>>>> worker.lbtemplate.max_reply_timeouts=6
>>>>> worker.lbtemplate.method=Session
>>>>>
>>>>> # Production workers
>>>>> # AS-INETP101 - 106 - 6/6 GGI
>>>>> worker.INETP1011.host=AS-INETP101.AEAT.ALLIANZ.AT
>>>>> worker.INETP1011.port=65001
>>>>> worker.INETP1011.reference=worker.template
>>>>>
>>>>> ....many more of the same
>>>>>
>>>>> then
>>>>>
>>>>> worker.ajp_ad.reference=worker.lbtemplate
>>>>> worker.ajp_ad.balance_workers=INETP1032,INETP1062
>>>>>
>>>>> .... many more portals
>>>>>
>>>>> and finally jkstatus
>>>>>
>>>>> The JkMount is very simple
>>>>> JkMount /* ajp_ad --- for the other portals mostly the same
>>>>>
>>>>> The portals are virtual hosts on the Apache.
>>>>>
>>>>> Tomcat - server.xml
>>>>> example
>>>>> <Connector port="65001" maxThreads="300" protocol="AJP/1.3" />
>>>>> <Engine name="Catalina" jvmRoute="INETP5021" defaultHost="default">
>>>>> ......
>>>>>
>>>>> <Host name="slfinsol.com" appBase="webapps" unpackWARs="true"
>>>>> autoDeploy="false" deployOnStartup="false" xmlValidation="false"
>>>>> xmlNamespaceAware="false">
>>>>> <Alias>www.slfinsol.com</Alias>
>>>>> <Alias>web1.slfinsol.com</Alias>
>>>>> ...
>>>>> <Alias>testweb.slfinsol.com</Alias>
>>>>> .....
>>>>> <Valve className="org.apache.catalina.valves.AccessLogValve"
>>>>> directory="logs" prefix="swl_access_log." suffix=".txt" pattern="common"
>>>>> resolveHosts="false" />
>>>>> <Valve className="at.allianz.tomcat.valve.RequestTimeValve"/>
>>>>> <Valve className="at.allianz.tomcat.valve.WebcollaborationWorkaroundValve"/>
>>>>> <Context path="" docBase="swl" />
>>>>> <Context path="/monitor5" docBase="monitor" />
>>>>> <Context path="/swl" docBase="swl" />
>>>>> </Host>

---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
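
For anyone finding this thread later: a minimal sketch of the log analysis suggested in the thread (add %D to the access log, then look at which URLs and which hours of the day produce long-running requests). It is not from any of the posters. It assumes an httpd LogFormat of the common log format with %D (microseconds) appended as the last field; the function name slow_requests, the 60 second threshold, the sample IP addresses and the /swl/report and /swl/login paths are hypothetical and only serve the illustration.

```python
import re
from collections import Counter

# Assumed httpd LogFormat (not taken from the thread):
#   LogFormat "%h %l %u %t \"%r\" %>s %b %D"
# %D in httpd is the request duration in microseconds.
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<day>[^:]+):(?P<hour>\d{2}):[^\]]+\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+ (?P<usec>\d+)$'
)


def slow_requests(lines, threshold_seconds=60.0):
    """Count slow requests per URL path and per hour of day."""
    by_url = Counter()
    by_hour = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        seconds = int(match.group("usec")) / 1_000_000
        if seconds >= threshold_seconds:
            # Drop the query string so variants of one URL are grouped.
            path = match.group("url").split("?", 1)[0]
            by_url[path] += 1
            by_hour[match.group("hour")] += 1
    return by_url, by_hour


if __name__ == "__main__":
    # Hypothetical sample lines in the assumed format.
    sample = [
        '10.0.0.1 - - [20/Feb/2008:10:02:39 +0100] "GET /swl/report?id=7 HTTP/1.1" 504 512 120004312',
        '10.0.0.2 - - [20/Feb/2008:10:03:01 +0100] "GET /swl/login HTTP/1.1" 200 2048 85000',
        '10.0.0.3 - - [20/Feb/2008:11:15:20 +0100] "POST /swl/report HTTP/1.1" 200 1024 75000000',
    ]
    urls, hours = slow_requests(sample)
    for path, count in urls.most_common():
        print(path, count)
    for hour, count in sorted(hours.items()):
        print(hour, count)
```

For a Tomcat access log the divisor would have to change, since the thread notes that Tomcat's %D is in milliseconds. A threshold well below reply_timeout (120000 ms in the posted configuration) should also show requests that are slow but do not yet trigger the timeout.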