Hi Rainer,

Thanks for your informative and thoughtful reply.

Yes, we are definitely going with CMS. It's a production environment, so we
have to be careful with whatever we plan to do. By tweaking reply_timeout,
be it hard or soft, we were actually circumventing the problem rather than
facing it. Which is lame, I know.

Anyway, thank you again for the help and insight.

Regards,
Sean

On Wed, Apr 21, 2010 at 6:27 PM, Rainer Jung <rainer.j...@kippdata.de> wrote:
> Hi Sean,
>
> On 20.04.2010 08:04, Sean GAO wrote:
>>
>> According to online documentation
>> (http://tomcat.apache.org/connectors-doc/generic_howto/timeouts.html):
>> ----
>> Long Garbage Collection pauses on the backend do not make a good fit
>> with some timeouts. Try to optimise your Java memory and GC settings.
>> ----
>>
>> So if JVM tuning doesn't help, what else could I do? Balancer worker's
>> method=Busyness setting may have some effect, but still this is
>> different. What do you think?
>
> There is no kind of soft reply timeout which would mean a long response time
> indicates we shouldn't send more requests to that Tomcat but should still
> wait for the outstanding responses.
>
> The best you can do is tweaking the timeouts and the GC. Modern CMS GC
> doesn't do stop-the-world; most of it runs concurrently. Yes, after some
> time you might run into an occasional stop-the-world because of
> fragmentation, but they will be much rarer than without CMS.
>
> If your GC stop times are about 30 seconds, then that is not good, but I
> wouldn't reduce a reply_timeout to something much smaller anyhow. You don't
> want to make the error detection very sensitive, because then it is not
> unlikely that you end up making your system more unstable than without. You
> want to detect serious problems and react to them, but you shouldn't want to
> react quickly to any indication of possible problems.
>
> What might help you a bit is the ability to define reply_timeout depending
> on the URL of the request. So if you know there are e.g. some reporting URLs
> that you know will take longer than a minute, you could set a general
> reply_timeout to e.g. 30 seconds, and the timeout for the report URLs to
> e.g. 2 minutes.
>
> If you use reply_timeout, never forget to also add a max_reply_timeouts.
>
> Concerning your configuration below, please do also consult
>
> http://tomcat.apache.org/connectors-doc/generic_howto/timeouts.html
>
> Regards,
>
> Rainer
>
>> On Tue, Apr 20, 2010 at 12:48 PM, Sean GAO <gaoyuxi...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> We are running Apache 2.2.4 and Tomcat 5.5.28 with mod_jk 1.2.28, and 3
>>> Tomcat instances.
>>>
>>> Referring to
>>> http://tomcat.apache.org/connectors-doc/reference/workers.html
>>> , we came up with a workers.properties file like this:
>>>
>>> worker.list=balancer
>>> worker.maintain=30
>>> #tomcat01
>>> worker.tomcat01.port=18009
>>> worker.tomcat01.host=localhost
>>> worker.tomcat01.type=ajp13
>>> worker.tomcat01.lbfactor=120
>>> worker.tomcat01.retries=2
>>> worker.tomcat01.socket_timeout=30
>>> worker.tomcat01.reply_timeout=30000
>>> worker.tomcat01.recover_time=300
>>> #tomcat02
>>> worker.tomcat02.port=28009
>>> worker.tomcat02.host=localhost
>>> worker.tomcat02.type=ajp13
>>> worker.tomcat02.lbfactor=100
>>> worker.tomcat02.retries=2
>>> worker.tomcat02.socket_timeout=30
>>> worker.tomcat02.reply_timeout=30000
>>> worker.tomcat02.recover_time=300
>>> #tomcat03
>>> worker.tomcat03.port=38009
>>> worker.tomcat03.host=localhost
>>> worker.tomcat03.type=ajp13
>>> worker.tomcat03.lbfactor=0
>>> worker.tomcat03.retries=2
>>> #loadbalancer
>>> worker.retries=2
>>> worker.balancer.type=lb
>>> worker.balancer.sticky_session=False
>>> worker.balancer.method=Busyness
>>> worker.balancer.balance_workers=tomcat01,tomcat02,tomcat03
>>>
>>> So basically tomcat01 and tomcat02 are the main request handlers, with
>>> tomcat03 acting as a backup server which is accessed only when both
>>> tomcat01 and tomcat02 are in error state (30 seconds without a response,
>>> which does not necessarily mean offline). If something bad happens, e.g.
>>> excessively long GC, or redeployment, we expect each failed Tomcat
>>> instance to get back to business in about 5 minutes.
>>>
>>> This meets our needs to a certain degree. However, there's one thing
>>> that bugs me:
>>> If we set the reply_timeout too high, we miss the whole point of
>>> fail-over. If we set the value too low, it's likely we are going to
>>> kill a lot of legitimate requests that would otherwise succeed, which
>>> is not what we wanted either.
>>>
>>> Instead of breaking the long request (say, >30 seconds) and putting the
>>> worker into "error" state, is there any way, any way at all, we can tell
>>> mod_jk to mark a worker "busy", so that future requests are routed to
>>> alternative workers? mod_jk can still check every 30 (or the default
>>> 60) seconds whether it is able to resume one of the "busy"-marked
>>> workers, just like it does with the ones in "error" state.
>>>
>>>
>>> Regards,
>>> Sean
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
>> For additional commands, e-mail: users-h...@tomcat.apache.org
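
For anyone finding this thread later, Rainer's two suggestions (a longer reply_timeout for known slow URLs, plus max_reply_timeouts) might look roughly like the fragments below. This is only a sketch under assumptions: the /myapp and /myapp/reports paths are hypothetical, the value 10 is purely illustrative, and placing max_reply_timeouts on the load balancer worker is my reading of the workers.properties reference, so please verify it there. The per-mapping reply_timeout uses the rule-extension syntax of uriworkermap.properties, which as far as I know needs mod_jk 1.2.27 or later and a JkMountFile pointing at that file.

```properties
# workers.properties (fragment): assumed additions to the setup quoted above.
# General limit for the main workers: 30 seconds, as already configured.
worker.tomcat01.reply_timeout=30000
worker.tomcat02.reply_timeout=30000
# Tolerate some reply timeouts before a member is put into error state.
# The value 10 is an illustrative assumption, not a recommendation.
worker.balancer.max_reply_timeouts=10

# uriworkermap.properties (fragment): /myapp and /myapp/reports are
# hypothetical paths standing in for the real application and report URLs.
/myapp/*=balancer
# Known slow reporting URLs get 2 minutes instead of the general 30 seconds.
/myapp/reports/*=balancer;reply_timeout=120000
```

With something like this in place, an occasional slow request still fails after 30 seconds, but a single long GC pause is less likely to push a worker into the error state, and the report URLs get the longer limit they need.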