Hi Rainer,

Thanks for your informative and thoughtful reply.

Yes, we are definitely going with CMS. It's a production environment, so
we have to be careful with whatever we plan to do. By tweaking
reply_timeout, be it hard or soft, we were actually circumventing the
problem rather than facing it. Which is lame, I know. Anyway, thank
you again for the help and insight.

Regards,
Sean


On Wed, Apr 21, 2010 at 6:27 PM, Rainer Jung <rainer.j...@kippdata.de> wrote:
> Hi Sean,
>
> On 20.04.2010 08:04, Sean GAO wrote:
>>
>> According to online documentation
>> (http://tomcat.apache.org/connectors-doc/generic_howto/timeouts.html):
>> ----
>> Long Garbage Collection pauses on the backend do not make a good fit
>> with some timeouts. Try to optimise your Java memory and GC settings.
>> ----
>>
>> So if JVM tuning doesn't help, what else could I do? Balancer worker's
>> method=Busyness setting may have some effect, but still this is
>> different. What do you think?
>
> There is no "soft" reply timeout, i.e. one where a long response time
> means we stop sending new requests to that Tomcat but still
> wait for the outstanding responses.
>
> The best you can do is tweak the timeouts and the GC. Modern CMS GC
> avoids most stop-the-world pauses, since most of it runs concurrently. Yes,
> after some time you might run into an occasional stop-the-world pause because
> of fragmentation, but those will be much rarer than without CMS.
>
> If your GC stop times are about 30 seconds, then that is not good, but I
> wouldn't reduce reply_timeout to something much smaller anyhow. You don't
> want to make the error detection very sensitive, because then it is not
> unlikely that you end up making your system more unstable than without it.
> You want to detect serious problems and react to them, but you shouldn't want
> to react quickly to any indication of possible problems.
>
> What might help you a bit is the ability to define reply_timeout depending
> on the URL of the request. So if you know that e.g. some reporting URLs
> will take longer than a minute, you could set a general
> reply_timeout of e.g. 30 seconds, and the timeout for the report URLs to
> e.g. 2 minutes.
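>
> As a rough sketch of that idea (not taken from an actual setup): mod_jk's
> uriworkermap.properties accepts a reply_timeout rule extension, given in
> milliseconds, on individual mapping rules. The /myapp paths are hypothetical
> placeholders, and this assumes the file is loaded via JkMountFile.
>
> ```properties
> # uriworkermap.properties (sketch; the /myapp paths are assumptions)
> # Default mapping: requests use the reply_timeout configured on the workers
> /myapp/*=balancer
> # Slow reporting URLs: override the reply timeout to 2 minutes for this rule
> /myapp/reports/*=balancer;reply_timeout=120000
> ```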
>
> If you use reply_timeout, never forget to also add a max_reply_timeouts.
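>
> For illustration only, a minimal sketch of how that could look for a load
> balancer worker named "balancer". The value 10 is an arbitrary assumption,
> not a recommendation.
>
> ```properties
> # workers.properties (sketch): max_reply_timeouts is set on the lb worker.
> # It tolerates some reply timeouts before a member is put into error state.
> worker.balancer.max_reply_timeouts=10
> ```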
>
> Concerning your configuration below, please do also consult
>
> http://tomcat.apache.org/connectors-doc/generic_howto/timeouts.html
>
> Regards,
>
> Rainer
>
>> On Tue, Apr 20, 2010 at 12:48 PM, Sean GAO <gaoyuxi...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> We are running Apache 2.2.4 and Tomcat 5.5.28 with mod_jk 1.2.28, and 3
>>> Tomcat instances.
>>>
>>> Referring to
>>> http://tomcat.apache.org/connectors-doc/reference/workers.html
>>> , we came up with a workers.properties file like this:
>>>
>>> worker.list=balancer
>>> worker.maintain=30
>>> #tomcat01
>>> worker.tomcat01.port=18009
>>> worker.tomcat01.host=localhost
>>> worker.tomcat01.type=ajp13
>>> worker.tomcat01.lbfactor=120
>>> worker.tomcat01.retries=2
>>> worker.tomcat01.socket_timeout=30
>>> worker.tomcat01.reply_timeout=30000
>>> worker.tomcat01.recover_time=300
>>> #tomcat02
>>> worker.tomcat02.port=28009
>>> worker.tomcat02.host=localhost
>>> worker.tomcat02.type=ajp13
>>> worker.tomcat02.lbfactor=100
>>> worker.tomcat02.retries=2
>>> worker.tomcat02.socket_timeout=30
>>> worker.tomcat02.reply_timeout=30000
>>> worker.tomcat02.recover_time=300
>>> #tomcat03
>>> worker.tomcat03.port=38009
>>> worker.tomcat03.host=localhost
>>> worker.tomcat03.type=ajp13
>>> worker.tomcat03.lbfactor=0
>>> worker.tomcat03.retries=2
>>> #loadbalancer
>>> worker.retries=2
>>> worker.balancer.type=lb
>>> worker.balancer.sticky_session=False
>>> worker.balancer.method=Busyness
>>> worker.balancer.balance_workers=tomcat01,tomcat02,tomcat03
>>>
>>> So basically tomcat01 and tomcat02 are the main request handlers, with
>>> tomcat03 acting as a backup server which is accessed only when both
>>> tomcat01 and tomcat02 are in error state (30 seconds without a response,
>>> which does not necessarily mean offline). If something bad happens, e.g.
>>> an excessively long GC or a redeployment, we assume each failed Tomcat
>>> instance will get back to business in about 5 minutes.
>>>
>>> This meets our needs to a certain degree. However, there's one thing
>>> that bugs me:
>>> If we set the reply_timeout too high, we miss the whole point of
>>> fail-over. If we set the value too low, it's likely we are going to
>>> kill a lot of legitimate requests that would otherwise succeed, which is
>>> not what we want either.
>>>
>>> Instead of breaking the long request (say, >30 seconds) and putting the
>>> worker into "error" state, is there any way, any way at all, we can tell
>>> mod_jk to mark a worker "busy", so that future requests are routed to
>>> alternative workers? mod_jk can still check every 30 (or the default
>>> 60) seconds whether it is able to resume one of the "busy"-marked
>>> workers, just like it does with the ones in "error" state.
>>>
>>>
>>> Regards,
>>> Sean
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
>> For additional commands, e-mail: users-h...@tomcat.apache.org
>
>
