Hi Dominik,
Dominik Pospisil wrote:
> Hello,
> I am having the following problem with a failover test scenario.
> Cluster setup:
> - 1 Apache load balancer
> - 2 nodes with equal LB factor
> - sticky sessions turned on
> - Apache/2.0.52, mod_jk/1.2.26
> Test scenario:
> 1. start 1st node
> 2. start load driver
> 3. start 2nd node
> 4. wait for state transfer (2 minutes)
> 5. kill 1st node
> My experience is that after stages 1 and 2, all clients are handled correctly
> by the 1st node and the second node is correctly set to ERR state. After a
> while, the second node switches to ERR/REC state.
> However, at stage 4 (after starting the 2nd node) the second node never comes
> up to OK state. I have set both the worker maintain period and the LB
> recover_time to 30s, so I guess that within 2 minutes the second node should
> have been re-checked. When I manually press the "Reset worker state" button,
> it comes up immediately, but this never happens automatically during the
> maintenance phase.
I would expect that your load driver only sends sticky requests, i.e.
requests with either a cookie or URL encoding for node cluster01. At least
that would fit your observation.
During maintenance, mod_jk detects whether a worker has been in error state
long enough to try again. This happens in your setup, as you can see from the
ERR/REC state. The next request that comes in *and does not contain a
session id of another node* will be routed to the REC node. Under load,
you won't see this state often, because most of the time it should turn
into ERR or OK very quickly.
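If you want to watch these transitions in detail, you can raise the JK log
level. A minimal httpd.conf sketch, assuming default file locations (the
paths are only examples, adjust them to your installation):

  # httpd.conf fragment (paths are examples)
  JkWorkersFile conf/worker.properties
  # mod_jk writes its routing, maintenance and recovery decisions here
  JkLogFile     logs/mod_jk.log
  JkLogLevel    debug

With JkLogLevel debug you should be able to see in mod_jk.log whether the
maintenance run actually moves cluster02 into recovery.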
Maybe your app sets a cookie and the load driver always presents that
cookie. That way all further requests would be handled as sticky and
routed to the first node.
You can find out by logging %{Cookie}i in your httpd access log. If you
include this in your LogFormat, you can see the incoming Cookie header
for each request.
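For example, a minimal sketch (the format nickname and log file name are
just examples):

  LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Cookie}i\"" jk_cookie
  CustomLog logs/access_cookie.log jk_cookie

If every request after the first one carries a JSESSIONID with a route
suffix like ".cluster01", then the driver is sending sticky requests and
mod_jk is behaving as designed.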
> Eventually, after killing the 1st node and returning a couple of "503 Service
> Temporarily Unavailable" errors, mod_jk finally re-checks the 2nd node's
> status, reroutes requests to the 2nd node, and resumes correct operation.
> My question is: why is the second node not recognized before the failover?
> Did I miss something? Or is it a bug?
> Thanks,
> - Dominik
> Attaching worker.properties
> ---------
> worker.list=loadbalancer,status
> worker.maintain=30
> # modify the host as your host IP or DNS name.
> worker.cluster01.port=8009
> worker.cluster01.host=172.17.0.39
> worker.cluster01.type=ajp13
> worker.cluster01.lbfactor=1
> #worker.cluster01.redirect=cluster02
> # modify the host as your host IP or DNS name.
> worker.cluster02.port=8009
> worker.cluster02.host=172.17.1.39
> worker.cluster02.type=ajp13
> worker.cluster02.lbfactor=1
> #worker.cluster02.redirect=cluster01
> # Load-balancing behaviour
> worker.loadbalancer.type=lb
> worker.loadbalancer.method=Session
> worker.loadbalancer.balance_workers=cluster01,cluster02
> worker.loadbalancer.sticky_session=1
> worker.loadbalancer.recover_time=30
> #worker.list=loadbalancer
> # Status worker for managing load balancer
> worker.status.type=status
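One more note: since you already have the status worker in worker.list, you
can inspect the lb state in the browser as well. A sketch of the httpd side,
assuming you mount it at /jkstatus (the URL is just an example):

  JkMount /jkstatus status

The page behind it shows the per-worker state (OK, ERR, REC) and the same
"Reset worker state" action you already used.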
Regards,
Rainer
---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]