Hi Dominik,
Dominik Pospisil wrote:
> Hello,
> I am having the following problem with a failover test scenario.
> Cluster setup:
> - 1 Apache load balancer
> - 2 nodes with equal LB factor
> - sticky sessions turned on
> - Apache/2.0.52, mod_jk/1.2.26
> Test scenario:
> 1. start 1st node
> 2. start load driver
> 3. start 2nd node
> 4. wait for state transfer (2 minutes)
> 5. kill 1st node
> My experience is that after stages 1 and 2, all clients are handled correctly
> by the 1st node and the second node is correctly set to ERR state. After a
> while, the second node switches to ERR/REC state.
> However, at stage 4 (after starting the 2nd node) the second node never comes
> up to OK state. I have set both the worker maintain period and the LB
> recover_time to 30s, so I guess that within 2 minutes the second node should
> have been re-checked. When I manually press the "Reset worker state" button,
> it comes up immediately, but this never happens automatically during the
> maintenance phase.
I would expect that your load driver only sends sticky requests, i.e.
requests with either a cookie or URL encoding for node cluster01. At least
that would fit your observation.
During maintenance, mod_jk detects whether a worker has been in error state
long enough to try again. This happens in your setup, as you can see from the
ERR/REC state. The next request that comes in *and does not contain a
session id of another node* will be routed to the REC node. Under load,
you won't see this state often, because most of the time it should turn
into ERR or OK very quickly.
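If you want to watch these transitions in detail, you can raise the JK log
level. A minimal httpd.conf sketch, assuming default file locations (the
paths are only examples, adjust them to your installation):

  # httpd.conf fragment (paths are examples)
  JkWorkersFile conf/worker.properties
  # mod_jk writes its routing, maintenance and recovery decisions here
  JkLogFile     logs/mod_jk.log
  JkLogLevel    debug

With JkLogLevel debug you should be able to see in mod_jk.log whether the
maintenance run actually moves cluster02 into recovery.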
Maybe your app sets a cookie and the load driver always presents that
cookie. That way all further requests would be handled as sticky and
routed to the first node.
You can find out by logging %{Cookie}i in your httpd access log. If you
include this in your LogFormat, you can see the incoming Cookie header
for each request.
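For example, a minimal sketch (the format nickname and log file name are
just examples):

  LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Cookie}i\"" jk_cookie
  CustomLog logs/access_cookie.log jk_cookie

If every request after the first one carries a JSESSIONID with a route
suffix like ".cluster01", then the driver is sending sticky requests and
mod_jk is behaving as designed.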
> Eventually, after killing the 1st node and returning a couple of "503 Service
> Temporarily Unavailable" errors, mod_jk finally re-checks the 2nd node's
> status, reroutes requests to the 2nd node, and resumes correct operation.
> My question is: why is the second node not recognized before the failover?
> Did I miss something? Or is it a bug?
> Thanks,
> - Dominik
> Attaching worker.properties
> ---------
> worker.list=loadbalancer,status
> worker.maintain=30
> # modify the host as your host IP or DNS name.
> worker.cluster01.port=8009
> worker.cluster01.host=172.17.0.39
> worker.cluster01.type=ajp13
> worker.cluster01.lbfactor=1
> #worker.cluster01.redirect=cluster02
> # modify the host as your host IP or DNS name.
> worker.cluster02.port=8009
> worker.cluster02.host=172.17.1.39
> worker.cluster02.type=ajp13
> worker.cluster02.lbfactor=1
> #worker.cluster02.redirect=cluster01
> # Load-balancing behaviour
> worker.loadbalancer.type=lb
> worker.loadbalancer.method=Session
> worker.loadbalancer.balance_workers=cluster01,cluster02
> worker.loadbalancer.sticky_session=1
> worker.loadbalancer.recover_time=30
> #worker.list=loadbalancer
> # Status worker for managing load balancer
> worker.status.type=status
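One more note: since you already have the status worker in worker.list, you
can inspect the lb state in the browser as well. A sketch of the httpd side,
assuming you mount it at /jkstatus (the URL is just an example):

  JkMount /jkstatus status

The page behind it shows the per-worker state (OK, ERR, REC) and the same
"Reset worker state" action you already used.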
Regards,
Rainer
---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]