I hope the "snapshot" of my jkstatus below shows up okay. It is
from my current setup, which uses mod_jk 1.2.13 with the
method=Traffic setting.
What's not obvious from this static snapshot is that the middle
webserver (webl6) is currently getting all requests. The 4th server
(webl8) shows 28 busy connections, but don't believe it: it's not
getting any connections. Of the 5 webservers, the middle one is the
only one receiving requests.
I don't know how that happened, but it appears that the Rd (bytes
read) value for webl6 got reset, so in order to "balance" things out
mod_jk is now sending everything to that one server. At first I
thought maybe it was because transferred, readed (sic) and mytraffic
are size_t and one of them rolled over. But with a 32-bit size_t that
would roll over at 4GB, right?
Since I'm not fluent in this code, maybe someone who is could comment.
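To make the rollover theory concrete, here is a rough Python sketch. It assumes (this is a guess, not taken from the mod_jk source) that method=Traffic picks the worker with the smallest (Wr + Rd) / F and that the byte counters are 32-bit unsigned values. The names add_bytes, pick_worker and SNAPSHOT are made up for illustration; the figures are the rounded Wr, Rd and F values from the status table, converted to MB.

```python
# Sketch only, not mod_jk code. Assumes 32-bit unsigned byte counters and
# "lowest (written + read) / factor wins" as the Traffic selection rule.
WRAP = 2 ** 32  # a 32-bit unsigned counter wraps at 4 GiB


def add_bytes(counter, nbytes):
    """Add to a counter the way a 32-bit unsigned integer would."""
    return (counter + nbytes) % WRAP


def pick_worker(workers):
    """Return the name of the worker with the lowest weighted traffic."""
    def score(name):
        w = workers[name]
        return (w["wr"] + w["rd"]) / w["factor"]
    return min(workers, key=score)


# Rounded figures from the status table, in MB (1G taken as 1024M).
SNAPSHOT = {
    "webl7": {"wr": 239, "rd": 2.4 * 1024, "factor": 100},
    "webl4": {"wr": 312, "rd": 3.1 * 1024, "factor": 130},
    "webl6": {"wr": 507, "rd": 1.1 * 1024, "factor": 169},
    "webl8": {"wr": 2.0 * 1024, "rd": 3.8 * 1024, "factor": 220},
    "webl5": {"wr": 60, "rd": 606, "factor": 25},
}

# A counter just below the limit wraps to a small value after one more write.
wrapped = add_bytes(WRAP - 1000, 5000)

# Under the assumed rule, the worker with the low Rd value is always chosen.
chosen = pick_worker(SNAPSHOT)
```

Under those assumptions, a counter that wraps makes its worker look nearly idle, so that worker would be chosen for every request until its total catches up with the others.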
From my cursory look, it appears there might be a couple of issues
here:
1. Relying on total bytes (or total requests) can lead to situations
where all requests go to a single worker if one of the counters gets
messed up. Perhaps it would be more reliable to keep a moving average
instead, so that a bad counter would only temporarily disrupt normal
operations.
2. Based on the Busy counts being incorrect, there doesn't appear to
be any semaphore locking of the shared memory. Could that be why
the Rd value got reset?
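As a rough illustration of point 1, here is a minimal Python sketch of a decayed counter. It swaps the running total for an exponentially decayed one, which is one simple form of moving average; it is not how mod_jk works today. DecayedCounter, pick_least_loaded, the decay value of 0.5 and the idea of a periodic tick are all hypothetical.

```python
# Sketch only, not mod_jk code. Each worker keeps a byte count that is
# periodically multiplied by a decay factor, so old traffic fades out and
# a corrupted or reset counter stops mattering after a few intervals.
class DecayedCounter:
    def __init__(self, factor, decay=0.5):
        self.factor = factor  # load-balancing weight of the worker
        self.decay = decay    # fraction of the count kept at each tick
        self.value = 0.0      # decayed byte count

    def record(self, nbytes):
        """Account for bytes transferred by a request."""
        self.value += nbytes

    def tick(self):
        """Call at a fixed interval to age the count."""
        self.value *= self.decay

    def score(self):
        """Weighted traffic; the lowest score gets the next request."""
        return self.value / self.factor


def pick_least_loaded(workers):
    """Return the name of the worker with the lowest decayed score."""
    return min(workers, key=lambda name: workers[name].score())


workers = {"a": DecayedCounter(100), "b": DecayedCounter(100)}
workers["a"].record(4000)
workers["b"].record(4000)
workers["b"].value = 0.0  # simulate a counter that got reset

# After three intervals the healthy worker's count has faded to 500 bytes,
# so the reset worker only has to absorb a small backlog to catch up.
for _ in range(3):
    for w in workers.values():
        w.tick()
```

With plain totals the reset worker would have to serve the full 4000 bytes before the other one was picked again; with decay the gap shrinks at every tick even if no traffic arrives.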
~Tom

Worker Status for loadbalance

Type  Sticky session  Force Sticky session  Retries  Method   Lock
lb    True            False                 3        Traffic  Optimistic

Name   Type   Host        Addr               Stat  F    V    Acc      Err   Wr    Rd    Busy  Max  RR  Cd
webl7  ajp13  webl7:8009  172.18.7.100:8009  OK    100  100  495464   242   239M  2.4G  1     41
webl4  ajp13  webl4:8009  172.18.4.100:8009  OK    130  130  648407   368   312M  3.1G  1     50
webl6  ajp13  webl6:8009  172.18.6.100:8009  OK    169  169  1056555  305   507M  1.1G  7     56
webl8  ajp13  webl8:8009  172.18.8.100:8009  OK    220  220  4163167  1662  2.0G  3.8G  28    109
webl5  ajp13  webl5:8009  172.18.5.100:8009  OK    25   25   124571   53    60M   606M  1     19