re: ATS 9.1.2

parent policy = consistent_hash
strategy(nexthop) policy = consistent_hash
num-parent-rings = 2 (primary/secondary)
num-nexthop-rings = 2 (primary/secondary)
retry-window = 300s
failure-threshold = 10
parent-connection-timeout = 2s

I notice that the nexthop failure count after a network timeout/event
never increments beyond 2.
With a failure threshold of 10, that threshold is never reached in the
nexthop's failure tracking, so every request that lands on this nexthop
incurs a 2s timeout before moving on to the next nexthop in the list.

Failure counts using parent_select work as expected, so the failing
parent is taken out of rotation for the parent retry timer/window (300s).
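
To make the difference concrete, here is a toy Python model of the two
behaviours. It is only a sketch of my understanding, not ATS code: the
HostHealth class, the stuck_at parameter and the timeouts_paid helper
are hypothetical names, and the numbers mirror the settings above
(threshold 10, retry window 300s, connect timeout 2s).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostHealth:
    """Toy passive health tracker (hypothetical, not the ATS implementation)."""
    fail_threshold: int = 10       # failures needed to mark the host down
    retry_window: float = 300.0    # seconds a down host stays out of rotation
    fail_count: int = 0
    down_since: Optional[float] = None

    def available(self, now: float) -> bool:
        # A host is usable if it was never marked down or the window expired.
        if self.down_since is None:
            return True
        return now - self.down_since >= self.retry_window

    def record_failure(self, now: float, stuck_at: Optional[int] = None) -> None:
        self.fail_count += 1
        if stuck_at is not None:
            # Assumption: models the observed counter that never passes 2.
            self.fail_count = min(self.fail_count, stuck_at)
        if self.fail_count >= self.fail_threshold and self.down_since is None:
            self.down_since = now


def timeouts_paid(requests: int, stuck_at: Optional[int] = None,
                  connect_timeout: float = 2.0) -> int:
    """Count how many requests wait out the connect timeout on a dead host."""
    host = HostHealth()
    now = 0.0
    paid = 0
    for _ in range(requests):
        if host.available(now):
            host.record_failure(now, stuck_at)
            paid += 1
            now += connect_timeout
        # else: the host is skipped immediately, no timeout is paid
    return paid


if __name__ == "__main__":
    print("counter increments normally:", timeouts_paid(20))
    print("counter stuck at 2:", timeouts_paid(20, stuck_at=2))
```

In this model the normally incrementing counter stops charging the
timeout once the threshold is hit, while the stuck counter charges it
on every request, which matches what I see with nexthop.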

On its face this looks like a bug.
If so, I'll submit a GitHub issue.
If not, and I'm missing something in the nexthop/strategies
configuration (or there are other suggestions), then please enlighten
me. :-)



next_hop

[Apr 23 01:49:27.883] [ET_NET 10] DEBUG: <NextHopHealthStatus.cc:139
(markNextHop)> (next_hop) [75] Parent fail count increased to 2 for
192.168.72.208
[Apr 23 01:49:30.308] [ET_NET 11] DEBUG: <NextHopHealthStatus.cc:139
(markNextHop)> (next_hop) [76] Parent fail count increased to 2 for
192.168.72.208
[Apr 23 01:49:32.366] [ET_NET 11] DEBUG: <NextHopHealthStatus.cc:139
(markNextHop)> (next_hop) [76] Parent fail count increased to 2 for
192.168.72.208
[Apr 23 01:49:34.481] [ET_NET 12] DEBUG: <NextHopHealthStatus.cc:139
(markNextHop)> (next_hop) [77] Parent fail count increased to 2 for
192.168.72.208
[Apr 23 01:49:37.474] [ET_NET 12] DEBUG: <NextHopHealthStatus.cc:139
(markNextHop)> (next_hop) [77] Parent fail count increased to 2 for
192.168.72.208
[Apr 23 01:49:39.596] [ET_NET 13] DEBUG: <NextHopHealthStatus.cc:139
(markNextHop)> (next_hop) [78] Parent fail count increased to 2 for
192.168.72.208
[Apr 23 01:49:41.599] [ET_NET 13] DEBUG: <NextHopHealthStatus.cc:139
(markNextHop)> (next_hop) [78] Parent fail count increased to 2 for
192.168.72.208
[Apr 23 01:49:44.439] [ET_NET 14] DEBUG: <NextHopHealthStatus.cc:139
(markNextHop)> (next_hop) [79] Parent fail count increased to 2 for
192.168.72.208
[Apr 23 01:49:47.434] [ET_NET 14] DEBUG: <NextHopHealthStatus.cc:139
(markNextHop)> (next_hop) [79] Parent fail count increased to 2 for
192.168.72.208
[Apr 23 01:49:49.633] [ET_NET 15] DEBUG: <NextHopHealthStatus.cc:139
(markNextHop)> (next_hop) [80] Parent fail count increased to 2 for
192.168.72.208
[Apr 23 01:49:52.639] [ET_NET 15] DEBUG: <NextHopHealthStatus.cc:139
(markNextHop)> (next_hop) [80] Parent fail count increased to 2 for
192.168.72.208
[Apr 23 01:49:55.474] [ET_NET 16] DEBUG: <NextHopHealthStatus.cc:139
(markNextHop)> (next_hop) [81] Parent fail count increased to 2 for
192.168.72.208
[Apr 23 01:49:58.468] [ET_NET 16] DEBUG: <NextHopHealthStatus.cc:139
(markNextHop)> (next_hop) [81] Parent fail count increased to 2 for
192.168.72.208

parent_select

[Apr 23 02:11:12.507] [ET_NET 15] DEBUG:
<ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select)
Parent fail count increased to 2 for 192.168.72.208:80
[Apr 23 02:11:14.595] [ET_NET 16] DEBUG:
<ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select)
Parent fail count increased to 3 for 192.168.72.208:80
[Apr 23 02:11:17.589] [ET_NET 17] DEBUG:
<ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select)
Parent fail count increased to 4 for 192.168.72.208:80
[Apr 23 02:11:20.435] [ET_NET 18] DEBUG:
<ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select)
Parent fail count increased to 5 for 192.168.72.208:80
[Apr 23 02:11:22.602] [ET_NET 19] DEBUG:
<ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select)
Parent fail count increased to 6 for 192.168.72.208:80
[Apr 23 02:11:25.587] [ET_NET 0] DEBUG: <ParentSelectionStrategy.cc:86
(markParentDown)> (parent_select) Parent fail count increased to 7 for
192.168.72.208:80
[Apr 23 02:11:28.353] [ET_NET 1] DEBUG: <ParentSelectionStrategy.cc:86
(markParentDown)> (parent_select) Parent fail count increased to 8 for
192.168.72.208:80
[Apr 23 02:11:30.795] [ET_NET 2] DEBUG: <ParentSelectionStrategy.cc:86
(markParentDown)> (parent_select) Parent fail count increased to 9 for
192.168.72.208:80
[Apr 23 02:11:33.758] [ET_NET 3] DEBUG: <ParentSelectionStrategy.cc:86
(markParentDown)> (parent_select) Parent fail count increased to 10
for 192.168.72.208:80
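
For anyone who wants to compare the two logs quickly, a small Python
sketch along these lines pulls out the reported fail counts per debug
tag and host. The fail_counts function and PATTERN name are my own, and
the regular expression assumes the message wording shown in the
excerpts above; it collapses whitespace first because the lines are
wrapped in this mail.

```python
import re
from collections import defaultdict

# Matches the "Parent fail count increased to N for HOST" debug message
# for both tags; the optional "[NN]" is the id printed by next_hop.
PATTERN = re.compile(
    r"\((next_hop|parent_select)\)(?: \[\d+\])? "
    r"Parent fail count increased to (\d+) for (\S+)"
)


def fail_counts(log_text: str) -> dict:
    """Return {(tag, host): [count, count, ...]} in log order."""
    # Undo mail line wrapping by collapsing all whitespace to single spaces.
    flat = " ".join(log_text.split())
    counts = defaultdict(list)
    for tag, count, host in PATTERN.findall(flat):
        counts[(tag, host)].append(int(count))
    return dict(counts)


if __name__ == "__main__":
    import sys
    for (tag, host), seen in fail_counts(sys.stdin.read()).items():
        print(f"{tag} {host}: max={max(seen)} over {len(seen)} messages")
```

Fed the excerpts above on stdin, the next_hop entries should all report
2 while the parent_select entries climb to 10.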
