re: ATS 9.1.2

parent policy = consistent_hash
strategy(nexthop) policy = consistent_hash
num-parent-rings = 2 (primary/secondary)
num-nexthop-rings = 2 (primary/secondary)
retry-window = 300s
failure-threshold = 10s
parent-connection-timeout = 2s

I notice that the nexthop failure count upon a network timeout/event never increments beyond 2. With a failure threshold of 10, requests that land on this nexthop will always incur the 2s connection timeout before moving on to the next nexthop in the list, because the threshold is never reached in that nexthop's failure tracking.

Failure counts using parent_select work as expected, so the failing parent is taken out of rotation per the parent retry timer/window (300).

On its face this looks like a bug. If so, I'll submit a GitHub issue. If not, and I'm missing something in the nexthop/strategies configuration (or there are other suggestions), then please enlighten me. :-)

next_hop

[Apr 23 01:49:27.883] [ET_NET 10] DEBUG: <NextHopHealthStatus.cc:139 (markNextHop)> (next_hop) [75] Parent fail count increased to 2 for 192.168.72.208
[Apr 23 01:49:30.308] [ET_NET 11] DEBUG: <NextHopHealthStatus.cc:139 (markNextHop)> (next_hop) [76] Parent fail count increased to 2 for 192.168.72.208
[Apr 23 01:49:32.366] [ET_NET 11] DEBUG: <NextHopHealthStatus.cc:139 (markNextHop)> (next_hop) [76] Parent fail count increased to 2 for 192.168.72.208
[Apr 23 01:49:34.481] [ET_NET 12] DEBUG: <NextHopHealthStatus.cc:139 (markNextHop)> (next_hop) [77] Parent fail count increased to 2 for 192.168.72.208
[Apr 23 01:49:37.474] [ET_NET 12] DEBUG: <NextHopHealthStatus.cc:139 (markNextHop)> (next_hop) [77] Parent fail count increased to 2 for 192.168.72.208
[Apr 23 01:49:39.596] [ET_NET 13] DEBUG: <NextHopHealthStatus.cc:139 (markNextHop)> (next_hop) [78] Parent fail count increased to 2 for 192.168.72.208
[Apr 23 01:49:41.599] [ET_NET 13] DEBUG: <NextHopHealthStatus.cc:139 (markNextHop)> (next_hop) [78] Parent fail count increased to 2 for 192.168.72.208
[Apr 23 01:49:44.439] [ET_NET 14] DEBUG: <NextHopHealthStatus.cc:139 (markNextHop)> (next_hop) [79] Parent fail count increased to 2 for 192.168.72.208
[Apr 23 01:49:47.434] [ET_NET 14] DEBUG: <NextHopHealthStatus.cc:139 (markNextHop)> (next_hop) [79] Parent fail count increased to 2 for 192.168.72.208
[Apr 23 01:49:49.633] [ET_NET 15] DEBUG: <NextHopHealthStatus.cc:139 (markNextHop)> (next_hop) [80] Parent fail count increased to 2 for 192.168.72.208
[Apr 23 01:49:52.639] [ET_NET 15] DEBUG: <NextHopHealthStatus.cc:139 (markNextHop)> (next_hop) [80] Parent fail count increased to 2 for 192.168.72.208
[Apr 23 01:49:55.474] [ET_NET 16] DEBUG: <NextHopHealthStatus.cc:139 (markNextHop)> (next_hop) [81] Parent fail count increased to 2 for 192.168.72.208
[Apr 23 01:49:58.468] [ET_NET 16] DEBUG: <NextHopHealthStatus.cc:139 (markNextHop)> (next_hop) [81] Parent fail count increased to 2 for 192.168.72.208

parent_select

[Apr 23 02:11:12.507] [ET_NET 15] DEBUG: <ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select) Parent fail count increased to 2 for 192.168.72.208:80
[Apr 23 02:11:14.595] [ET_NET 16] DEBUG: <ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select) Parent fail count increased to 3 for 192.168.72.208:80
[Apr 23 02:11:17.589] [ET_NET 17] DEBUG: <ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select) Parent fail count increased to 4 for 192.168.72.208:80
[Apr 23 02:11:20.435] [ET_NET 18] DEBUG: <ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select) Parent fail count increased to 5 for 192.168.72.208:80
[Apr 23 02:11:22.602] [ET_NET 19] DEBUG: <ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select) Parent fail count increased to 6 for 192.168.72.208:80
[Apr 23 02:11:25.587] [ET_NET 0] DEBUG: <ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select) Parent fail count increased to 7 for 192.168.72.208:80
[Apr 23 02:11:28.353] [ET_NET 1] DEBUG: <ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select) Parent fail count increased to 8 for 192.168.72.208:80
[Apr 23 02:11:30.795] [ET_NET 2] DEBUG: <ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select) Parent fail count increased to 9 for 192.168.72.208:80
[Apr 23 02:11:33.758] [ET_NET 3] DEBUG: <ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select) Parent fail count increased to 10 for 192.168.72.208:80
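
To make the expected behavior concrete, here is a minimal Python sketch of threshold-based failure tracking as described above: a parent is taken out of rotation once its failure count reaches the threshold, and is retried after the retry window. This is not ATS code and says nothing about how NextHopHealthStatus.cc is actually written; ParentHealth, record_failure, is_available and simulate are hypothetical names used only for illustration, and the defaults (threshold 10, window 300s, timeout 2s) are taken from the settings listed at the top.

```python
# Hypothetical model of threshold-based parent health tracking.
# NOT the ATS implementation; all names and logic here are assumptions.

class ParentHealth:
    def __init__(self, fail_threshold, retry_window):
        self.fail_threshold = fail_threshold  # failures needed to mark down
        self.retry_window = retry_window      # seconds to stay out of rotation
        self.fail_count = 0
        self.down_since = None                # time the parent was marked down

    def record_failure(self, now):
        """Count one connection failure; mark down once the threshold is hit."""
        self.fail_count += 1
        if self.fail_count >= self.fail_threshold and self.down_since is None:
            self.down_since = now

    def is_available(self, now):
        """A parent is skipped while marked down and inside the retry window."""
        if self.down_since is None:
            return True
        if now - self.down_since >= self.retry_window:
            # Retry window elapsed: give the parent another chance.
            self.down_since = None
            self.fail_count = 0
            return True
        return False


def simulate(requests, fail_threshold=10, retry_window=300, timeout=2.0):
    """Send sequential requests at a parent that always times out.

    Returns how many requests had to wait for the connection timeout.
    """
    parent = ParentHealth(fail_threshold, retry_window)
    now = 0.0
    paid_timeout = 0
    for _ in range(requests):
        if parent.is_available(now):
            now += timeout  # wait out the connect timeout before failing over
            parent.record_failure(now)
            paid_timeout += 1
        # Otherwise the parent is skipped and the request goes elsewhere.
    return paid_timeout


if __name__ == "__main__":
    print(simulate(50))
```

Under this model only the first 10 of 50 back-to-back requests wait for the 2s timeout, which matches the parent_select log. If the counter never climbs past 2, as in the next_hop log, the threshold is never reached and every request would wait.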
