On 9/27/20 11:48 PM, Baptiste Jonglez wrote:
> On 27-09-20, David Ahern wrote:
>> On 9/27/20 9:10 AM, Baptiste Jonglez wrote:
>>> On 27-09-20, Baptiste Jonglez wrote:
>>>> 1) failing IPv6 neighbours, what Alarig reported. We are seeing this
>>>> on a full-view BGP router with rather low amount of IPv6 traffic
>>>> (around 10-20 Mbps)
>>>
>>> Ok, I found a quick way to reproduce this issue:
>>>
>>> # for net in {1..9999}; do ip -6 route add 2001:db8:ffff:${net}::/64 via fe80::4242 dev lo; done
>>>
>>> and then:
>>>
>>> # for net in {1..9999}; do ping -c1 2001:db8:ffff:${net}::1; done
>>>
>>> This quickly gets to a situation where ping fails early with:
>>>
>>> ping: connect: Network is unreachable
>>>
>>> At this point, IPv6 connectivity is broken. The kernel is no longer
>>> replying to IPv6 neighbor solicitation from other hosts on local
>>> networks.
>>>
>>> When this happens, the "fib_rt_alloc" field from /proc/net/rt6_stats
>>> is roughly equal to net.ipv6.route.max_size (a bit more in my tests).
>>>
>>> Interestingly, the system appears to stay in this broken state
>>> indefinitely, even without trying to send new IPv6 traffic. The
>>> fib_rt_alloc statistics does not decrease.
>>>
>>
>> fib_rt_alloc is incremented by calls to ip6_dst_alloc. Each of your
>> 9,999 pings is to a unique address and hence causes a dst to be
>> allocated and the counter to be incremented. It is never decremented.
>> That is standard operating procedure.
>
> Ok, then this is a change in behaviour. Here is a graph of fib_rt_alloc
> on a busy router (IPv6 full view, moderate IPv6 traffic) with 4.9 kernel:
>
> https://files.polyno.me/tmp/rt6_stats_fib_rt_alloc_4.9.png
>
> It varies quite a lot and stays around 50, so clearly it can be
> decremented in regular operation.
>
> On 4.19 and later, it does seem to be decremented only when a route is
> removed (ip -6 route delete). Here is the same graph on a router with a
> 4.19 kernel and a large net.ipv6.route.max_size:
>
> https://files.polyno.me/tmp/rt6_stats_fib_rt_alloc_4.19.png
>
> Overall, do you mean that fib_rt_alloc is a red herring and is not a good
> marker of the issue?
>

$ git checkout v4.9
$ egrep -r fib_rt_alloc include/ net/
include//net/ip6_fib.h: __u32 fib_rt_alloc; /* permanent routes */
net//ipv6/route.c: net->ipv6.rt6_stats->fib_rt_alloc,

The first declares it; the second prints it. That's it, no other users,
so I have no idea why it shows any changes in your v4.9 graph.

Looking at the git history shows that Wei actually wired up the stats with

commit 81eb8447daae3b62247aa66bb17b82f8fef68249
Author: Wei Wang <wei...@google.com>
Date:   Fri Oct 6 12:06:11 2017 -0700

    ipv6: take care of rt6_stats

That patch adds an inc but no dec for this stat, which is what you show in
your 4.19 graph.

Coming back to the bigger problem, fib_rt_alloc has *no* bearing on the
ability to create dst entries, which is what the max_size sysctl
(net.ipv6.route.max_size) limits. It does not limit FIB entries, which are
now unbounded; it limits dst_entry instances, which are created when a FIB
entry has been hit and used in the datapath to move packets.

Eric investigated a similar problem recently, which resulted in

commit d8882935fcae28bceb5f6f56f09cded8d36d85e6
Author: Eric Dumazet <eduma...@google.com>
Date:   Fri May 8 07:34:14 2020 -0700

    ipv6: use DST_NOCOUNT in ip6_rt_pcpu_alloc()

which I believe was released in v5.8.
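
To watch the dst entry count that max_size is compared against, rather
than fib_rt_alloc, something like the following could be used. This is a
rough sketch and not part of the thread: it assumes /proc/net/rt6_stats
prints seven hex fields with fib_rt_alloc third and the dst entry count
sixth, which should be checked against rt6_stats_seq_show() in
net/ipv6/route.c for the kernel in question. The helper name rt6_counters
is made up for illustration.

```shell
#!/bin/sh
# Sketch: decode two counters from one line of /proc/net/rt6_stats.
# ASSUMPTION: seven space-separated hex fields, with fib_rt_alloc as
# field 3 and the dst entry count as field 6. Verify this against your
# kernel's rt6_stats_seq_show() before relying on it.
# rt6_counters is a hypothetical helper name, not an existing tool.
rt6_counters() {
    # Split the single-line argument into positional parameters.
    set -- $1
    # Convert the two hex fields to decimal with shell arithmetic.
    printf 'fib_rt_alloc=%d dst_entries=%d\n' "$((0x$3))" "$((0x$6))"
}

# On a Linux host with IPv6 enabled, print the live values next to the
# configured limit; elsewhere this block is skipped.
if [ -r /proc/net/rt6_stats ] && [ -r /proc/sys/net/ipv6/route/max_size ]; then
    rt6_counters "$(cat /proc/net/rt6_stats)"
    printf 'max_size=%s\n' "$(cat /proc/sys/net/ipv6/route/max_size)"
fi
```

Sampling this periodically while running the reproducer would show whether
dst_entries, not fib_rt_alloc, is the value approaching max_size when ping
starts failing.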