March 2, 2026 at 16:25, "Ido Schimmel" <idosch@nvidia.com> wrote:
> On Mon, Mar 02, 2026 at 01:11:28PM +0800, Jiayuan Chen wrote:
> > From: Jiayuan Chen <[email protected]>
> >
> > When a standalone IPv6 nexthop object is created with a loopback device
> > (e.g., "ip -6 nexthop add id 100 dev lo"), fib6_nh_init() misclassifies
> > it as a reject route. This is because nexthop objects have no destination
> > prefix (fc_dst=::), causing fib6_is_reject() to match any loopback
> > nexthop. The reject path skips fib_nh_common_init(), leaving
> > nhc_pcpu_rth_output unallocated. If an IPv4 route later references this
> > nexthop, __mkroute_output() dereferences NULL nhc_pcpu_rth_output and
> > panics.
> >
> > The reject classification was designed for regular IPv6 routes to prevent
> > kernel loopback loops, but nexthop objects should not be subject to this
> > check since they carry no destination information - loop prevention is
> > handled separately when the route is created.
> >
> > An alternative approach of unconditionally calling fib_nh_common_init()
> > for all reject routes was considered, but on large machines (e.g., 256
> > CPUs) with many routes, this wastes significant memory since
> > nhc_pcpu_rth_output allocates a per-CPU pointer for each route.
> >
> > Since fib6_nh_init() is shared by multiple callers (route creation,
> > nexthop object creation, IPv4 gateway validation), using fc_dst_len to
> > implicitly distinguish nexthop objects would be fragile. Add an explicit
> > fc_is_nh flag to fib6_config to clearly identify nexthop object creation
> > and skip the reject check for this path.
> >
> > Fixes: 7dd73168e273 ("ipv6: Always allocate pcpu memory in a fib6_nh")
> > Reported-by: [email protected]
> > Closes: https://lore.kernel.org/all/[email protected]/T/
> > Signed-off-by: Jiayuan Chen <[email protected]>
> > ---
> >  include/net/ip6_fib.h | 1 +
> >  net/ipv4/nexthop.c    | 1 +
> >  net/ipv6/route.c      | 8 +++++++-
> >  3 files changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
> > index 88b0dd4d8e09..7710f247b8d9 100644
> > --- a/include/net/ip6_fib.h
> > +++ b/include/net/ip6_fib.h
> > @@ -62,6 +62,7 @@ struct fib6_config {
> >  	struct nlattr	*fc_encap;
> >  	u16		fc_encap_type;
> >  	bool		fc_is_fdb;
> > +	bool		fc_is_nh;
> >  };
> >
> >  struct fib6_node {
> > diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
> > index 7b9d70f9b31c..efad2dd27636 100644
> > --- a/net/ipv4/nexthop.c
> > +++ b/net/ipv4/nexthop.c
> > @@ -2859,6 +2859,7 @@ static int nh_create_ipv6(struct net *net, struct nexthop *nh,
> >  	struct fib6_config fib6_cfg = {
> >  		.fc_table = l3mdev_fib_table(cfg->dev),
> >  		.fc_ifindex = cfg->nh_ifindex,
> > +		.fc_is_nh = true,
> >  		.fc_gateway = cfg->gw.ipv6,
> >  		.fc_flags = cfg->nh_flags,
> >  		.fc_nlinfo = cfg->nlinfo,
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index c0350d97307e..347f464ce7fe 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -3628,7 +3628,13 @@ int fib6_nh_init(struct net *net, struct fib6_nh *fib6_nh,
> >  	 * they would result in kernel looping; promote them to reject routes
> >  	 */
> >  	addr_type = ipv6_addr_type(&cfg->fc_dst);
> > -	if (fib6_is_reject(cfg->fc_flags, dev, addr_type)) {
> > +	/*
> > +	 * Nexthop objects have no destination prefix, so fib6_is_reject()
> > +	 * will misclassify loopback nexthops as reject routes, causing
> > +	 * fib_nh_common_init() to be skipped along with its allocation
> > +	 * of nhc_pcpu_rth_output, which IPv4 routes require.
> > +	 */
> > +	if (!cfg->fc_is_nh && fib6_is_reject(cfg->fc_flags, dev, addr_type)) {
> >  		/* hold loopback dev/idev if we haven't done so. */
> >  		if (dev != net->loopback_dev) {
> >  			if (dev) {
>
> The code basically resets the nexthop device to the loopback device in
> case of reject routes:
>
> # ip link add name dummy1 up type dummy
> # ip route add unreachable 2001:db8:1::/64 dev dummy1
> # ip -6 route show 2001:db8:1::/64
> unreachable 2001:db8:1::/64 dev lo metric 1024 pref medium
>
> Therefore, the check in fib6_is_reject() regarding the nexthop device
> being a loopback seems quite pointless. It's probably only needed when
> promoting routes that are using the loopback device to reject routes,
> which happens in ip6_route_info_create_nh() (the other caller of
> fib6_is_reject()).
>
> I suggest simplifying the check so that it only applies to reject routes
> [1]. It fixes the issue since RTF_REJECT is a route attribute and not a
> nexthop attribute, so it will never be set by the nexthop code.
>
> [1]
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 85df25c36409..035e3f668d49 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -3582,7 +3582,6 @@ int fib6_nh_init(struct net *net, struct fib6_nh *fib6_nh,
>  	netdevice_tracker *dev_tracker = &fib6_nh->fib_nh_dev_tracker;
>  	struct net_device *dev = NULL;
>  	struct inet6_dev *idev = NULL;
> -	int addr_type;
>  	int err;
>
>  	fib6_nh->fib_nh_family = AF_INET6;
> @@ -3624,11 +3623,10 @@ int fib6_nh_init(struct net *net, struct fib6_nh *fib6_nh,
>
>  	fib6_nh->fib_nh_weight = 1;
>
> -	/* We cannot add true routes via loopback here,
> -	 * they would result in kernel looping; promote them to reject routes
> +	/* Reset the nexthop device to the loopback device in case of reject
> +	 * routes.
>  	 */
> -	addr_type = ipv6_addr_type(&cfg->fc_dst);
> -	if (fib6_is_reject(cfg->fc_flags, dev, addr_type)) {
> +	if (cfg->fc_flags & RTF_REJECT) {
>  		/* hold loopback dev/idev if we haven't done so. */
>  		if (dev != net->loopback_dev) {
>  			if (dev) {

Thanks, this is indeed the simplest fix. Let me walk through each case to
confirm my understanding:

Case 1: Explicit reject route (with RTF_REJECT)

    ip -6 route add unreachable 2001:db8:1::/64

cfg->fc_flags already has RTF_REJECT when fib6_nh_init() is entered, so the
reject path is taken. fib_nh_common_init() is skipped and
nhc_pcpu_rth_output is not allocated. This is fine since reject routes never
need it.

Case 2: Route via loopback, implicitly promoted to reject (without RTF_REJECT)

    ip -6 route add 2001:db8::/32 dev lo

cfg->fc_flags does not have RTF_REJECT, so fib6_nh_init() takes the normal
path and fib_nh_common_init() allocates nhc_pcpu_rth_output. Later,
ip6_route_info_create_nh() calls fib6_is_reject() and marks the route as
RTF_REJECT. The allocated nhc_pcpu_rth_output is unused but harmless.

Case 3: Standalone nexthop object (our bug scenario)

    ip -6 nexthop add id 100 dev lo
    ip route add 172.20.20.0/24 nhid 100

cfg->fc_flags does not have RTF_REJECT (it is a route attribute, and the
nexthop code only passes nexthop flags in fc_flags), so fib6_nh_init() takes
the normal path and fib_nh_common_init() allocates nhc_pcpu_rth_output. When
an IPv4 route later references this nexthop, __mkroute_output() no longer
dereferences a NULL nhc_pcpu_rth_output, so the crash is fixed.

