March 2, 2026 at 16:25, "Ido Schimmel" <[email protected]> wrote:


> 
> On Mon, Mar 02, 2026 at 01:11:28PM +0800, Jiayuan Chen wrote:
> 
> > 
> > From: Jiayuan Chen <[email protected]>
> >  
> >  When a standalone IPv6 nexthop object is created with a loopback device
> >  (e.g., "ip -6 nexthop add id 100 dev lo"), fib6_nh_init() misclassifies
> >  it as a reject route. This is because nexthop objects have no destination
> >  prefix (fc_dst=::), causing fib6_is_reject() to match any loopback
> >  nexthop. The reject path skips fib_nh_common_init(), leaving
> >  nhc_pcpu_rth_output unallocated. If an IPv4 route later references this
> >  nexthop, __mkroute_output() dereferences NULL nhc_pcpu_rth_output and
> >  panics.
> >  
> >  The reject classification was designed for regular IPv6 routes to prevent
> >  kernel loopback loops, but nexthop objects should not be subject to this
> >  check since they carry no destination information - loop prevention is
> >  handled separately when the route is created.
> >  
> >  An alternative approach of unconditionally calling fib_nh_common_init()
> >  for all reject routes was considered, but on large machines (e.g., 256
> >  CPUs) with many routes, this wastes significant memory since
> >  nhc_pcpu_rth_output allocates a per-CPU pointer for each route.
> >  
> >  Since fib6_nh_init() is shared by multiple callers (route creation,
> >  nexthop object creation, IPv4 gateway validation), using fc_dst_len to
> >  implicitly distinguish nexthop objects would be fragile. Add an explicit
> >  fc_is_nh flag to fib6_config to clearly identify nexthop object creation
> >  and skip the reject check for this path.
> >  
> >  Fixes: 7dd73168e273 ("ipv6: Always allocate pcpu memory in a fib6_nh")
> >  Reported-by: [email protected]
> >  Closes: https://lore.kernel.org/all/[email protected]/T/
> >  Signed-off-by: Jiayuan Chen <[email protected]>
> >  ---
> >  include/net/ip6_fib.h | 1 +
> >  net/ipv4/nexthop.c | 1 +
> >  net/ipv6/route.c | 8 +++++++-
> >  3 files changed, 9 insertions(+), 1 deletion(-)
> >  
> >  diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
> >  index 88b0dd4d8e09..7710f247b8d9 100644
> >  --- a/include/net/ip6_fib.h
> >  +++ b/include/net/ip6_fib.h
> >  @@ -62,6 +62,7 @@ struct fib6_config {
> >  struct nlattr *fc_encap;
> >  u16 fc_encap_type;
> >  bool fc_is_fdb;
> >  + bool fc_is_nh;
> >  };
> >  
> >  struct fib6_node {
> >  diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
> >  index 7b9d70f9b31c..efad2dd27636 100644
> >  --- a/net/ipv4/nexthop.c
> >  +++ b/net/ipv4/nexthop.c
> >  @@ -2859,6 +2859,7 @@ static int nh_create_ipv6(struct net *net, struct nexthop *nh,
> >  struct fib6_config fib6_cfg = {
> >  .fc_table = l3mdev_fib_table(cfg->dev),
> >  .fc_ifindex = cfg->nh_ifindex,
> >  + .fc_is_nh = true,
> >  .fc_gateway = cfg->gw.ipv6,
> >  .fc_flags = cfg->nh_flags,
> >  .fc_nlinfo = cfg->nlinfo,
> >  diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> >  index c0350d97307e..347f464ce7fe 100644
> >  --- a/net/ipv6/route.c
> >  +++ b/net/ipv6/route.c
> >  @@ -3628,7 +3628,13 @@ int fib6_nh_init(struct net *net, struct fib6_nh *fib6_nh,
> >  * they would result in kernel looping; promote them to reject routes
> >  */
> >  addr_type = ipv6_addr_type(&cfg->fc_dst);
> >  - if (fib6_is_reject(cfg->fc_flags, dev, addr_type)) {
> >  + /*
> >  + * Nexthop objects have no destination prefix, so fib6_is_reject()
> >  + * will misclassify loopback nexthops as reject routes, causing
> >  + * fib_nh_common_init() to be skipped along with its allocation
> >  + * of nhc_pcpu_rth_output, which IPv4 routes require.
> >  + */
> >  + if (!cfg->fc_is_nh && fib6_is_reject(cfg->fc_flags, dev, addr_type)) {
> >  /* hold loopback dev/idev if we haven't done so. */
> >  if (dev != net->loopback_dev) {
> >  if (dev) {
> > 
> The code basically resets the nexthop device to the loopback device in
> case of reject routes:
> 
> # ip link add name dummy1 up type dummy
> # ip route add unreachable 2001:db8:1::/64 dev dummy1
> # ip -6 route show 2001:db8:1::/64
> unreachable 2001:db8:1::/64 dev lo metric 1024 pref medium
> 
> Therefore, the check in fib6_is_reject() regarding the nexthop device
> being a loopback seems quite pointless. It's probably only needed when
> promoting routes that are using the loopback device to reject routes,
> which happens in ip6_route_info_create_nh() (the other caller of
> fib6_is_reject()).
> 
> I suggest simplifying the check so that it only applies to reject routes
> [1]. It fixes the issue since RTF_REJECT is a route attribute and not a
> nexthop attribute, so it will never be set by the nexthop code.
> 
> [1]
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 85df25c36409..035e3f668d49 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -3582,7 +3582,6 @@ int fib6_nh_init(struct net *net, struct fib6_nh *fib6_nh,
>  netdevice_tracker *dev_tracker = &fib6_nh->fib_nh_dev_tracker;
>  struct net_device *dev = NULL;
>  struct inet6_dev *idev = NULL;
> - int addr_type;
>  int err;
>  
>  fib6_nh->fib_nh_family = AF_INET6;
> @@ -3624,11 +3623,10 @@ int fib6_nh_init(struct net *net, struct fib6_nh *fib6_nh,
>  
>  fib6_nh->fib_nh_weight = 1;
>  
> - /* We cannot add true routes via loopback here,
> - * they would result in kernel looping; promote them to reject routes
> + /* Reset the nexthop device to the loopback device in case of reject
> + * routes.
>  */
> - addr_type = ipv6_addr_type(&cfg->fc_dst);
> - if (fib6_is_reject(cfg->fc_flags, dev, addr_type)) {
> + if (cfg->fc_flags & RTF_REJECT) {
>  /* hold loopback dev/idev if we haven't done so. */
>  if (dev != net->loopback_dev) {
>  if (dev) {
>

Thanks, this is indeed the simplest fix.

Let me walk through each case to confirm my understanding:

Case 1: Explicit reject route (with RTF_REJECT)
ip -6 route add unreachable 2001:db8:1::/64

cfg->fc_flags has RTF_REJECT before entering fib6_nh_init(), so the
reject path is taken. fib_nh_common_init() is skipped, so
nhc_pcpu_rth_output is not allocated. This is fine since reject routes
never need it.


Case 2: Loopback implicit reject route (without RTF_REJECT)
ip -6 route add 2001:db8::/32 dev lo

cfg->fc_flags does not have RTF_REJECT, so fib6_nh_init() takes the
normal path and fib_nh_common_init() allocates nhc_pcpu_rth_output.
Later, ip6_route_info_create() calls fib6_is_reject() and marks the
route as RTF_REJECT. The allocated nhc_pcpu_rth_output is unused but
harmless.


Case 3: Standalone nexthop object (our bug scenario)
ip -6 nexthop add id 100 dev lo
ip route add 172.20.20.0/24 nhid 100

cfg->fc_flags does not have RTF_REJECT (nexthop objects never carry
route attributes), so fib6_nh_init() takes the normal path and
fib_nh_common_init() allocates nhc_pcpu_rth_output. This fixes the
crash when an IPv4 route later references this nexthop.
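
To make the three cases easier to compare, here is a small userspace
model of the decision, not kernel code. It is only a sketch under
assumptions: struct model_cfg, MODEL_RTF_REJECT and all function names
are made up for illustration, and the "old" predicate only models the
part of fib6_is_reject() discussed in this thread (RTF_REJECT set, or a
loopback device with a non-loopback destination); any other conditions
the real helper checks are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's RTF_REJECT flag bit. */
#define MODEL_RTF_REJECT 0x0200u

/* Hypothetical, reduced view of what fib6_nh_init() looks at. */
struct model_cfg {
	unsigned int flags;        /* models cfg->fc_flags */
	bool dev_is_loopback;      /* nexthop device is "lo" */
	bool dst_is_loopback_addr; /* fc_dst is ::1 (false for ::) */
};

/* Old check (assumed shape): explicit reject, or loopback device with a
 * destination that is not the loopback address.
 */
static bool old_takes_reject_path(const struct model_cfg *c)
{
	return (c->flags & MODEL_RTF_REJECT) ||
	       (c->dev_is_loopback && !c->dst_is_loopback_addr);
}

/* Suggested check: only explicit reject routes take the reject path. */
static bool new_takes_reject_path(const struct model_cfg *c)
{
	return (c->flags & MODEL_RTF_REJECT) != 0;
}

/* The reject path skips fib_nh_common_init(), so nothing is allocated. */
static bool pcpu_allocated(bool reject_path)
{
	return !reject_path;
}

static void check_three_cases(void)
{
	/* Case 1: ip -6 route add unreachable 2001:db8:1::/64 */
	const struct model_cfg explicit_reject = { MODEL_RTF_REJECT, false, false };
	/* Case 2: ip -6 route add 2001:db8::/32 dev lo */
	const struct model_cfg route_via_lo = { 0, true, false };
	/* Case 3: ip -6 nexthop add id 100 dev lo (fc_dst is ::) */
	const struct model_cfg nexthop_on_lo = { 0, true, false };

	/* Case 1: reject path before and after, no allocation needed. */
	assert(!pcpu_allocated(old_takes_reject_path(&explicit_reject)));
	assert(!pcpu_allocated(new_takes_reject_path(&explicit_reject)));

	/* Case 2: now allocated in fib6_nh_init(); unused but harmless. */
	assert(!pcpu_allocated(old_takes_reject_path(&route_via_lo)));
	assert(pcpu_allocated(new_takes_reject_path(&route_via_lo)));

	/* Case 3: the old check left it unallocated (the crash); fixed now. */
	assert(!pcpu_allocated(old_takes_reject_path(&nexthop_on_lo)));
	assert(pcpu_allocated(new_takes_reject_path(&nexthop_on_lo)));
}
```

Calling check_three_cases() from a test main() passes all assertions in
this model. Note that cases 2 and 3 look identical at this level, which
is why keying the check on RTF_REJECT alone covers the nexthop object
case without needing a separate flag.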
