syzkaller reported a kernel panic [1] with the following crash stack:

BUG: unable to handle page fault for address: ffff8ebd08580000
PF: supervisor write access in kernel mode
PF: error_code(0x0002) - not-present page
PGD 11f201067 P4D 11f201067 PUD 0
Oops: Oops: 0002 [#1] SMP PTI
CPU: 2 UID: 0 PID: 451 Comm: test_progs Not tainted 6.19.0+ #161 PREEMPT_RT
RIP: 0010:bond_rr_gen_slave_id+0x90/0xd0
RSP: 0018:ffffd3f4815f3448 EFLAGS: 00010246
RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffff8ebc8728b17e
RDX: 0000000000000000 RSI: ffffd3f4815f3538 RDI: ffff8ebc8abcce40
RBP: ffffd3f4815f3460 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffd3f4815f3538
R13: ffff8ebc8abcce40 R14: ffff8ebc8728b17f R15: ffff8ebc8728b170
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff8ebd08580000 CR3: 000000010a808006 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
 <TASK>
 bond_xdp_get_xmit_slave+0xc0/0x240
 xdp_master_redirect+0x74/0xc0
 bpf_prog_run_generic_xdp+0x2f2/0x3f0
 do_xdp_generic+0x1fd/0x3d0
 __netif_receive_skb_core.constprop.0+0x30d/0x1220
 __netif_receive_skb_list_core+0xfc/0x250
 netif_receive_skb_list_internal+0x20c/0x3d0
 ? eth_type_trans+0x137/0x160
 netif_receive_skb_list+0x25/0x140
 xdp_test_run_batch.constprop.0+0x65b/0x6e0
 bpf_test_run_xdp_live+0x1ec/0x3b0
 bpf_prog_test_run_xdp+0x49d/0x6e0
 __sys_bpf+0x446/0x27b0
 __x64_sys_bpf+0x1a/0x30
 x64_sys_call+0x146c/0x26e0
 do_syscall_64+0xd3/0x1510
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Problem Description

This issue occurs when the following conditions are met:

1. A bond device is in round-robin mode but has never been brought UP
 (bond_open() was never called)
 - rr_tx_counter is only allocated in bond_open()

2. bpf_master_redirect_enabled_key is a global static key
 - When any bond device attaches native XDP, this key is globally enabled
 - It affects XDP processing for ALL bond slaves system-wide, including
   slaves of a bond that was never brought UP

3. The XDP redirect data path can reach bond_rr_gen_slave_id()
 - Via: xdp_master_redirect()->bond_xdp_get_xmit_slave()->bond_rr_gen_slave_id()
 - bond_rr_gen_slave_id() dereferences rr_tx_counter without a NULL check
 - When the bond is not UP, rr_tx_counter is still NULL, so this
   dereference crashes the kernel

Solution

Patch 1: Add a netif_running() check in xdp_master_redirect() to verify that
         the master device is running before proceeding with the redirect.

Patch 2: Add a selftest that reproduces the above scenario and verifies the fix.
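
The failure mode and the Patch 1 guard can be illustrated with a small
userspace C model. This is only a sketch, not the kernel code: struct
model_bond, the model_* functions, and the MODEL_* verdicts are hypothetical
names, the counter is a plain integer rather than the kernel's per-CPU
counter, and the verdict returned when the master is not running is an
assumption, since this cover letter does not state what the real patch
returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a bond master; not the kernel's struct bonding. */
struct model_bond {
	bool running;            /* models netif_running(master) */
	uint32_t *rr_tx_counter; /* stays NULL until model_bond_open() */
	uint32_t counter_storage;
};

/* Hypothetical verdicts; the real XDP action codes are not modelled. */
enum model_verdict { MODEL_SKIP, MODEL_REDIRECT };

/* Models bond_open(): the only place the counter gets allocated. */
void model_bond_open(struct model_bond *bond)
{
	bond->counter_storage = 0;
	bond->rr_tx_counter = &bond->counter_storage;
	bond->running = true;
}

/*
 * Models bond_rr_gen_slave_id(). The kernel function has no NULL check;
 * the assert only makes this model fail loudly instead of invoking
 * undefined behaviour.
 */
uint32_t model_rr_gen_slave_id(struct model_bond *bond)
{
	assert(bond->rr_tx_counter != NULL);
	return (*bond->rr_tx_counter)++;
}

/* Models xdp_master_redirect() with the running-state guard from Patch 1. */
enum model_verdict model_xdp_master_redirect(struct model_bond *bond,
					     uint32_t *slave_id)
{
	if (!bond->running)
		return MODEL_SKIP; /* assumed verdict; never touches the counter */

	*slave_id = model_rr_gen_slave_id(bond);
	return MODEL_REDIRECT;
}
```

With a zero-initialized struct model_bond (a bond that was never opened),
model_xdp_master_redirect() returns MODEL_SKIP and leaves the counter alone.
After model_bond_open(), successive calls return MODEL_REDIRECT and hand out
slave ids 0, 1, and so on.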

[1] https://syzkaller.appspot.com/bug?extid=80e046b8da2820b6ba73

Jiayuan Chen (2):
  net/bpf: fix null-ptr-deref in xdp_master_redirect() for bonding
  selftests/bpf: add test for xdp_master_redirect with bond not up

 net/core/filter.c                             |   3 +
 .../selftests/bpf/prog_tests/xdp_bonding.c    | 101 +++++++++++++++++-
 2 files changed, 102 insertions(+), 2 deletions(-)

-- 
2.43.0

