On 9/16/24 12:32, Przemek Kitszel wrote:
On 9/14/24 07:27, Ben Greear wrote:
Hello,
We found this during a long duration network test where we are using
lots of wifi network devices in a single system, talking with an intel 10g
NIC in the same system (using vrfs and such).

It will be really hard to repro for us. Still would like to help.
It's more likely to get Intel's help if you mail (also) to our IWL list
(CCed, +Aleksandr for ixgbe expertise).

The system ran around 7 hours before it crashed. Seems to be a null pointer
in a list, but I'm not having great luck understanding where exactly in the
large tcp_ack method this is happening. Any suggestions for how to get more
relevant info out of gdb?

I would also enable kmemleak, lockdep, ubsan to get some easy helpers.

BUG: kernel NULL pointer dereference, address: 0000000000000008
#PF: supervisor write access in kernel mode

Could you share your virtualization config?

#PF: error_code(0x0002) - not-present page
PGD 115855067 P4D 115855067 PUD 283ed3067 PMD 0
Oops: Oops: 0002 [#1] PREEMPT SMP
CPU: 6 PID: 115673 Comm: btserver Tainted: G O 6.10.3+ #57

So, what hacks do you have? Are those to aid debugging or to enable some
of the wifi devices?
I don't have any insightful comment unfortunately, sorry.

Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
RIP: 0010:tcp_ack+0x62e/0x1530
Code: 9c 24 80 05 00 00 0f 84 56 09 00 00 49 39 9c 24 50 06 00 00 0f 84 b2 04 00 00 48 8b 53 58 48 8b 43 60 48 89 df 48 8b 74 24 28 <48> 89 42 08 48 89 10 48 c7 43 60 00 00 00 00 48 c7 43 58 00 00 00
RSP: 0018:ffffc9000027c998 EFLAGS: 00010207
RAX: 0000000000000000 RBX: ffff8881226a8800 RCX: ffff8881226abe01
RDX: 0000000000000000 RSI: ffff888126a3d4c8 RDI: ffff8881226a8800
RBP: ffffc9000027ca28 R08: 000000000005edf6 R09: 0000000000000000
R10: 0000000000000008 R11: 0000000084d9074f R12: ffff888126a3d340
R13: 0000000000000004 R14: ffff8881226aac00 R15: 0000000000000000
FS: 00007efc82a2f7c0(0000) GS:ffff88845dd80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000125477006 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
? __die+0x1a/0x60
? page_fault_oops+0x150/0x500
? exc_page_fault+0x6f/0x160
? asm_exc_page_fault+0x22/0x30
? tcp_ack+0x62e/0x1530
? tcp_ack+0x5f1/0x1530
? tcp_schedule_loss_probe+0x101/0x1d0
tcp_rcv_established+0x168/0x750
tcp_v4_do_rcv+0x13f/0x270
tcp_v4_rcv+0x1236/0x15f0
? udp_lib_lport_inuse+0x100/0x100
? raw_local_deliver+0xc8/0x250
ip_protocol_deliver_rcu+0x1b/0x290
ip_local_deliver_finish+0x6d/0x90
ip_sublist_rcv_finish+0x2d/0x40
ip_sublist_rcv+0x160/0x200
? __netif_receive_skb_core.constprop.0+0x30d/0xf80
ip_list_rcv+0xca/0x120
__netif_receive_skb_list_core+0x17f/0x1e0
netif_receive_skb_list_internal+0x1c5/0x290
napi_complete_done+0x69/0x180
ixgbe_poll+0xd93/0x13d0 [ixgbe]
__napi_poll+0x20/0x1a0
net_rx_action+0x2af/0x310
handle_softirqs+0xc8/0x2b0
__irq_exit_rcu+0x5f/0x80
common_interrupt+0x81/0xa0
</IRQ>
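
For reading the "#PF: error_code(0x0002)" line in the oops above, here is a
minimal C sketch that decodes the low bits of an x86 page-fault error code.
The bit meanings follow the x86 architecture documentation; the macro names
and the describe_pf() helper are made up for this illustration and are not
kernel identifiers.

```c
#include <stdio.h>

/* x86 page-fault error code bits. Names are local to this sketch. */
#define PF_PROT  (1u << 0) /* 0: page not present, 1: protection violation */
#define PF_WRITE (1u << 1) /* 0: read access, 1: write access */
#define PF_USER  (1u << 2) /* 0: supervisor (kernel) mode, 1: user mode */
#define PF_INSTR (1u << 4) /* 1: fault happened on an instruction fetch */

/* Hypothetical helper: write a one-line description of `code` into buf. */
static void describe_pf(unsigned int code, char *buf, size_t len)
{
    const char *access = (code & PF_INSTR) ? "instruction fetch"
                       : (code & PF_WRITE) ? "write access"
                       : "read access";

    snprintf(buf, len, "%s %s, %s",
             (code & PF_USER) ? "user" : "supervisor",
             access,
             (code & PF_PROT) ? "protection violation" : "not-present page");
}
```

With code 0x0002 only the write bit is set, so the helper produces
"supervisor write access, not-present page", which agrees with the two #PF
lines in the report.
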
(gdb) l *(tcp_ack+0x62e)
0xffffffff81c8601e is in tcp_ack (/home/greearb/git/linux-6.10.dev.y/include/linux/list.h:195).
190 * This is only for internal list manipulation where we know
191 * the prev/next entries already!
192 */
193 static inline void __list_del(struct list_head * prev, struct list_head * next)
194 {
195 next->prev = prev;
196 WRITE_ONCE(prev->next, next);
197 }
198
199 /*
(gdb) l *(tcp_rcv_established+0x168)
0xffffffff81c88b88 is in tcp_rcv_established (/home/greearb/git/linux-6.10.dev.y/net/ipv4/tcp_input.c:6209).
6204
6205 if (!tcp_validate_incoming(sk, skb, th, 1))
6206 return;
6207
6208 step5:
6209 reason = tcp_ack(sk, skb, FLAG_SLOWPATH | FLAG_UPDATE_TS_RECENT);
6210 if ((int)reason < 0) {
6211 reason = -reason;
6212 goto discard;
6213 }
(gdb)
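
The fault address can be tied to the list.h:195 line in the gdb listing
above. The bytes at the marked position in the Code: line (<48> 89 42 08)
decode to a 64-bit store to [rdx+8], RDX is 0 in the register dump, and CR2
is 0x8. Assuming RDX holds `next` at that point, the store is
`next->prev = prev` with next == NULL. A minimal userspace sketch of that
arithmetic follows; the struct mirrors the kernel's two-pointer list_head
layout, and list_del_store_addr() is a hypothetical helper, not a kernel
function.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace copy of the kernel's struct list_head layout (two pointers). */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/*
 * Hypothetical helper: the address that `next->prev = prev` would store to,
 * computed from the pointer value without dereferencing it.
 */
static uintptr_t list_del_store_addr(const struct list_head *next)
{
    return (uintptr_t)next + offsetof(struct list_head, prev);
}
```

On a 64-bit build, list_del_store_addr(NULL) is 8, which matches CR2. RAX
(the value being stored, so `prev` under the same assumption) is also 0 in
the dump, so both link pointers of the entry appear to have been NULL when
the unlink ran.
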
Thanks,
Ben