Hi,
On 2025-04-08 18:20, Bert Karwatzki wrote:
Am Dienstag, dem 08.04.2025 um 17:29 +0200 schrieb Thomas Gleixner:
On Tue, Apr 08 2025 at 17:09, Thomas Gleixner wrote:
On Tue, Apr 08 2025 at 14:04, Bert Karwatzki wrote:
Since linux-next-20250408 I get a NULL pointer dereference when booting:
[ T669] BUG: kernel NULL pointer dereference, address: 0000000000000330
[ T669] #PF: supervisor read access in kernel mode
[ T669] #PF: error_code(0x0000) - not-present page
[ T669] PGD 0 P4D 0
[ T669] Oops: Oops: 0000 [#1] SMP NOPTI
[ T669] CPU: 2 UID: 0 PID: 669 Comm: (udev-worker) Not tainted 6.15.0-rc1-next-20250408-master #788 PREEMPT_{RT,(lazy)}
[ T669] Hardware name: Micro-Star International Co., Ltd. Alpha 15 B5EEK/MS-158L, BIOS E158LAMS.107 11/10/2021
[ T669] RIP: 0010:msi_domain_first_desc+0x4/0x30
[ T669] Code: e9 21 ff ff ff 0f 0b 31 c0 e9 f3 8c da ff 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa <48> 8b bf 68 02 00 00 48 85 ff 74 13 85 f6 75 0f 48 c7 47 60 00 00
[ T669] RSP: 0018:ffffcec6c25cfa78 EFLAGS: 00010246
[ T669] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000008
[ T669] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000c8
[ T669] RBP: ffff8d26cb419aec R08: 0000000000000228 R09: 0000000000000000
[ T669] R10: ffff8d26c516fdc0 R11: ffff8d26ca5a4aa0 R12: ffff8d26c1aed0c8
[ T669] R13: 0000000000000002 R14: ffffcec6c25cfa90 R15: ffff8d26c1aed000
[ T669] FS: 00007f690f71a980(0000) GS:ffff8d35e83fa000(0000) knlGS:0000000000000000
[ T669] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ T669] CR2: 0000000000000330 CR3: 0000000121b64000 CR4: 0000000000750ef0
[ T669] PKRU: 55555554
[ T669] Call Trace:
[ T669] <TASK>
[ T669] msix_setup_interrupts+0x23b/0x280
Can you please decode the lines via:
scripts/faddr2line vmlinux msi_domain_first_desc+0x4/0x30
scripts/faddr2line vmlinux msix_setup_interrupts+0x23b/0x280
I had to recompile with CONFIG_DEBUG_INFO=y and rerun the test; the call
trace is identical.
$ scripts/faddr2line vmlinux msi_domain_first_desc+0x4/0x30
msi_domain_first_desc+0x4/0x30:
msi_domain_first_desc at kernel/irq/msi.c:400
So it seems msi_domain_first_desc() is called with dev = NULL.
$ scripts/faddr2line vmlinux msix_setup_interrupts+0x23b/0x280
msix_setup_interrupts+0x23b/0x280:
msix_update_entries at drivers/pci/msi/msi.c:647 (discriminator 1)
(inlined by) __msix_setup_interrupts at drivers/pci/msi/msi.c:684 (discriminator 1)
(inlined by) msix_setup_interrupts at drivers/pci/msi/msi.c:695 (discriminator 1)
I was also hit by this issue. If I understand it correctly, retain_ptr
prevents the automatic free from running when the scope ends, but it
does so by setting dev to NULL. If I swap the order of retain_ptr and
msix_update_entries in __msix_setup_interrupts, I no longer get the
oops, though I don't know if this is the correct fix.
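To illustrate the ordering problem, here is a minimal user-space sketch,
not the kernel code. It assumes a scope-based cleanup built on the
GCC/Clang cleanup attribute and a retain helper that returns the old
pointer and NULLs the variable. All names (fake_dev, free_dev,
RETAIN_PTR, use_dev, setup) are hypothetical stand-ins, and the real
kernel macros may differ in detail.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the device structure. */
struct fake_dev {
	int nvec;
};

/* Counts how often the scope-exit cleanup actually freed something. */
static int frees;

/* Cleanup handler: runs when the annotated variable goes out of scope. */
static void free_dev(struct fake_dev **p)
{
	if (*p) {
		free(*p);
		frees++;
	}
}

/*
 * Hypothetical retain helper (assumption): yields the old pointer and
 * sets the variable to NULL, so the cleanup handler becomes a no-op.
 */
#define RETAIN_PTR(p) ({ __typeof__(p) __old = (p); (p) = NULL; __old; })

/* Stand-in for a later user of the pointer; -1 signals a NULL pointer. */
static int use_dev(struct fake_dev *dev)
{
	return dev ? dev->nvec : -1;
}

/*
 * retain_first != 0: retain, then use. The use sees NULL.
 * retain_first == 0: use, then retain. The use sees the valid pointer.
 */
static int setup(int retain_first, struct fake_dev **out)
{
	struct fake_dev *dev __attribute__((cleanup(free_dev))) =
		malloc(sizeof(*dev));
	int ret;

	assert(out != NULL);
	if (!dev)
		return -2;
	dev->nvec = 4;

	if (retain_first) {
		*out = RETAIN_PTR(dev);
		ret = use_dev(dev);	/* dev is already NULL here */
	} else {
		ret = use_dev(dev);	/* dev is still valid here */
		*out = RETAIN_PTR(dev);
	}
	return ret;
}
```

With this sketch, setup(1, &kept) returns -1 because the pointer has
already been NULLed when use_dev() runs, while setup(0, &kept) returns
4. In both cases the caller owns the pointer afterwards and the cleanup
handler frees nothing.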
Can you please also provide kernel configuration and compiler version?
Thanks,
tglx
Bert Karwatzki
Regards,
Klara Modin.