Hi,

On 2025-04-08 18:20, Bert Karwatzki wrote:
On Tuesday, 08.04.2025 at 17:29 +0200, Thomas Gleixner wrote:
On Tue, Apr 08 2025 at 17:09, Thomas Gleixner wrote:
On Tue, Apr 08 2025 at 14:04, Bert Karwatzki wrote:
Since linux-next-20250408 I get a NULL pointer dereference when booting:

[  T669] BUG: kernel NULL pointer dereference, address: 0000000000000330
[  T669] #PF: supervisor read access in kernel mode
[  T669] #PF: error_code(0x0000) - not-present page
[  T669] PGD 0 P4D 0
[  T669] Oops: Oops: 0000 [#1] SMP NOPTI
[  T669] CPU: 2 UID: 0 PID: 669 Comm: (udev-worker) Not tainted 6.15.0-rc1-next-20250408-master #788 PREEMPT_{RT,(lazy)}
[  T669] Hardware name: Micro-Star International Co., Ltd. Alpha 15 B5EEK/MS-158L, BIOS E158LAMS.107 11/10/2021
[  T669] RIP: 0010:msi_domain_first_desc+0x4/0x30
[  T669] Code: e9 21 ff ff ff 0f 0b 31 c0 e9 f3 8c da ff 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa <48> 8b bf 68 02 00 00 48 85 ff 74 13 85 f6 75 0f 48 c7 47 60 00 00
[  T669] RSP: 0018:ffffcec6c25cfa78 EFLAGS: 00010246
[  T669] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000008
[  T669] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000c8
[  T669] RBP: ffff8d26cb419aec R08: 0000000000000228 R09: 0000000000000000
[  T669] R10: ffff8d26c516fdc0 R11: ffff8d26ca5a4aa0 R12: ffff8d26c1aed0c8
[  T669] R13: 0000000000000002 R14: ffffcec6c25cfa90 R15: ffff8d26c1aed000
[  T669] FS:  00007f690f71a980(0000) GS:ffff8d35e83fa000(0000) knlGS:0000000000000000
[  T669] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  T669] CR2: 0000000000000330 CR3: 0000000121b64000 CR4: 0000000000750ef0
[  T669] PKRU: 55555554
[  T669] Call Trace:
[  T669]  <TASK>
[  T669]  msix_setup_interrupts+0x23b/0x280

Can you please decode the lines via:

     scripts/faddr2line vmlinux msi_domain_first_desc+0x4/0x30
     scripts/faddr2line vmlinux msix_setup_interrupts+0x23b/0x280


I had to recompile with CONFIG_DEBUG_INFO=y and reran the test; the call trace
is identical.

$ scripts/faddr2line vmlinux msi_domain_first_desc+0x4/0x30
msi_domain_first_desc+0x4/0x30:
msi_domain_first_desc at kernel/irq/msi.c:400

So it seems msi_domain_first_desc() is called with dev = NULL.
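
A quick cross-check of the register dump suggests the argument was not
literally NULL but a small offset from NULL. The bytes at the fault marker
(<48> 8b bf 68 02 00 00) decode to "mov 0x268(%rdi),%rdi", and RDI is 0xc8.
The sketch below only redoes that arithmetic with the values from the oops.
Reading 0xc8 as the offset of the embedded struct device inside a NULL
struct pci_dev is an assumption, based solely on R12 being R15 + 0xc8; the
function and macro names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the oops in this thread. */
#define OOPS_RDI   0xc8ULL                /* first argument of msi_domain_first_desc() */
#define OOPS_CR2   0x330ULL               /* faulting address */
#define OOPS_R12   0xffff8d26c1aed0c8ULL
#define OOPS_R15   0xffff8d26c1aed000ULL
#define LOAD_DISP  0x268ULL               /* displacement in "48 8b bf 68 02 00 00" */

/* Effective address of "mov disp(%rdi),%rdi". */
static uint64_t effective_address(uint64_t base, uint64_t disp)
{
	return base + disp;
}

/* Returns 1 if the decoded load explains the reported fault address. */
static int load_explains_fault(void)
{
	return effective_address(OOPS_RDI, LOAD_DISP) == OOPS_CR2;
}

/*
 * Distance between R12 and R15. ASSUMPTION: R15 holds the pci_dev and
 * R12 holds the address of its embedded struct device.
 */
static uint64_t r12_minus_r15(void)
{
	return OOPS_R12 - OOPS_R15;
}
```

Both checks agree: 0xc8 + 0x268 gives the CR2 value 0x330, and R12 - R15 is
the same 0xc8 that was passed in RDI. That would fit a NULL pci_dev pointer
whose &dev->dev was taken before the call, rather than a NULL struct device
pointer.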

$ scripts/faddr2line vmlinux msix_setup_interrupts+0x23b/0x280
msix_setup_interrupts+0x23b/0x280:
msix_update_entries at drivers/pci/msi/msi.c:647 (discriminator 1)
(inlined by) __msix_setup_interrupts at drivers/pci/msi/msi.c:684 (discriminator 1)
(inlined by) msix_setup_interrupts at drivers/pci/msi/msi.c:695 (discriminator 1)


I was also hit by this issue. If I understand it correctly, retain_ptr
prevents the free from taking effect when the scope ends, but it also sets
dev to NULL in the process. If I swap the order of retain_ptr and
msix_update_entries in __msix_setup_interrupts, I no longer get the oops,
though I don't know whether this is the correct fix.
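
To illustrate the ordering problem outside the kernel, here is a minimal
user-space sketch using the GCC/Clang cleanup attribute. It is not the
kernel's cleanup.h code: struct demo_dev, demo_retain_ptr() and
demo_update_entries() are hypothetical stand-ins, and the helper only mimics
the behaviour I described above (hand back the old pointer, set the variable
to NULL so the scope-exit cleanup does nothing).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_dev { int nvec; };

static int free_calls;	/* how often the cleanup handler really freed something */

/* Scope-exit handler: receives a pointer to the annotated variable. */
static void demo_dev_free(struct demo_dev **p)
{
	if (*p) {
		free(*p);
		free_calls++;
	}
}

#define DEMO_AUTO_FREE __attribute__((cleanup(demo_dev_free)))

/* Hypothetical retain helper: returns the old value and NULLs the variable. */
static struct demo_dev *demo_retain_ptr(struct demo_dev **p)
{
	struct demo_dev *old = *p;

	*p = NULL;
	return old;
}

/* Stand-in for msix_update_entries(): 1 if it got a usable pointer, else 0. */
static int demo_update_entries(struct demo_dev *dev)
{
	return dev != NULL;
}

/* Order that oopses for me: retain first, then use the variable. */
static int setup_retain_then_use(struct demo_dev **out)
{
	struct demo_dev *dev DEMO_AUTO_FREE = calloc(1, sizeof(*dev));

	if (!dev)
		return -1;
	*out = demo_retain_ptr(&dev);
	return demo_update_entries(dev);	/* dev is already NULL here */
}

/* Swapped order: use the variable first, retain as the last step. */
static int setup_use_then_retain(struct demo_dev **out)
{
	struct demo_dev *dev DEMO_AUTO_FREE = calloc(1, sizeof(*dev));
	int ok;

	if (!dev)
		return -1;
	ok = demo_update_entries(dev);
	*out = demo_retain_ptr(&dev);
	return ok;
}
```

In this sketch setup_retain_then_use() passes NULL to the stand-in, while
setup_use_then_retain() passes the valid pointer. In both cases the cleanup
handler frees nothing, so the caller owns the returned object and has to
free it.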


Can you please also provide kernel configuration and compiler version?

Thanks,

         tglx

Bert Karwatzki


Regards,
Klara Modin.
