Control: forwarded -1 https://lore.kernel.org/lkml/abE_QoS5DM-ZltaV@monoceros
#regzbot introduced: a60b990798eb17433d0283788280422b1bd94b18
#regzbot from: "Aaron D. Johnson" <[email protected]>
#regzbot monitor: https://bugs.debian.org/1127635

Hello,

On Sat, Dec 14, 2024 at 12:50:18PM +0100, Thomas Gleixner wrote:
> Alexandre observed a warning emitted from pci_msi_setup_msi_irqs() on a
> RISCV platform which does not provide PCI/MSI support:
>
> WARNING: CPU: 1 PID: 1 at drivers/pci/msi/msi.h:121
> pci_msi_setup_msi_irqs+0x2c/0x32
> __pci_enable_msix_range+0x30c/0x596
> pci_msi_setup_msi_irqs+0x2c/0x32
> pci_alloc_irq_vectors_affinity+0xb8/0xe2
>
> RISCV uses hierarchical interrupt domains and correctly does not implement
> the legacy fallback. The warning triggers from the legacy fallback stub.
>
> That warning is bogus as the PCI/MSI layer knows whether a PCI/MSI parent
> domain is associated with the device or not. There is a check for MSI-X,
> which has a legacy assumption. But that legacy fallback assumption is only
> valid when legacy support is enabled, but otherwise the check should simply
> return -ENOTSUPP.
>
> Loongarch tripped over the same problem and blindly enabled legacy support
> without implementing the legacy fallbacks. There are weak implementations
> which return an error, so the problem was papered over.
>
> Correct pci_msi_domain_supports() to evaluate the legacy mode and add
> the missing supported check into the MSI enable path to complete it.
>
> Fixes: d2a463b29741 ("PCI/MSI: Reject multi-MSI early")
> Reported-by: Alexandre Ghiti <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
> Tested-by: Alexandre Ghiti <[email protected]>
> Cc: [email protected]

this patch became a60b990798eb17433d0283788280422b1bd94b18 in v6.13-rc5
and was backported to 6.12.y and 6.6.y (aed157301c65 and b1f7476e07b9
respectively).

A Debian user (Aaron, on Cc:) on powerpc has boot problems and bisected
them to this commit.
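For orientation, the check reworked by that commit can be modelled roughly as follows. This is only a userspace sketch and not the kernel source: all model_* names are made up for illustration, the two flag bit values are taken from the andi. masks (1 and 256) in the disassembly quoted further down, and the arch_fallbacks parameter stands in for CONFIG_PCI_MSI_ARCH_FALLBACKS. Where the real code would dereference a NULL host_data, the model returns MODEL_WOULD_FAULT instead of crashing.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures, reduced to the fields used. */
struct model_msi_domain_info {
	unsigned int flags;		/* features of a "global" PCI/MSI domain */
};

struct model_msi_parent_ops {
	unsigned int supported_flags;	/* features of an MSI parent domain */
};

/* Bit values assumed from the andi. masks (1 and 256) in the disassembly. */
#define MODEL_DOMAIN_HIERARCHY	0x001u
#define MODEL_DOMAIN_MSI_PARENT	0x100u

struct model_irq_domain {
	unsigned int flags;
	void *host_data;
	const struct model_msi_parent_ops *msi_parent_ops;
};

enum model_support_mode { MODEL_ALLOW_LEGACY = 0, MODEL_DENY_LEGACY = 1 };

enum model_result {
	MODEL_WOULD_FAULT = -1,	/* the kernel would oops here */
	MODEL_UNSUPPORTED = 0,
	MODEL_SUPPORTED = 1,
};

enum model_result model_domain_supports(const struct model_irq_domain *domain,
					unsigned int feature_mask,
					enum model_support_mode mode,
					bool arch_fallbacks)
{
	const struct model_msi_domain_info *info;
	unsigned int supported;

	/* No hierarchical domain: only the legacy fallback can help. */
	if (!domain || !(domain->flags & MODEL_DOMAIN_HIERARCHY)) {
		if (arch_fallbacks && mode == MODEL_ALLOW_LEGACY)
			return MODEL_SUPPORTED;
		return MODEL_UNSUPPORTED;
	}

	if (!(domain->flags & MODEL_DOMAIN_MSI_PARENT)) {
		/* "Global" domain: the flags come from host_data. */
		info = domain->host_data;
		if (!info)
			return MODEL_WOULD_FAULT;
		supported = info->flags;
	} else {
		/* MSI parent domain: the flags come from the parent ops. */
		supported = domain->msi_parent_ops->supported_flags;
	}

	/* Every requested feature bit has to be present. */
	if ((supported & feature_mask) == feature_mask)
		return MODEL_SUPPORTED;
	return MODEL_UNSUPPORTED;
}
```

In this model a hierarchical, non-parent domain with host_data == NULL is the only input that yields MODEL_WOULD_FAULT; a missing domain is always handled before any pointer is followed.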

The relevant boot log of the failure is:

[    2.643879] BUG: Kernel NULL pointer dereference on read at 0x00000000
[    2.643891] Faulting instruction address: 0xc000000000a39514
[    2.643902] Oops: Kernel access of bad area, sig: 11 [#1]
[    2.643909] BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[    2.643920] Modules linked in: ohci_pci(+) ehci_hcd nvme_fabrics ohci_hcd nvme_keyring nvme_core usbcore nvme_auth scsi_transport_fc ipr configfs ehea(+) usb_common
[    2.643965] CPU: 5 UID: 0 PID: 250 Comm: (udev-worker) Not tainted 6.12.17-powerpc64 #1 Debian 6.12.17-1
[    2.643976] Hardware name: IBM,8204-E8A POWER6 (architected) 0x3e0302 0xf000002 of:IBM,EL350_118 hv:phyp pSeries
[    2.643986] NIP: c000000000a39514 LR: c000000000a36ed8 CTR: c000000000a35820
[    2.643995] REGS: c0000000351f6f60 TRAP: 0300 Not tainted (6.12.17-powerpc64 Debian 6.12.17-1)
[    2.644004] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI> CR: 24222288 XER: 00000000
[    2.644031] CFAR: c00000000000cfc4 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0
[    2.644031] GPR00: c000000000a36ed8 c0000000351f7200 c00000000182e200 c0000003df294000
[    2.644031] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    2.644031] GPR08: 0000000000000001 0000000000000000 c00000000228fcc0 0000000044222288
[    2.644031] GPR12: c000000000a35820 c00000000eeacb00 0000000000000020 0000010037fcab20
[    2.644031] GPR16: 0000000022222248 0000000000020000 0000000000000000 00003fffebe8bb80
[    2.644031] GPR20: 0000000000000000 c00000000204db60 c00000000204dd60 c00000000b1ae780
[    2.644031] GPR24: 0000000000000000 00003fff8c9ac758 0000000000000000 c0000003df294000
[    2.644031] GPR28: 0000000000000001 0000000000000000 c0000003df294000 0000000000000001
[    2.644164] NIP [c000000000a39514] pci_msi_domain_supports (drivers/pci/msi/irqdomain.c:366)
[    2.644181] LR [c000000000a36ed8] __pci_enable_msi_range (drivers/pci/msi/msi.c:437)
[    2.644192] Call Trace:
[    2.644197] [c0000000351f7200] [c0000000351f7304] 0xc0000000351f7304 (unreliable)
[    2.644211] [c0000000351f7340] [c000000000a3578c] pci_alloc_irq_vectors_affinity (drivers/pci/msi/api.c:277)
[    2.644225] [c0000000351f73d0] [c0003d0007d2f4d4] usb_hcd_pci_probe (drivers/usb/core/hcd-pci.c:192) usbcore
[    2.644246] [c0000000351f7470] [c0003d00084e6030] ohci_pci_probe (drivers/usb/host/ohci-pci.c:285) ohci_pci
[    2.644260] [c0000000351f7490] [c000000000a260e8] local_pci_probe (drivers/pci/pci-driver.c:324)
[    2.644274] [c0000000351f7510] [c000000000a26218] pci_call_probe (drivers/pci/pci-driver.c:392 (discriminator 1))
[    2.644287] [c0000000351f7670] [c000000000a27348] pci_device_probe (drivers/pci/pci-driver.c:452)
[    2.644300] [c0000000351f76b0] [c000000000b2e658] really_probe (drivers/base/dd.c:579 drivers/base/dd.c:658)
[    2.644314] [c0000000351f7740] [c000000000b2eb24] __driver_probe_device (drivers/base/dd.c:800)
[    2.644327] [c0000000351f77c0] [c000000000b2edc4] driver_probe_device (drivers/base/dd.c:831)
[    2.644340] [c0000000351f7800] [c000000000b2f188] __driver_attach (drivers/base/dd.c:1217)
[    2.644352] [c0000000351f7880] [c000000000b2ac64] bus_for_each_dev (drivers/base/bus.c:370)
[    2.644365] [c0000000351f78e0] [c000000000b2dac4] driver_attach (drivers/base/dd.c:1234)
[    2.644377] [c0000000351f7900] [c000000000b2cd98] bus_add_driver (drivers/base/bus.c:675)
[    2.644389] [c0000000351f7990] [c000000000b30ae4] driver_register (drivers/base/driver.c:246)
[    2.644402] [c0000000351f7a00] [c000000000a24f88] __pci_register_driver (drivers/pci/pci-driver.c:1450)
[    2.644415] [c0000000351f7a20] [c0003d00084e6800] ohci_pci_init (drivers/usb/host/ohci-pci.c:308) ohci_pci
[    2.644429] [c0000000351f7a50] [c00000000000fd60] do_one_initcall (init/main.c:1269)
[    2.644444] [c0000000351f7b30] [c0000000002760f8] do_init_module (kernel/module/main.c:2543)
[    2.644460] [c0000000351f7bb0] [c000000000278fe4] init_module_from_file (kernel/module/main.c:3199)
[    2.644473] [c0000000351f7c90] [c0000000002793e0] sys_finit_module (kernel/module/main.c:3211 kernel/module/main.c:3238 kernel/module/main.c:3221)
[    2.644487] [c0000000351f7da0] [c00000000002c084] system_call_exception (arch/powerpc/kernel/syscall.c:171)
[    2.644500] [c0000000351f7e50] [c00000000000cb54] system_call_common (arch/powerpc/kernel/interrupt_64.S:292)
[    2.644515] --- interrupt: c00 at 0x3fff8d653d8c
[    2.644522] NIP: 00003fff8d653d8c LR: 00003fff8c9a4680 CTR: 0000000000000000
[    2.644531] REGS: c0000000351f7e80 TRAP: 0c00 Not tainted (6.12.17-powerpc64 Debian 6.12.17-1)
[    2.644541] MSR: 800000000200f032 <SF,VEC,EE,PR,FP,ME,IR,DR,RI> CR: 22222222 XER: 00000000
[    2.644573] IRQMASK: 0
[    2.644573] GPR00: 0000000000000161 00003fffebe8b640 00003fff8d757100 0000000000000052
[    2.644573] GPR04: 00003fff8c9ac758 0000000000000004 0000000000000058 000000000000005a
[    2.644573] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    2.644573] GPR12: 0000000000000000 00003fff8de947c0 0000000000000020 0000010037fcab20
[    2.644573] GPR16: 0000000022222248 0000000000020000 0000000000000000 00003fffebe8bb80
[    2.644573] GPR20: 0000000000000000 00003fffebe8bb70 0000000000000007 0000010037fca210
[    2.644573] GPR24: 0000000000000000 0000000000000000 0000010037f6be40 0000000000000004
[    2.644573] GPR28: 00003fff8c9ac758 0000000000020000 0000000000000004 0000010037fca210
[    2.644698] NIP [00003fff8d653d8c] 0x3fff8d653d8c
[    2.644705] LR [00003fff8c9a4680] 0x3fff8c9a4680
[    2.644713] --- interrupt: c00
[    2.644719] Code: 4182002c e92a0088 80690000 7c632038 7c632278 7c630034 5463d97e 786307e0 4e800020 60000000 60000000 e92a0020 <80690000> 4bffffd8 60000000 7ca50034
All code
========
   0:*  41 82 00 2c     beq     0x2c            <-- trapping instruction
   4:   e9 2a 00 88     ld      r9,136(r10)
   8:   80 69 00 00     lwz     r3,0(r9)
   c:   7c 63 20 38     and     r3,r3,r4
  10:   7c 63 22 78     xor     r3,r3,r4
  14:   7c 63 00 34     cntlzw  r3,r3
  18:   54 63 d9 7e     srwi    r3,r3,5
  1c:   78 63 07 e0     clrldi  r3,r3,63
  20:   4e 80 00 20     blr
  24:   60 00 00 00     nop
  28:   60 00 00 00     nop
  2c:   e9 2a 00 20     ld      r9,32(r10)
  30:   80 69 00 00     lwz     r3,0(r9)
  34:   4b ff ff d8     b       0xc
  38:   60 00 00 00     nop
  3c:   7c a5 00 34     cntlzw  r5,r5

Code starting with the faulting instruction
===========================================
   0:   80 69 00 00     lwz     r3,0(r9)
   4:   4b ff ff d8     b       0xffffffffffffffdc
   8:   60 00 00 00     nop
   c:   7c a5 00 34     cntlzw  r5,r5
[    2.644769] ---[ end trace 0000000000000000 ]---

(That's the bug splat from the bug report piped through
scripts/decode_stacktrace.sh)

The kernel has CONFIG_PCI_MSI_ARCH_FALLBACKS=y, so the first hunk
shouldn't change anything. The disassembly of pci_msi_domain_supports in
the kernel looks as follows:

c000000000a394c0 <pci_msi_domain_supports>:
pci_msi_domain_supports():
debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:334
c000000000a394c0:       60 00 00 00     nop
c000000000a394c4:       60 00 00 00     nop
debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:353
c000000000a394c8:       e9 43 02 e8     ld      r10,744(r3)
c000000000a394cc:       2c 2a 00 00     cmpdi   r10,0
c000000000a394d0:       41 82 00 50     beq     c000000000a39520 <pci_msi_domain_supports+0x60>
irq_domain_is_hierarchy():
debian/build/build_powerpc_none_powerpc64/include/linux/irqdomain.h:661
c000000000a394d4:       81 2a 00 28     lwz     r9,40(r10)
pci_msi_domain_supports():
debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:353 (discriminator 1)
c000000000a394d8:       71 28 00 01     andi.   r8,r9,1
c000000000a394dc:       41 82 00 44     beq     c000000000a39520 <pci_msi_domain_supports+0x60>
debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:359 (discriminator 1)
c000000000a394e0:       71 29 01 00     andi.   r9,r9,256
c000000000a394e4:       41 82 00 2c     beq     c000000000a39510 <pci_msi_domain_supports+0x50>
debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:375
c000000000a394e8:       e9 2a 00 88     ld      r9,136(r10)
c000000000a394ec:       80 69 00 00     lwz     r3,0(r9)
debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:378
c000000000a394f0:       7c 63 20 38     and     r3,r3,r4
c000000000a394f4:       7c 63 22 78     xor     r3,r3,r4
c000000000a394f8:       7c 63 00 34     cntlzw  r3,r3
c000000000a394fc:       54 63 d9 7e     srwi    r3,r3,5
debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:379
c000000000a39500:       78 63 07 e0     clrldi  r3,r3,63
c000000000a39504:       4e 80 00 20     blr
c000000000a39508:       60 00 00 00     nop
c000000000a3950c:       60 00 00 00     nop
debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:366
c000000000a39510:       e9 2a 00 20     ld      r9,32(r10)
c000000000a39514:       80 69 00 00     lwz     r3,0(r9)
c000000000a39518:       4b ff ff d8     b       c000000000a394f0 <pci_msi_domain_supports+0x30>
c000000000a3951c:       60 00 00 00     nop
debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:355
c000000000a39520:       7c a5 00 34     cntlzw  r5,r5
c000000000a39524:       54 a3 d9 7e     srwi    r3,r5,5
debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:379
c000000000a39528:       78 63 07 e0     clrldi  r3,r3,63
c000000000a3952c:       4e 80 00 20     blr

so the trapping happens in drivers/pci/msi/irqdomain.c:366 which is:

365             info = domain->host_data;
366             supported = info->flags;

According to the register dump domain == r10 == NULL, but then this code
would not have been reached and the faulting instruction would be at
c000000000a39510. So maybe it's only .host_data = NULL and the register
dump is unreliable?? The offsets match: .host_data is at offset 32 of
struct irq_domain and .flags is at offset 0 of struct msi_domain_info.

For more details see
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1127635 .

Does someone spot the issue?

Best regards
Uwe

