The issue was introduced by commit 37f02195 ("powerpc/pci: fix PCI-e devices rescan issue on powerpc platform"). The field (struct pci_dev::irq) is reused by PCI core to trace the base MSI interrupt number if the MSI stuff is enabled on the corresponding device. When running to pcibios_setup_device(), we possibly still have enabled MSI interrupt on the device. That means "pci_dev->irq" still have the base MSI interrupt number and it will be overwritten if we're going fix "pci_dev->irq" again by pci_read_irq_line(). Eventually, when we enable the device, it runs to kernel crash caused by fetching the the MSI interrupt descriptor (struct msi_desc) from non-MSI interrupt and using the NULL descriptor.
The patch adds more check inside pcibios_setup_device() and don't fix the interrupt number if we already had MSI interrupt enabled on the device. Unable to handle kernel paging request for data at address 0x00000008 Faulting instruction address: 0xc0000000004177ac cpu 0x6: Vector: 300 (Data Access) at [c000000fa24b7690] pc: c0000000004177ac: .pci_restore_msi_state+0x30c/0x3b0 lr: c00000000041777c: .pci_restore_msi_state+0x2dc/0x3b0 sp: c000000fa24b7910 msr: 8000000000009032 dar: 8 dsisr: 40000000 current = 0xc000000fb68542c0 paca = 0xc00000000ecd1500 softe: 0 irq_happened: 0x00 pid = 5367, comm = eehd enter ? for help [c000000fa24b79b0] c000000000405d2c .pci_restore_state.part.27+0x11c/0x2a0 [c000000fa24b7a40] c0000000005ea128 .e1000_io_slot_reset+0xa8/0x230 [c000000fa24b7ad0] c00000000005fcd4 .eeh_report_reset+0x94/0x120 [c000000fa24b7b60] c00000000005e97c .eeh_pe_dev_traverse+0x9c/0x190 [c000000fa24b7c10] c000000000060078 .eeh_handle_event+0x218/0x330 [c000000fa24b7ca0] c0000000000602c0 .eeh_event_handler+0x130/0x1a0 [c000000fa24b7d30] c0000000000ad6f8 .kthread+0xe8/0xf0 [c000000fa24b7e30] c00000000000a05c .ret_from_kernel_thread+0x5c/0x80 Reported-by: Benjamin Herrenschmidt <b...@kernel.crashing.org> Signed-off-by: Gavin Shan <sha...@linux.vnet.ibm.com> --- arch/powerpc/kernel/pci-common.c | 16 ++++++++++++---- 1 files changed, 12 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c index eabeec9..d3a00e8 100644 --- a/arch/powerpc/kernel/pci-common.c +++ b/arch/powerpc/kernel/pci-common.c @@ -1009,10 +1009,18 @@ void pcibios_setup_device(struct pci_dev *dev) if (ppc_md.pci_dma_dev_setup) ppc_md.pci_dma_dev_setup(dev); - /* Read default IRQs and fixup if necessary */ - pci_read_irq_line(dev); - if (ppc_md.pci_irq_fixup) - ppc_md.pci_irq_fixup(dev); + /* + * Read default IRQs and fixup if necessary. We probably + * has MSI interrupt enabled on the device and that hasn't + * been unloaded yet. For that case, "dev->irq" is tracing + * the base MSI interrupt number and it's going to overrite + * the MSI interrupt number to fix "dev->irq" here. + */ + if (!dev->msi_enabled) { + pci_read_irq_line(dev); + if (ppc_md.pci_irq_fixup) + ppc_md.pci_irq_fixup(dev); + } } void pcibios_setup_bus_devices(struct pci_bus *bus) -- 1.7.5.4 _______________________________________________ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev