Hi Nam,

I have been using an NVMe disk on my PowerPC system that supports up to
129 MSI-X interrupt vectors. Everything worked fine until Linux kernel
v6.18, after which the NVMe driver stopped detecting the disk because
the driver probe now fails.

After further investigation, I found that the probe failure in v6.18
occurs during PCI/MSI-X vector allocation. A git bisect identified
commit daaa574aba6f ("powerpc/pseries/msi: Switch to
msi_create_parent_irq_domain()") as the first bad commit.

Additional debugging showed that the driver probe fails when calling
msi_create_device_irq_domain(). My working hypothesis is that, although
the PCIe NVMe device advertises support for 129 MSI-X vectors, the pSeries
firmware can supply only 128 MSI vectors to the device. This mismatch 
appears to cause MSI-X domain creation to fail, which ultimately results
in the NVMe driver failing to probe the device.

Device & MSI-X capability:
==========================

# lspci 
0524:28:00.0 Non-Volatile memory controller: KIOXIA Corporation NVMe SSD Controller CM7 2.5" (rev 01)

# lspci -vvv -s 0524:28:00.0 | grep -A2 MSI-X
        Capabilities: [b0] MSI-X: Enable+ Count=129 Masked-
                Vector table: BAR=0 offset=00005200
                PBA: BAR=0 offset=0000d600

Relevant device tree excerpt (DTS):

pci@800000020000585 {
    ...
    ibm,pe-total-#msi = <0x80>;            /* 128 available under this PHB */
    ...
    pci1014,6d1@0 {
        ...
        ibm,msi-x-ranges = <0x1c 0x01>;
        ibm,req#msi-x        = <0x81>;     /* device supports 0x81 == 129 */
        ...
    }
}

As shown above, the device supports 0x81 (129) MSI-X vectors (ibm,req#msi-x),
but the PHB reports ibm,pe-total-#msi = 0x80 (128), indicating that the
platform/firmware provides only 128 MSI vectors for devices under that PHB.
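
To make the mismatch concrete, here is a small illustrative Python sketch (not
kernel code). The helper name msi_shortfall() is hypothetical; the two values
are the cells quoted from the device tree excerpt above.

```python
def msi_shortfall(req_msix: int, pe_total_msi: int) -> int:
    """Return how many requested vectors exceed the MSI budget of the PE."""
    return max(0, req_msix - pe_total_msi)


req_msix = 0x81   # ibm,req#msi-x from the device node
pe_total = 0x80   # ibm,pe-total-#msi from the PHB node

# The device asks for exactly one vector more than the PHB reports.
print(req_msix, pe_total, msi_shortfall(req_msix, pe_total))  # prints "129 128 1"
```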

Debugfs IRQ domain (on a kernel just before the bad commit):
===========================================================

# cat /sys/kernel/debug/irq/domains/:pci@800000020000524-3
name:   :pci@800000020000524-3
 size:   0
 mapped: 65
 flags:  0x00000013
    IRQ_DOMAIN_FLAG_HIERARCHY
    IRQ_DOMAIN_NAME_ALLOCATED
    IRQ_DOMAIN_FLAG_MSI
 parent: pSeries-MSI-1316
    name:   pSeries-MSI-1316
     size:   128
     mapped: 65
     flags:  0x00000003
        IRQ_DOMAIN_FLAG_HIERARCHY
        IRQ_DOMAIN_NAME_ALLOCATED
     parent: :interrupt-controller@400209f0000
        ...

This shows that the parent domain (pSeries-MSI-1316) has size: 128.
From this, it appears that the pSeries firmware or parent IRQ domain provides
only 128 MSI vectors to the device, even though the device could support 129.
Nevertheless, the MSI request was eventually clamped to 65 IRQ vectors, and
those were mapped successfully.

Debugfs IRQ domain (running the latest kernel):
===============================================

# cat   /sys/kernel/debug/irq/domains/\:pci@800000020000524-5 
name:   :pci@800000020000524-5
 size:   128
 mapped: 0
 flags:  0x00000103
            IRQ_DOMAIN_FLAG_HIERARCHY
            IRQ_DOMAIN_NAME_ALLOCATED
            IRQ_DOMAIN_FLAG_MSI_PARENT
 parent: :interrupt-controller@400209f0000
    name:   :interrupt-controller@400209f0000
     size:   0
     mapped: 135
     flags:  0x00000003
                IRQ_DOMAIN_FLAG_HIERARCHY
                IRQ_DOMAIN_NAME_ALLOCATED

I do not see a per-device domain such as pSeries-PCI-MSI-0524:28:00.0 being
created, and the device probe aborts with -22 (-EINVAL) during MSI/MSI-X
allocation, as shown below.

# dmesg | grep "nvme 0524:28:00.0"
[   15.000370] nvme 0524:28:00.0: ibm,query-pe-dma-windows(53) 280000 8000000 20000524 returned 0, lb=1000000 ps=103 wn=1
[   15.000772] nvme 0524:28:00.0: ibm,create-pe-dma-window(54) 280000 8000000 20000524 15 25 returned 0 (liobn = 0x70000524 starting addr = 8000000 0)
[   15.010030] nvme 0524:28:00.0: lsa_required: 0, lsa_enabled: 0, direct mapping: 1
[   15.015637] nvme 0524:28:00.0: lsa_required: 0, lsa_enabled: 0, direct mapping: 1
[   15.021223] nvme 0524:28:00.0: enabling device (0140 -> 0142)
[   15.028379] nvme 0524:28:00.0: probe with driver nvme failed with error -22


Summary / hypothesis:
=====================
- The adapter advertises 129 MSI-X vectors, but the PHB/firmware reports 128
  available MSI vectors for devices in that PCI subtree
  (ibm,pe-total-#msi = 0x80).

- After the daaa574aba6f change, an allocation request for 129 vectors fails
  when the parent only has 128 slots. This leads to
  msi_create_device_irq_domain() failing and the NVMe driver probe aborting.

- Previously, the kernel ended up clamping the device's request to fewer
  vectors (e.g., 65) and the probe succeeded; after the change, the strict
  parent-domain allocation prevents this graceful fall-back.
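
The difference between the two behaviours in this hypothesis can be sketched
as a toy model in Python. This is only an illustration of the hypothesis, not
the kernel's actual logic: both function names are hypothetical, and the model
assumes the only inputs are the requested vector count and the parent domain
size. It does not reproduce the 65 vectors actually mapped on the older
kernel.

```python
import errno


def strict_domain_size(requested: int, parent_size: int) -> int:
    """Hypothesised post-commit behaviour: reject an oversized request."""
    if requested > parent_size:
        return -errno.EINVAL  # matches the -22 seen in the probe failure
    return requested


def clamped_domain_size(requested: int, parent_size: int) -> int:
    """Hypothesised graceful fall-back: shrink the request to what fits."""
    return min(requested, parent_size)


print(strict_domain_size(129, 128))   # prints "-22"
print(clamped_domain_size(129, 128))  # prints "128"
```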

Please let me know if you would like any additional details captured.

Thanks,
--Nilay
