Hi Nam,
I have been using an NVMe disk on my PowerPC system that supports up to
129 MSI-X interrupt vectors. Everything worked fine until Linux kernel
v6.18, after which the NVMe driver stopped detecting the disk because
the driver probe now fails.
After further investigation, I found that the probe failure in v6.18
occurs during PCI/MSI-X vector allocation. A git bisect identified
commit daaa574aba6f ("powerpc/pseries/msi: Switch to
msi_create_parent_irq_domain()") as the first bad commit.
Additional debugging showed that the driver probe fails when calling
msi_create_device_irq_domain(). My working hypothesis is that, although
the PCIe NVMe device advertises support for 129 MSI-X vectors, the pSeries
firmware can supply only 128 MSI vectors to the device. This mismatch
appears to cause MSI-X domain creation to fail, which ultimately results
in the NVMe driver failing to probe the device.
Device & MSI-X capability:
==========================
# lspci
0524:28:00.0 Non-Volatile memory controller: KIOXIA Corporation NVMe SSD Controller CM7 2.5" (rev 01)
# lspci -vvv -s 0524:28:00.0 | grep -A2 MSI-X
Capabilities: [b0] MSI-X: Enable+ Count=129 Masked-
Vector table: BAR=0 offset=00005200
PBA: BAR=0 offset=0000d600
Relevant device tree excerpt (DTS):
pci@800000020000585 {
...
ibm,pe-total-#msi = <0x80>; /* 128 available under this PHB */
...
pci1014,6d1@0 {
...
ibm,msi-x-ranges = <0x1c 0x01>;
ibm,req#msi-x = <0x81>; /* device supports 0x81 == 129 */
...
}
}
As shown above, the device supports 0x81 (129) MSI-X vectors (ibm,req#msi-x),
but the PHB reports ibm,pe-total-#msi = 0x80 (128), indicating that the
platform/firmware provides only 128 MSI vectors for devices under that PHB.
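The comparison above comes down to simple arithmetic on two device tree
cells. The following is only a minimal Python sketch of that arithmetic:
decode_cell() and msi_shortfall() are hypothetical helper names, and the byte
strings are the two cell values from the excerpt, not data read from a
live device tree.

```python
def decode_cell(raw: bytes) -> int:
    """Decode one 32-bit big-endian device tree cell (hypothetical helper)."""
    if len(raw) < 4:
        raise ValueError("a device tree cell is 4 bytes")
    return int.from_bytes(raw[:4], "big")


def msi_shortfall(requested: int, available: int) -> int:
    """Return how many requested vectors exceed what the PHB reports."""
    return max(0, requested - available)


# Cell values taken from the excerpt above (assumed encoding: one u32 each).
req = decode_cell(bytes.fromhex("00000081"))    # ibm,req#msi-x
total = decode_cell(bytes.fromhex("00000080"))  # ibm,pe-total-#msi

print(f"requested={req} available={total} shortfall={msi_shortfall(req, total)}")
# prints: requested=129 available=128 shortfall=1
```

The device asks for exactly one vector more than the PHB reports as
available.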
Debugfs IRQ domain (on a kernel just before the bad commit):
===========================================================
# cat /sys/kernel/debug/irq/domains/:pci@800000020000524-3
name: :pci@800000020000524-3
size: 0
mapped: 65
flags: 0x00000013
IRQ_DOMAIN_FLAG_HIERARCHY
IRQ_DOMAIN_NAME_ALLOCATED
IRQ_DOMAIN_FLAG_MSI
parent: pSeries-MSI-1316
name: pSeries-MSI-1316
size: 128
mapped: 65
flags: 0x00000003
IRQ_DOMAIN_FLAG_HIERARCHY
IRQ_DOMAIN_NAME_ALLOCATED
parent: :interrupt-controller@400209f0000
...
This shows the parent domain (pSeries-MSI-1316) has size: 128.
From this, it appears that the pseries firmware or parent IRQ domain
provides only 128 MSI vectors to the device, even though the device could
support 129. The MSI request was nevertheless clamped to 65 IRQ vectors,
and those were mapped successfully.
Debugfs IRQ domain (running the latest kernel):
===============================================
# cat /sys/kernel/debug/irq/domains/\:pci@800000020000524-5
name: :pci@800000020000524-5
size: 128
mapped: 0
flags: 0x00000103
IRQ_DOMAIN_FLAG_HIERARCHY
IRQ_DOMAIN_NAME_ALLOCATED
IRQ_DOMAIN_FLAG_MSI_PARENT
parent: :interrupt-controller@400209f0000
name: :interrupt-controller@400209f0000
size: 0
mapped: 135
flags: 0x00000003
IRQ_DOMAIN_FLAG_HIERARCHY
IRQ_DOMAIN_NAME_ALLOCATED
I do not see a per-device domain such as pSeries-PCI-MSI-0524:28:00.0 created,
and the device probe aborts with -22 (-EINVAL) during MSI/MSI-X allocation,
as shown below.
# dmesg | grep "nvme 0524:28:00.0"
[ 15.000370] nvme 0524:28:00.0: ibm,query-pe-dma-windows(53) 280000 8000000 20000524 returned 0, lb=1000000 ps=103 wn=1
[ 15.000772] nvme 0524:28:00.0: ibm,create-pe-dma-window(54) 280000 8000000 20000524 15 25 returned 0 (liobn = 0x70000524 starting addr = 8000000 0)
[ 15.010030] nvme 0524:28:00.0: lsa_required: 0, lsa_enabled: 0, direct mapping: 1
[ 15.015637] nvme 0524:28:00.0: lsa_required: 0, lsa_enabled: 0, direct mapping: 1
[ 15.021223] nvme 0524:28:00.0: enabling device (0140 -> 0142)
[ 15.028379] nvme 0524:28:00.0: probe with driver nvme failed with error -22
Summary / hypothesis:
=====================
- The adapter advertises 129 MSI-X vectors, but the PHB/firmware reports 128
  available MSI vectors for devices in that PCI subtree
  (ibm,pe-total-#msi = 0x80).
- After the daaa574aba6f change, an allocation request for 129 vectors fails
  when the parent only has 128 slots. This leads to
  msi_create_device_irq_domain() failing and the NVMe driver probe aborting.
- Previously, the kernel ended up clamping the device's request (to fewer
  vectors, e.g. 65) and probe succeeded; after the change, the strict
  parent-domain allocation prevents this graceful fall-back.
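To make the hypothesis concrete, here is a toy Python model of the two
behaviours described above. It is not the kernel code path: the function
names strict_domain_create() and clamped_alloc() are hypothetical, and the
size check is only my assumption about why msi_create_device_irq_domain()
fails. The numbers are the ones reported in this mail.

```python
import errno


def strict_domain_create(hwsize: int, parent_size: int) -> int:
    """Assumed new behaviour: reject a device whose advertised MSI-X table
    is larger than the parent domain, returning a negative errno."""
    if hwsize > parent_size:
        return -errno.EINVAL
    return hwsize


def clamped_alloc(requested: int, hwsize: int, parent_size: int) -> int:
    """Assumed old behaviour: grant no more vectors than the smallest of the
    request, the device's table size and the parent's capacity."""
    return min(requested, hwsize, parent_size)


# Device advertises 129 vectors, parent domain has 128 slots.
print(strict_domain_create(129, 128))   # negative errno, probe would abort
# The older kernel ended up with 65 mapped vectors.
print(clamped_alloc(65, 129, 128))
```

In this model, the strict check fails even though the 65 vectors that end up
being used would fit comfortably within the 128 available slots.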
Please let me know if you would like any additional details captured.
Thanks,
--Nilay