Hello,

I'm trying to configure my Infiniband cards to pass them to VMs using SR-IOV. Unfortunately, my only PCIe x16 slot seems to share an IOMMU group with a PCI bridge, and this stops the kernel from letting the IB virtual functions be passed to the VMs. I tried many options, but none of them worked. Let me describe what I did; perhaps you can advise me on how to work around the issue.
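For reference, a minimal shell sketch of how to list every device that shares an IOMMU group with a given PCI address, assuming the usual sysfs layout under /sys/kernel/iommu_groups. The function name iommu_group_members and its optional second argument (an alternative root directory, handy for testing) are made up for illustration:

```shell
#!/bin/sh
# Print the IOMMU group of a PCI device and all devices in that group.
# Usage: iommu_group_members <pci-address> [groups-root]
# groups-root is a hypothetical override; the default is the standard sysfs path.
iommu_group_members() {
    addr=$1
    root=${2:-/sys/kernel/iommu_groups}
    for link in "$root"/*/devices/"$addr"; do
        # If the device is in no group, the glob stays unexpanded: skip it.
        [ -e "$link" ] || [ -L "$link" ] || continue
        group_dir=${link%/devices/*}
        echo "group ${group_dir##*/}:"
        for dev in "$group_dir"/devices/*; do
            echo "  ${dev##*/}"
        done
    done
}
```

On the affected host, `iommu_group_members 0000:01:00.1` should print the group number followed by every device in that group; if the bridge 0000:00:01.0 shows up in the same list as the card's functions, they are grouped together.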
Here is how the problem manifests in the first place:

$ sudo virsh start mvapichVM.2.1
error: Failed to start domain mvapichVM.2.1
error: internal error: qemu unexpectedly closed the monitor: ...
qemu-system-x86_64: -device vfio-pci,host=01:00.1,id=hostdev0,bus=pci.0,addr=0xa: vfio error: 0000:01:00.1: group 1 is not viable

The key error is this:

vfio error: 0000:01:00.1: group 1 is not viable

Checking which devices are in group 1:

$ find /sys/kernel/iommu_groups/ -type l | grep $(lspci | grep Mellanox | tail -n 1 | cut -c1-2)
/sys/kernel/iommu_groups/1/devices/0000:01:00.3
/sys/kernel/iommu_groups/1/devices/0000:01:00.1
/sys/kernel/iommu_groups/1/devices/0000:01:00.4
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.2
/sys/kernel/iommu_groups/1/devices/0000:01:00.0

What we see here is an IB card with one physical function and four virtual functions, plus a PCI bridge.

Here is an excerpt from lspci:

# lspci -s 00:01 -vnn
00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller [8086:0c01] (rev 06) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 26
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        Memory behind bridge: f0000000-f09fffff
        Prefetchable memory behind bridge: 00000000c0000000-00000000c3ffffff
        Capabilities: [88] Subsystem: Gigabyte Technology Co., Ltd Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller [1458:5000]
        Capabilities: [80] Power Management version 3
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [a0] Express Root Port (Slot+), MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [140] Root Complex Link
        Capabilities: [d94] #19
        Kernel driver in use: pcieport
        Kernel modules: shpchp

# lspci -s 01:00.0 -vnn
01:00.0 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3] [15b3:1003]
        Subsystem: Mellanox Technologies MT27500 Family [ConnectX-3] [15b3:0050]
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at f0900000 (64-bit, non-prefetchable) [size=1M]
        Memory at f0000000 (64-bit, prefetchable) [size=8M]
        Expansion ROM at f0800000 [disabled] [size=1M]
        Capabilities: [40] Power Management version 3
        Capabilities: [48] Vital Product Data
        Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
        Capabilities: [60] Express Endpoint, MSI 00
        Capabilities: [c0] Vendor Specific Information: Len=18 <?>
        Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [148] Device Serial Number f4-52-14-03-00-10-a4-e0
        Capabilities: [154] Advanced Error Reporting
        Capabilities: [18c] #19
        Capabilities: [108] Single Root I/O Virtualization (SR-IOV)
        Kernel driver in use: mlx4_core
        Kernel modules: mlx4_core

# lspci -s 01:00.1 -vnn
01:00.1 Network controller [0280]: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] [15b3:1004]
        Subsystem: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] [15b3:61b0]
        Flags: fast devsel
        [virtual] Memory at c0000000 (64-bit, prefetchable) [size=8M]
        Capabilities: [60] Express Endpoint, MSI 00
        Capabilities: [9c] MSI-X: Enable- Count=220 Masked-
        Capabilities: [40] Power Management version 0
        Kernel driver in use: vfio-pci
        Kernel modules: mlx4_core

I tried to circumvent this by compiling the kernel with the VFIO_NOIOMMU option (see this patch: https://lkml.org/lkml/2015/12/22/541). I also tried applying the pcie_acs_override patch and booting the kernel with pcie_acs_override=downstream,multifunction. But none of this changed the IOMMU group assignment.

Some diagnostics from dmesg. These two lines appear during boot, but nothing similar appears for the 0000:00:01 device.

[ 0.692871] pci 0000:00:1c.2: Intel PCH root port ACS workaround enabled
[ 0.692567] pci 0000:00:1c.0: Intel PCH root port ACS workaround enabled

Hardware details:

# lspci -t
-[0000:00]-+-00.0
           +-01.0-[01]--+-00.0
           |            +-00.1
           |            +-00.2
           |            +-00.3
           |            \-00.4
           +-02.0
           ...

Motherboard:
Base Board Information
        Manufacturer: Gigabyte Technology Co., Ltd.
        Product Name: Z87-HD3

CPU: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

Infiniband card: Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

Full dmesg: https://pastebin.com/c3XrJ7Vu
Very verbose lspci: https://pastebin.com/FgiDJ9M3

Could you tell me if it is possible at all to break this group? If so, how can I do this?

--
Regards,
Maksym Planeta
_______________________________________________
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users