On 24/02/2023 11:25, Joao Martins wrote:
> On 23/02/2023 23:26, Jason Gunthorpe wrote:
>> On Thu, Feb 23, 2023 at 03:33:09PM -0700, Alex Williamson wrote:
>>> On Thu, 23 Feb 2023 16:55:54 -0400
>>> Jason Gunthorpe <j...@nvidia.com> wrote:
>>>> On Thu, Feb 23, 2023 at 01:06:33PM -0700, Alex Williamson wrote:
>>>> Or even better figure out how to get interrupt remapping without IOMMU
>>>> support :\
>>>
>>> -machine q35,default_bus_bypass_iommu=on,kernel-irqchip=split \
>>> -device intel-iommu,caching-mode=on,intremap=on
>>
>> Joao?
>>
>> If this works lets just block migration if the vIOMMU is turned on..
>
> At a first glance, this looked like my regular iommu incantation.
>
> But reading the code, this ::bypass_iommu (new to me) apparently controls
> whether the vIOMMU is bypassed or not for the PCI devices, all the way to
> avoiding enumerating them in the IVRS/DMAR ACPI tables. And I see VFIO
> double-checks whether the PCI device is within the IOMMU address space (or
> bypassed) prior to DMA maps and such.
>
> You can see from the other email that all of the other options in my head
> were either a bit inconvenient or risky. I wasn't aware of this option, for
> what it's worth -- much simpler, should work!
>
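(For reference, the check I am referring to is hw/pci's
pci_device_iommu_address_space(): with bypass_iommu set, the device ends up
using the plain system memory address space rather than the vIOMMU one. So the
condition would boil down to something like the below, where pdev stands for
the VFIO device's PCIDevice; helper names are from memory, so treat it as
approximate:

    AddressSpace *as = pci_device_iommu_address_space(pdev);
    bool translated_by_viommu = (as != &address_space_memory);

That is also what I would key a migration blocker on, see further below.)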
I say *should*, but on second thought interrupt remapping may still be
required for one of these IOMMU-bypassed devices, say to set affinities to
vCPUs above 255? I was trying this out with more than 255 vCPUs and a couple
of VFs, and at first glance these VFs fail to probe (these are CX6 VFs).

It is a working setup without the parameter, but adding
default_bus_bypass_iommu=on makes the VFs fail to initialize:

[ 32.412733] mlx5_core 0000:00:02.0: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 32.416242] mlx5_core 0000:00:02.0: mlx5_load:1204:(pid 3361): Failed to alloc IRQs
[ 33.227852] mlx5_core 0000:00:02.0: probe_one:1684:(pid 3361): mlx5_init_one failed with error code -19
[ 33.242182] mlx5_core 0000:00:03.0: firmware version: 22.31.1660
[ 33.415876] mlx5_core 0000:00:03.0: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 33.448016] mlx5_core 0000:00:03.0: mlx5_load:1204:(pid 3361): Failed to alloc IRQs
[ 34.207532] mlx5_core 0000:00:03.0: probe_one:1684:(pid 3361): mlx5_init_one failed with error code -19

I haven't dug into why it fails yet.

> And avoiding vIOMMU simplifies the whole patchset too, if it's OK to add a
> live migration blocker if `bypass_iommu` is off for any PCI device.
>

Still, for starters we could have a live migration blocker until we revisit
the vIOMMU case ... or should we deem default_bus_bypass_iommu=on (or the
others I suggested) non-options?
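In case it helps the discussion, the blocker I have in mind would be roughly
the below. Untested sketch against my possibly stale recollection of the tree:
the function name and its placement are made up, and only migrate_add_blocker(),
error_setg() and pci_device_iommu_address_space() are helpers I know exist:

#include "qemu/osdep.h"
#include "qapi/error.h"
#include "migration/blocker.h"
#include "exec/address-spaces.h"
#include "hw/pci/pci.h"
#include "pci.h" /* hw/vfio's pci.h, for VFIOPCIDevice */

static Error *giommu_migration_blocker;

/* To be called from the vfio-pci realize/migration setup path. */
static int vfio_block_giommu_migration(VFIOPCIDevice *vdev, Error **errp)
{
    /*
     * bypass_iommu devices (or guests without a vIOMMU at all) get the
     * plain system memory address space, so nothing to block there.
     */
    if (pci_device_iommu_address_space(&vdev->pdev) == &address_space_memory) {
        return 0;
    }

    error_setg(&giommu_migration_blocker,
               "VFIO migration is not yet supported for devices translated "
               "by a vIOMMU");
    return migrate_add_blocker(giommu_migration_blocker, errp);
}

The idea being to key the blocker on whether the device is actually translated
by the vIOMMU rather than on the mere presence of an intel-iommu/amd-iommu
device, so a default_bus_bypass_iommu=on setup like the one above would still
be allowed to migrate.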