Hi Sairaj,
I'm conflicted about the implementation of this change, so I'd like to
make sure I fully understand it...
On 4/10/25 2:44 AM, Sairaj Kodilkar wrote:
Current amd_iommu enables the iommu_nodma address space when pt_supported
flag is on.
As it should, that is the intended purpose of the iommu_nodma memory region.
This causes device to bypass the IOMMU and use untranslated
address to perform DMA when guest kernel uses DMA mode, resulting in
failure to setup the devices in the guest.
So the scenario you are describing above is this QEMU cmdline (using
explicit options):
-device amd-iommu,intremap=on,xtsup=on,pt=on \
-device vfio-pci,host=0000:a1:00.1...
and the guest forcing DMA remapping mode, e.g. via 'iommu.passthrough=0',
which will cause failures from QEMU:
qemu-system-x86_64: AHCI: Failed to start DMA engine: bad command list
buffer address
qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS
receive buffer address
qemu-system-x86_64: AHCI: Failed to start DMA engine: bad command list
buffer address
qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS
receive buffer address
qemu-system-x86_64: AHCI: Failed to start DMA engine: bad command list
buffer address
and also errors from the VF driver in the guest, e.g.:
[ 69.424937] mlx5_core 0000:00:03.0: mlx5_function_enable:1212:(pid
2492): enable hca failed
[ 69.437048] mlx5_core 0000:00:03.0: probe_one:1994:(pid 2492):
mlx5_init_one failed with error code -110
[ 69.444913] mlx5_core 0000:00:03.0: probe with driver mlx5_core
failed with error -110
Whereas after your change the guest would fail to launch because VFIO
will try to register a MAP notifier for the device and fail the check in
amdvi_iommu_notify_flag_changed().
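To spell that out (paraphrasing from memory rather than quoting
hw/i386/amd_iommu.c verbatim, so the exact message and error code may
differ), the notify_flag_changed hook refuses MAP notifiers roughly
like this:

    /* Paraphrased sketch, not a literal quote: MAP notifiers, which VFIO
     * needs to shadow the guest I/O page tables, are refused. */
    static int amdvi_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
                                               IOMMUNotifierFlag old,
                                               IOMMUNotifierFlag new,
                                               Error **errp)
    {
        if (new & IOMMU_NOTIFIER_MAP) {
            error_setg(errp,
                       "AMD vIOMMU does not yet support the MAP notifiers "
                       "required for device assignment with DMA remapping");
            return -EINVAL;
        }
        return 0;
    }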
If my above description is correct, then...
Fix the issue by removing pt_supported check and disabling nodma memory
region. Adding pt_supported requires additional changes and we will look
into it later.
I see that you are essentially trying to block a guest from enabling an
IOMMU feature that is not currently supported by the vIOMMU. Hopefully
that limitation will be solved soon (shameless plug):
https://lore.kernel.org/qemu-devel/20250414020253.443831-1-alejandro.j.jime...@oracle.com/
But in the meantime, I think enabling amdvi_dev_as->iommu when the DMA
remapping capability is not available is likely to cause more confusion
for anyone trying to understand the already convoluted details of the
memory region setup. To a reader of the code and the commit message, it
is confusing that to support the "NO DMA" case, the nodma memory region
must be disabled, which is the opposite of what it is meant to do.
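To make the layout concrete, here is a simplified sketch (not the
literal amd_iommu.c code: 'root' as the container and 'pt_on' are just
placeholders, and the priorities are illustrative). The untranslated
alias and the translating IOMMU region overlap, so whichever one is
enabled is what the device's DMA actually goes through:

    /* Simplified sketch: both subregions cover the same range, so the
     * enabled one decides whether DMA bypasses the vIOMMU (iommu_nodma)
     * or goes through the translation callbacks (iommu). */
    memory_region_add_subregion_overlap(&amdvi_dev_as->root, 0,
                                        &amdvi_dev_as->iommu_nodma, 0);
    memory_region_add_subregion_overlap(&amdvi_dev_as->root, 0,
                                        MEMORY_REGION(&amdvi_dev_as->iommu),
                                        1);

    /* 'pt_on' stands for "pass-through requested and supported"; the hunk
     * quoted below replaces this conditional with "always translate". */
    memory_region_set_enabled(&amdvi_dev_as->iommu_nodma, pt_on);
    memory_region_set_enabled(MEMORY_REGION(&amdvi_dev_as->iommu), !pt_on);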
To explain the "trick": this change is always enabling
amdvi_dev_as->iommu, which is explicitly created as an IOMMU memory
region (i.e. a memory region with mr->is_iommu == true), and it is meant
to support DMA remapping. It is relying on the "side effect" that VFIO
will try to register notifiers for memory regions that are an "IOMMU"
(i.e. pass the check in memory_region_is_iommu()), and later fail when
trying to register the notifier.
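As an illustration of that dependency (this is not the literal hw/vfio
listener code, and example_listener_region_add is a made-up name), the
path being relied on is essentially:

    /* Illustration only: the decision is driven purely by
     * memory_region_is_iommu(), so an always-enabled IOMMU region means
     * VFIO always attempts the notifier registration, and with the AMD
     * vIOMMU that registration is what fails and aborts device init. */
    static void example_listener_region_add(MemoryRegionSection *section)
    {
        Error *err = NULL;
        IOMMUNotifier n; /* would be set up via iommu_notifier_init() */

        if (!memory_region_is_iommu(section->mr)) {
            return; /* plain RAM and aliases are mapped directly instead */
        }

        if (memory_region_register_iommu_notifier(section->mr, &n, &err)) {
            /* This is the failure path the change depends on. */
            error_report_err(err);
        }
    }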
If this change is merged, I think you should at least include the
explanation above in the commit message, since it was not obvious to me
on first reading. That being said, in my opinion, this approach adds
potential confusion that is not worth the trouble, since most guests
will not be using AMD vIOMMU at this point. And if they are, they would
also have to be specifically requesting to enable DMA translation to hit
the problem. Unfortunately, guests will always have the ability to
specify an invalid configuration if they try really hard (or, in this
case, not hard at all).
Alejandro
Fixes: c1f46999ef506 ("amd_iommu: Add support for pass though mode")
Signed-off-by: Sairaj Kodilkar <sarun...@amd.com>
---
hw/i386/amd_iommu.c | 12 ++----------
1 file changed, 2 insertions(+), 10 deletions(-)
diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 5f9b95279997..df8ba5d39ada 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -1426,7 +1426,6 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
AMDVIState *s = opaque;
AMDVIAddressSpace **iommu_as, *amdvi_dev_as;
int bus_num = pci_bus_num(bus);
- X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
iommu_as = s->address_spaces[bus_num];
@@ -1486,15 +1485,8 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
AMDVI_INT_ADDR_FIRST,
&amdvi_dev_as->iommu_ir, 1);
- if (!x86_iommu->pt_supported) {
- memory_region_set_enabled(&amdvi_dev_as->iommu_nodma, false);
- memory_region_set_enabled(MEMORY_REGION(&amdvi_dev_as->iommu),
- true);
- } else {
- memory_region_set_enabled(MEMORY_REGION(&amdvi_dev_as->iommu),
- false);
- memory_region_set_enabled(&amdvi_dev_as->iommu_nodma, true);
- }
+ memory_region_set_enabled(&amdvi_dev_as->iommu_nodma, false);
+ memory_region_set_enabled(MEMORY_REGION(&amdvi_dev_as->iommu), true);
}
return &iommu_as[devfn]->as;
}