Hi Sairaj,
I'm conflicted about the implementation of this change, so I'd like to
make sure I fully understand it...
On 4/10/25 2:44 AM, Sairaj Kodilkar wrote:
Current amd_iommu enables the iommu_nodma address space when pt_supported
flag is on.
As it should, that is the intended purpose of the iommu_nodma memory region.
This causes device to bypass the IOMMU and use untranslated
address to perform DMA when guest kernel uses DMA mode, resulting in
failure to setup the devices in the guest.
So the scenario you are describing above is this QEMU cmdline (using
explicit options):
-device amd-iommu,intremap=on,xtsup=on,pt=on \
-device vfio-pci,host=0000:a1:00.1...
and the guest forcing DMA remapping mode, e.g. via 'iommu.passthrough=0',
which will cause failures from QEMU:
qemu-system-x86_64: AHCI: Failed to start DMA engine: bad command list
buffer address
qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS
receive buffer address
qemu-system-x86_64: AHCI: Failed to start DMA engine: bad command list
buffer address
qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS
receive buffer address
qemu-system-x86_64: AHCI: Failed to start DMA engine: bad command list
buffer address
and also errors from the VF driver in the guest, e.g.:
[ 69.424937] mlx5_core 0000:00:03.0: mlx5_function_enable:1212:(pid
2492): enable hca failed
[ 69.437048] mlx5_core 0000:00:03.0: probe_one:1994:(pid 2492):
mlx5_init_one failed with error code -110
[ 69.444913] mlx5_core 0000:00:03.0: probe with driver mlx5_core
failed with error -110
Whereas after your change the guest would fail to launch because VFIO
will try to register a MAP notifier for the device and fail the check in
amdvi_iommu_notify_flag_changed().
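To spell that out (paraphrasing from memory rather than quoting
hw/i386/amd_iommu.c verbatim, so the exact message and error code may
differ), the notify_flag_changed hook refuses MAP notifiers roughly
like this:

    /* Paraphrased sketch, not a literal quote: MAP notifiers, which VFIO
     * needs to shadow the guest I/O page tables, are refused. */
    static int amdvi_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
                                               IOMMUNotifierFlag old,
                                               IOMMUNotifierFlag new,
                                               Error **errp)
    {
        if (new & IOMMU_NOTIFIER_MAP) {
            error_setg(errp,
                       "AMD vIOMMU does not yet support the MAP notifiers "
                       "required for device assignment with DMA remapping");
            return -EINVAL;
        }
        return 0;
    }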
If my above description is correct, then...
Fix the issue by removing pt_supported check and disabling nodma memory
region. Adding pt_supported requires additional changes and we will look
into it later.
I see that you are essentially trying to block a guest from enabling an
IOMMU feature that is not currently supported by the vIOMMU. Hopefully
that limitation will be solved soon (shameless plug):
https://lore.kernel.org/qemu-devel/20250414020253.443831-1-alejandro.j.jime...@oracle.com/
But in the meantime, I think enabling amdvi_dev_as->iommu when the DMA
remapping capability is not available is likely to cause more confusion
for anyone trying to understand the already convoluted details of the
memory region setup. To a reader of the code and the commit message, it
is confusing that to support the "NO DMA" case, the nodma memory region
must be disabled, which is the opposite of what it is meant to do.
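To make the layout concrete, here is a simplified sketch (not the
literal amd_iommu.c code: 'root' as the container and 'pt_on' are just
placeholders, and the priorities are illustrative). The untranslated
alias and the translating IOMMU region overlap, so whichever one is
enabled is what the device's DMA actually goes through:

    /* Simplified sketch: both subregions cover the same range, so the
     * enabled one decides whether DMA bypasses the vIOMMU (iommu_nodma)
     * or goes through the translation callbacks (iommu). */
    memory_region_add_subregion_overlap(&amdvi_dev_as->root, 0,
                                        &amdvi_dev_as->iommu_nodma, 0);
    memory_region_add_subregion_overlap(&amdvi_dev_as->root, 0,
                                        MEMORY_REGION(&amdvi_dev_as->iommu),
                                        1);

    /* 'pt_on' stands for "pass-through requested and supported"; the hunk
     * quoted below replaces this conditional with "always translate". */
    memory_region_set_enabled(&amdvi_dev_as->iommu_nodma, pt_on);
    memory_region_set_enabled(MEMORY_REGION(&amdvi_dev_as->iommu), !pt_on);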
To explain the "trick": this change is always enabling
amdvi_dev_as->iommu, which is explicitly created as an IOMMU memory
region (i.e. a memory region with mr->is_iommu == true), and it is meant
to support DMA remapping. It is relying on the "side effect" that VFIO
will try to register notifiers for memory regions that are an "IOMMU"
(i.e. pass the check in memory_region_is_iommu()), and later fail when
trying to register the notifier.
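As an illustration of that dependency (this is not the literal hw/vfio
listener code, and example_listener_region_add is a made-up name), the
path being relied on is essentially:

    /* Illustration only: the decision is driven purely by
     * memory_region_is_iommu(), so an always-enabled IOMMU region means
     * VFIO always attempts the notifier registration, and with the AMD
     * vIOMMU that registration is what fails and aborts device init. */
    static void example_listener_region_add(MemoryRegionSection *section)
    {
        Error *err = NULL;
        IOMMUNotifier n; /* would be set up via iommu_notifier_init() */

        if (!memory_region_is_iommu(section->mr)) {
            return; /* plain RAM and aliases are mapped directly instead */
        }

        if (memory_region_register_iommu_notifier(section->mr, &n, &err)) {
            /* This is the failure path the change depends on. */
            error_report_err(err);
        }
    }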
If this change is merged, I think you should at least include the
explanation above in the commit message, since it was not obvious to me
on first reading. That being said, in my opinion, this approach adds
potential confusion that is not worth the trouble, since most guests
will not be using AMD vIOMMU at this point. And if they are, they would
also have to be specifically requesting to enable DMA translation to hit
the problem. Unfortunately, guests will always have the ability to
specify an invalid configuration if they try really hard (or, in this
case, not hard at all).
Alejandro
Fixes: c1f46999ef506 ("amd_iommu: Add support for pass though mode")
Signed-off-by: Sairaj Kodilkar <sarun...@amd.com>
---
hw/i386/amd_iommu.c | 12 ++----------
1 file changed, 2 insertions(+), 10 deletions(-)
diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 5f9b95279997..df8ba5d39ada 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -1426,7 +1426,6 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
AMDVIState *s = opaque;
AMDVIAddressSpace **iommu_as, *amdvi_dev_as;
int bus_num = pci_bus_num(bus);
- X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
iommu_as = s->address_spaces[bus_num];
@@ -1486,15 +1485,8 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
AMDVI_INT_ADDR_FIRST,
&amdvi_dev_as->iommu_ir, 1);
- if (!x86_iommu->pt_supported) {
- memory_region_set_enabled(&amdvi_dev_as->iommu_nodma, false);
- memory_region_set_enabled(MEMORY_REGION(&amdvi_dev_as->iommu),
- true);
- } else {
- memory_region_set_enabled(MEMORY_REGION(&amdvi_dev_as->iommu),
- false);
- memory_region_set_enabled(&amdvi_dev_as->iommu_nodma, true);
- }
+ memory_region_set_enabled(&amdvi_dev_as->iommu_nodma, false);
+ memory_region_set_enabled(MEMORY_REGION(&amdvi_dev_as->iommu), true);
}
return &iommu_as[devfn]->as;
}