On 4/15/25 2:38 AM, Sairaj Kodilkar wrote:
Hi Alejandro,
On 4/15/2025 1:56 AM, Alejandro Jimenez wrote:
Hi Sairaj,
I'm conflicted by the implementation of the change, so I'd like to
make sure I fully understand...
On 4/10/25 2:44 AM, Sairaj Kodilkar wrote:
Fix the issue by removing pt_supported check and disabling nodma memory
region. Adding pt_supported requires additional changes and we will look
into it later.
I see that you are trying to essentially block a guest from enabling
an IOMMU feature that is not currently supported by the vIOMMU.
Hopefully that limitation will be solved soon (shameless plug):
https://lore.kernel.org/qemu-devel/20250414020253.443831-1-
alejandro.j.jime...@oracle.com/
But in the meantime, I think enabling amdvi_dev_as->iommu when DMA
remapping capability is not available is likely to cause more
confusion for anyone trying to understand the already convoluted
details of the memory region setup.
To a reader of the code and the commit message, it is confusing that
to support the "NO DMA" case, the nodma memory region must be
disabled, which is the opposite of what it is meant to do.
I dont think that I understand above statement. What do you mean by "NO
DMA" case here ? is it iommu.passthrough=0 ?
I meant it from the point of view of the vIOMMU configuration (since we
don't control what the guest can request). So in terms of vIOMMU modes
and corresponding Memory Regions that must be enabled to support such
modes, I see it as:
Passthrough(NO DMA) --> MR: iommu_nodma: enabled && iommu: disabled
DMA remap --> MR: iommu: enabled && iommu_nodma: disabled
But I recognize that view/model is probably too rigid for now, although
it should be the "correct state" once we support DMA remapping.
Essentially, I am trying to support the "DMA" case that is
iommu.passthrough=0 for the emulated devices, by reverting the changes> that
introduced the regression.
I understand the goal is to make emulated devs to work in more scenarios.
Because of that view that I mention above, is why I don't think of
c1f46999ef506 ("amd_iommu: Add support for pass though mode") as
introducing a regression, but more of a prerequisite to support both PT
and DMA modes.
If I understand correct -->
The original intent of the flag (in case of Intel) is
1. To turn on the optimization which will use nodma region (dynamically
enabling it) if guest configures the device with passthrough (pt=1)
for given context entry.
This is why I said I am conflicted with the implementation. Your change
always disables the iommu_nodma region, where the default for Linux
guests is to use passthrough mode, which "normally" would result in
iommu_nodma being enabled. I almost suggested on my first reply that you
instead forced x86_iommu->pt_supported = 0 in the AMDVi code, but that
creates a similar type of contradiction.
In short, I understand what you are trying to do, but I think "the
trick" as I called it below should probably be documented.
2. The flag should not enable no_dma region if guest does not configure
device with pt.
Intel driver does this dynamically (for every context entry update while
guest is running). But for AMD this is static and does not change with
the DTE updates, which is causing this regression.
hopefully solved soon:
https://lore.kernel.org/qemu-devel/20250414020253.443831-15-alejandro.j.jime...@oracle.com/
Alejandro
To explain the "trick": this change is always enabling amdvi_dev_as-
>iommu, which is explicitly created as an IOMMU memory region (i.e. a
memory region with mr->is_iommu == true), and it is meant to support
DMA remapping. It is relying on the "side effect" that VFIO will try
to register notifiers for memory regions that are an "IOMMU" (i.e.
pass the check in memory_region_is_iommu()), and later fail when
trying to register the notifier.
If this change is merged, I think you should at least include the
explanation above in the commit message, since it is not obvious to me
at first reading. That being said, in my opinion, this approach adds
potential confusion that is not worth the trouble, since most guests
will not be using AMD vIOMMU at this point. And if they are, they
would also have to be specifically requesting to enable DMA
translation to hit the problem. Unfortunately, guests will always have
the ability of specifying an invalid configuration if they try really
hard (or not hard at all in this case).
Yep, I should have explained it in details. Sorry about the confusion
will keep in mind while sending future patches.
Regards
Sairaj
Alejandro