On 4/15/25 2:38 AM, Sairaj Kodilkar wrote:


Hi Alejandro,

On 4/15/2025 1:56 AM, Alejandro Jimenez wrote:

Hi Sairaj,

I'm conflicted by the implementation of the change, so I'd like to make sure I fully understand...

On 4/10/25 2:44 AM, Sairaj Kodilkar wrote:

Fix the issue by removing pt_supported check and disabling nodma memory
region. Adding pt_supported requires additional changes and we will look
into it later.

I see that you are trying to essentially block a guest from enabling an IOMMU feature that is not currently supported by the vIOMMU. Hopefully that limitation will be solved soon (shameless plug): https://lore.kernel.org/qemu-devel/20250414020253.443831-1- alejandro.j.jime...@oracle.com/

But in the meantime, I think enabling amdvi_dev_as->iommu when DMA remapping capability is not available is likely to cause more confusion for anyone trying to understand the already convoluted details of the memory region setup.

To a reader of the code and the commit message, it is confusing that to support the "NO DMA" case, the nodma memory region must be disabled, which is the opposite of what it is meant to do.


I dont think that I understand above statement. What do you mean by "NO
DMA" case here ? is it iommu.passthrough=0 ?

I meant it from the point of view of the vIOMMU configuration (since we don't control what the guest can request). So in terms of vIOMMU modes and corresponding Memory Regions that must be enabled to support such modes, I see it as:

Passthrough(NO DMA) --> MR: iommu_nodma: enabled && iommu: disabled

DMA remap --> MR: iommu: enabled && iommu_nodma: disabled

But I recognize that view/model is probably too rigid for now, although it should be the "correct state" once we support DMA remapping.


Essentially, I am trying to support the "DMA" case that is
iommu.passthrough=0 for the emulated devices, by reverting the changes> that 
introduced the regression.

I understand the goal is to make emulated devs to work in more scenarios.

Because of that view that I mention above, is why I don't think of c1f46999ef506 ("amd_iommu: Add support for pass though mode") as introducing a regression, but more of a prerequisite to support both PT and DMA modes.


If I understand correct -->
The original intent of the flag (in case of Intel) is

1. To turn on the optimization which will use nodma region (dynamically
    enabling it) if guest configures the device with passthrough (pt=1)
    for given context entry.

This is why I said I am conflicted with the implementation. Your change always disables the iommu_nodma region, where the default for Linux guests is to use passthrough mode, which "normally" would result in iommu_nodma being enabled. I almost suggested on my first reply that you instead forced x86_iommu->pt_supported = 0 in the AMDVi code, but that creates a similar type of contradiction.

In short, I understand what you are trying to do, but I think "the trick" as I called it below should probably be documented.


2. The flag should not enable no_dma region if guest does not configure
    device with pt.

Intel driver does this dynamically (for every context entry update while
guest is running). But for AMD this is static and does not change with
the DTE updates, which is causing this regression.

hopefully solved soon:
https://lore.kernel.org/qemu-devel/20250414020253.443831-15-alejandro.j.jime...@oracle.com/

Alejandro


To explain the "trick": this change is always enabling amdvi_dev_as-  >iommu, which is explicitly created as an IOMMU memory region (i.e. a memory region with mr->is_iommu == true), and it is meant to support DMA remapping. It is relying on the "side effect" that VFIO will try to register notifiers for memory regions that are an "IOMMU" (i.e. pass the check in memory_region_is_iommu()), and later fail when trying to register the notifier.

If this change is merged, I think you should at least include the explanation above in the commit message, since it is not obvious to me at first reading. That being said, in my opinion, this approach adds potential confusion that is not worth the trouble, since most guests will not be using AMD vIOMMU at this point. And if they are, they would also have to be specifically requesting to enable DMA translation to hit the problem. Unfortunately, guests will always have the ability of specifying an invalid configuration if they try really hard (or not hard at all in this case).


Yep, I should have explained it in details. Sorry about the confusion
will keep in mind while sending future patches.

Regards
Sairaj

Alejandro



Reply via email to