On 29/1/26 00:32, Jason Gunthorpe wrote:
On Wed, Jan 28, 2026 at 12:42:08PM +1100, Alexey Kardashevskiy wrote:
Nah, it is quite easy to force 2MB on swiotlb (just do it once and
forget about it), but currently any guest page can be converted to
shared and DMA-mapped, and that skips swiotlb.

Upstream Linux doesn't support that, only SWIOTLB or special DMA
coherent memory can be DMA mapped in CC systems. You can't take a
random page, make it shared and then DMA map it.
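
Condensed, the streaming path looks about like this in a CC guest (a
sketch of the dma-direct logic, not the literal kernel code):

  static dma_addr_t cc_map_page(struct device *dev, phys_addr_t phys,
                                size_t size, enum dma_data_direction dir,
                                unsigned long attrs)
  {
      /*
       * A CC guest marks swiotlb as force-bounce at boot; the swiotlb
       * pool itself was converted to shared exactly once, so streaming
       * DMA copies through it and never exposes arbitrary guest pages.
       */
      if (is_swiotlb_force_bounce(dev))
          return swiotlb_map(dev, phys, size, dir, attrs);

      return phys_to_dma(dev, phys);   /* non-CC direct path */
  }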

Well, my test device driver calls dma_alloc_coherent(), which does
exactly that: alloc + convert a 4K page.
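
In a CC guest that call boils down to roughly the following (a condensed
sketch of the coherent allocation path, not the literal kernel code):

  /* Sketch: what dma_alloc_coherent() effectively does in a CC guest
   * today -- allocate, then flip that one allocation to shared. */
  void *cc_alloc_coherent(struct device *dev, size_t size,
                          dma_addr_t *dma_handle, gfp_t gfp)
  {
      struct page *page = alloc_pages(gfp, get_order(size));

      if (!page)
          return NULL;

      /* convert this one allocation (4K in my test) to shared */
      if (set_memory_decrypted((unsigned long)page_address(page),
                               1 << get_order(size))) {
          __free_pages(page, get_order(size));
          return NULL;
      }

      *dma_handle = phys_to_dma(dev, page_to_phys(page));
      return page_address(page);
  }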

Yes, and there is no reason that can't come from the same allocator as
SWIOTLB and use 2M aligned blocks.

I am rather surprised that even now, with SWIOTLB_FORCE, dma_alloc_coherent
chooses not to use SWIOTLB in a confidential VM.

What happens if you don't have a VIOMMU, have a single translation
stage and only use the S1 (AMDv2) page table in the hypervisor? Then
does the HW fix it? Or does it only fix it with two stages enabled?

The HW translates a DMA handle to a host pfn, and then the RMP check
verifies that [pfn..pfn+size] is assigned to the correct ASID, that the
page size matches, and that the gfn matches.

RMP does not check S1 translations inside the guest, only S2. RMP is
not fixing page sizes or anything, it says yes/no to the access.
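
As pseudocode, the check is something like this (illustrative only; the
struct and field names are invented, not AMD's actual RMP format):

  /* Toy model of the RMP check made on a device access */
  struct rmp_entry {
      bool assigned;   /* page belongs to some guest */
      u32  asid;       /* the owning guest */
      u64  gfn;        /* guest pfn it was validated as */
      bool is_2m;      /* page size recorded at validation */
  };

  static bool rmp_allows(struct rmp_entry *e, u32 asid, u64 gfn,
                         bool access_2m)
  {
      return e->assigned &&
             e->asid == asid &&
             e->gfn == gfn &&
             e->is_2m == access_2m;   /* the size must match too */
  }

It answers yes/no; nothing gets translated or fixed up.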

Your explanation doesn't make a lot of sense.

If we have a vIOMMU and the guest has a 4K IOPTE in S1 then it goes

  S1[4k] -> S2[2M] -- [4k] --> RMP[2M] ==> OK 4k IOTLB entry

Should be 2MB IOTLB.

While if we have no vIOMMU, the same effective scenario:

  S2[4k] ------- [4k] -------> RMP[2M] ==> FAIL

The host should have made sure S2 and RMP use the same page size.
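
i.e. on the host the two updates happen as one unit, something like this
(a sketch; rmp_assign()/s2_map() are invented names, not real kernel
helpers):

  /* Sketch: host maps guest-private memory with the RMP entry and
   * the IOMMU/NPT S2 entry created together at the same page size. */
  static int host_map_private(u32 asid, u64 gfn, u64 pfn, bool use_2m)
  {
      int ret = rmp_assign(pfn, asid, gfn, use_2m);

      if (ret)
          return ret;
      /* same size in the S2, or the S2 walk and the RMP disagree */
      return s2_map(asid, gfn, pfn, use_2m ? SZ_2M : SZ_4K);
  }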

It makes no sense at all. Why build something like that?

It is not a "firewall", it is a huge software obstacle.

Maybe your answer is that the entity building the RMP also has to
build a matching S2 IOTLB as one unit and

Yes, the host OS updates both RMP and S2, and it uses the same page size
in each. When the guest accepts memory/MMIO ("validates" it, in AMD terms,
which prevents the host from changing it quietly), it accepts a page of a
specific size, so the guest can be sure that the S2 mapping won't be
remapped by the (untrusted) host.
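
On the guest side that acceptance is the PVALIDATE step, roughly (a
sketch built on the kernel's pvalidate() wrapper around the PVALIDATE
instruction):

  /* Sketch: accepting a 2M page records both its gfn and its page
   * size in the RMP, so a later host remap to a different size or
   * pfn fails the RMP check instead of going unnoticed. */
  static int snp_accept_2m(unsigned long vaddr)
  {
      return pvalidate(vaddr, RMP_PG_SIZE_2M, true);
  }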

we somehow just plumb the
page table pointer and invalidations into the IOMMU driver.

Such a messy design.

Not sure about that, I dislike the other designs more. At least with this
one the S2 tables (IOMMU, NPT) stay the same, vs. having firmware deal
with them and KVM having to manage some of it. I also suspect I am
explaining RMP rather poorly (it is a control mechanism, not a translation
mechanism). Maybe Vasant could help :) Thanks,


iommufd won't deal with memory maps for IO; the secure world will
handle that through KVM.

Is QEMU going to skip IOMMU mapping entirely? So when the device
transitions from untrusted (when everything is mapped via VFIO or the
IOMMU) to trusted, QEMU will unmap everything, and then the guest will
map everything again, but this time via KVM, bypassing QEMU
entirely? Thanks,

On ARM there are different S2s for the IOMMU, one for T=1 and one for
T=0 traffic. The T=1 one is fully controlled by the secure world and is
equal to the CPU S2. The T=0 one is fully controlled by qemu and acts
like a normal system. The T=0 one can only access guest shared memory.

Does the T=0 table still have all the guest memory mapped (with the
expectation that whatever is not allowed won't be accessed through that
table)? Thanks,

I'm not sure what the plan is. I think ARM can do it both ways: map all
guest physical memory and rely on the GPT to prevent access, or
dynamically map only shared pages.
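
The dynamic variant would just hook the guest's share/unshare
transitions, roughly like this (a sketch; the helper names are invented):

  /* Sketch of the "map only shared pages" option for the T=0 S2 */
  void on_guest_page_shared(u64 gfn, u64 pfn)
  {
      t0_s2_map(gfn, pfn, SZ_4K);     /* now reachable by T=0 DMA */
  }

  void on_guest_page_unshared(u64 gfn)
  {
      t0_s2_unmap(gfn, SZ_4K);        /* private again, pull it */
      t0_s2_tlb_invalidate(gfn);
  }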

Jason

--
Alexey

