On Tue, Jan 27, 2026 at 07:08:39PM +1100, Alexey Kardashevskiy wrote:
> > Oh so it doesn't actually check the RMP, it is just rounding down to
> > two fixed sizes?
>
> No, it does check RMP.
>
> If the IOMMU page walk ends at a >=2MB page - it will round down to
> 2MB (to the nearest supported RMP size) and check for 2MB RMP and if
> that check fails because of the page size - it won't try 4K (even
> though it could theoretically).
>
> The expectation is that the host OS makes sure the IOMMU uses page
> sizes equal or bigger than closest smaller RMP page size so there is
> no need in two RMP checks.

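As a rough C sketch of the behaviour described above: all names here
(rmp_check_size, iommu_rmp_check) are hypothetical, and the exact-match
rule is an assumption drawn from this thread, not from the hardware
spec or any kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_4K (1ULL << 12)
#define SZ_2M (1ULL << 21)
#define SZ_1G (1ULL << 30)

/*
 * Hypothetical model: the size at which the RMP is consulted for a
 * given IOMMU leaf page size. Any leaf >= 2M is rounded down to 2M,
 * anything smaller is checked at 4K.
 */
static uint64_t rmp_check_size(uint64_t iommu_leaf_size)
{
	return iommu_leaf_size >= SZ_2M ? SZ_2M : SZ_4K;
}

/*
 * Hypothetical model: a single RMP check at the size chosen above.
 * Assumption: the check passes only if the RMP entry has exactly that
 * size. There is no second attempt at 4K when the 2M check fails.
 */
static bool iommu_rmp_check(uint64_t iommu_leaf_size, uint64_t rmp_entry_size)
{
	return rmp_check_size(iommu_leaf_size) == rmp_entry_size;
}
```

Under this model a 1G IOMMU leaf over a 2M RMP entry passes, while a 2M
leaf over a 4K RMP entry fails with no 4K retry.
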
Seems dysfunctional to me.

> > > > ARM is pushing a thing where encrypt/decrypt has to work on certain
> > > > aligned granule sizes > PAGE_SIZE, you could use that mechanism to
> > > > select a 2M size for AMD too and avoid this.
> > >
> > > 2M minimum on every DMA map?
> >
> > On every swiotlb allocation pool chunk, yeah.
>
> Nah, it is quite easy to force 2MB on swiotlb (just do it once and
> forget) but currently any guest page can be converted to shared and
> DMA-mapped and this skips swiotlb.

Upstream Linux doesn't support that, only SWIOTLB or special DMA
coherent memory can be DMA mapped in CC systems. You can't take a
random page, make it shared and then DMA map it.

> > > > What happens if the guest puts 4K pages into its AMDv2 table and RMP
> > > > is 2M?
> > >
> > > Is this AMDv2 - an NPT (then it is going to fail)? or nested IOMMU (never
> > > tried, in the works, I suspect failure)?
> >
> > Yes, some future nested vIOMMU
> >
> > If guest can't have a 4K page in its vIOMMU while the host is using
> > 2M RMP then the whole architecture is broken, sorry.
>
> I re-read what I wrote and I think I was wrong, the S2 table (guest
> physical -> host physical) has to match RMP, not the S1.

Really? So the HW can fix the 4k/2M mismatch for the S1 but doesn't
bother for the S2? Seems like a crazy design to me.

What happens if you don't have a VIOMMU, have a single translation
stage and only use the S1 (AMDv2) page table in the hypervisor? Then
does the HW fix it? Or does it only fix it with two stages enabled?

> > iommufd won't deal with memory maps for IO, the secure world will
> > handle that through KVM.
>
> Is QEMU going to skip on IOMMU mapping entirely? So when the device
> is transitioned from untrusted (when everything mapped via VFIO or
> IOMMU) to trusted - QEMU will unmap everything and then the guest
> will map everything but this time via KVM and bypassing QEMU
> entirely?
> Thanks,

On ARM there are different S2s for the IOMMU, one for T=1 and one for
T=0 traffic. The T=1 one is fully controlled by the secure world and is
equal to the CPU S2. The T=0 one is fully controlled by qemu and acts
like a normal system. The T=0 one can only access guest shared memory.

Jason
