Hey Nic,
On 2025/5/22 06:49, Nicolin Chen wrote:
On Wed, May 21, 2025 at 07:14:45PM +0800, Zhenzhong Duan wrote:
+static const MemoryListener iommufd_s2domain_memory_listener = {
+ .name = "iommufd_s2domain",
+ .priority = 1000,
+ .region_add = iommufd_listener_region_add_s2domain,
+ .region_del = iommufd_listener_region_del_s2domain,
+};
Would you mind elaborating when and how vtd does all S2 mappings?
On ARM, the default vfio_memory_listener could capture the entire
guest RAM and add to the address space. So what we do is basically
reusing the vfio_memory_listener:
https://lore.kernel.org/qemu-devel/20250311141045.66620-13-shameerali.kolothum.th...@huawei.com/
In concept, yes: all of the guest RAM. But due to an erratum, we need
to skip the RO mappings.
The thing is that when a VFIO device is attached to the container
upon a nesting configuration, the ->get_address_space op should
return the system address space as S1 nested HWPT isn't allocated
yet. Then all the iommu address space routines in vfio_listener_region_add()
would be skipped, ending up with mapping the guest RAM in S2 HWPT
correctly. Not until the S1 nested HWPT is allocated by the guest
OS (after guest boots), can the ->get_address_space op return the
iommu address space.
This seems a bit different between the ARM and VT-d emulation. The VT-d
emulation code returns the iommu address space regardless of which
translation mode the guest has configured. But the MR of that address
space has two overlapping subregions: one is nodmar, the other is iommu.
As the naming shows, nodmar is aliased to the system MR. Before the
guest enables the iommu and sets PGTT to a non-PT mode (e.g. S1 or S2),
the effective MR alias is nodmar, hence the mappings this address space
holds are the GPA mappings in the beginning. If the guest sets PGTT to S2,
then the iommu MR is enabled, hence the mappings are gIOVA mappings
accordingly. So in VT-d emulation, the address space switch is more of an
MR alias switch.
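
To illustrate the alias switch described above, here is a rough toy
model. It is not the QEMU code: every toy_* name is made up for
illustration, and treating any non-PT mode as "iommu MR effective" is an
assumption taken from the wording above, not from the implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the QEMU types. */
enum toy_pgtt { TOY_PGTT_PT, TOY_PGTT_S1, TOY_PGTT_S2 };

struct toy_subregion {
    const char *name;
    bool enabled;
};

/* One address space whose root MR has two overlapping subregions. */
struct toy_vtd_as {
    struct toy_subregion nodmar; /* alias of the system MR: GPA mappings */
    struct toy_subregion iommu;  /* IOMMU MR: gIOVA mappings */
};

/* Initial state: nodmar is the effective alias. */
static void toy_vtd_as_init(struct toy_vtd_as *as)
{
    assert(as != NULL);
    as->nodmar = (struct toy_subregion){ .name = "nodmar", .enabled = true };
    as->iommu = (struct toy_subregion){ .name = "iommu", .enabled = false };
}

/*
 * The address space object never changes; only the effective subregion
 * flips. Assumption: the iommu MR wins once DMA remapping is enabled
 * and PGTT is a non-PT mode.
 */
static void toy_vtd_switch(struct toy_vtd_as *as, bool dmar_enabled,
                           enum toy_pgtt pgtt)
{
    assert(as != NULL);
    bool use_iommu = dmar_enabled && pgtt != TOY_PGTT_PT;

    as->iommu.enabled = use_iommu;
    as->nodmar.enabled = !use_iommu;
}

/* Name of the subregion that currently backs the address space. */
static const char *toy_vtd_effective(const struct toy_vtd_as *as)
{
    assert(as != NULL);
    return as->iommu.enabled ? as->iommu.name : as->nodmar.name;
}
```

The point of the model is that a caller asking for the address space
always gets the same object back, which is why a listener on it sees GPA
mappings first and gIOVA mappings after the switch.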
In this series, we mainly want to support the S1 translation type for the
guest. It is based on nested translation, which needs an S2 domain that
holds the GPA mappings. Besides the S1 translation type, PT is also
supported. Both types need an S2 domain that already holds the GPA
mappings, so we have this internal listener. We also want to skip RO
mappings on S2, which is another reason for it. @Zhenzhong, perhaps the
commit message could describe why an internal listener is introduced.
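
As a rough sketch of what such an internal listener boils down to, here
is a toy model of region_add/region_del callbacks that keep an S2 domain
populated with GPA mappings while skipping RO sections. None of this is
the code from the series: the toy_* types and functions are hypothetical,
and skipping non-RAM sections is an extra assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_MAPS 16

/* Hypothetical, simplified view of a memory section. */
struct toy_section {
    uint64_t gpa;
    uint64_t size;
    bool is_ram;
    bool readonly;
};

/* Hypothetical S2 domain: just a small table of GPA mappings. */
struct toy_s2domain {
    struct toy_section maps[TOY_MAX_MAPS];
    size_t nr_maps;
};

/* Map a section into the S2 domain; returns true if it was mapped. */
static bool toy_s2_region_add(struct toy_s2domain *s2,
                              const struct toy_section *sec)
{
    assert(s2 != NULL && sec != NULL);

    /* Skip RO mappings (the erratum) and, by assumption, non-RAM. */
    if (!sec->is_ram || sec->readonly) {
        return false;
    }
    if (s2->nr_maps == TOY_MAX_MAPS) {
        return false; /* toy table is full */
    }
    s2->maps[s2->nr_maps++] = *sec;
    return true;
}

/* Unmap a section; returns true if a matching mapping was removed. */
static bool toy_s2_region_del(struct toy_s2domain *s2,
                              const struct toy_section *sec)
{
    assert(s2 != NULL && sec != NULL);

    for (size_t i = 0; i < s2->nr_maps; i++) {
        if (s2->maps[i].gpa == sec->gpa && s2->maps[i].size == sec->size) {
            /* Fill the hole with the last entry. */
            s2->maps[i] = s2->maps[--s2->nr_maps];
            return true;
        }
    }
    return false;
}
```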
With this address space shift, S2 mappings can be simply captured
and done by vfio_memory_listener. Then, such an s2domain listener
would be largely redundant.
Hope the above answers your question.
So the second question is:
Does vtd have to own this iommufd_s2domain_memory_listener? IOW,
Yes, based on the current design. When the guest sets PGTT==PT, the
device is attached to the S2 hwpt; when it goes to S1, the device is
attached to an S1 hwpt whose parent is the aforementioned S2 hwpt. This
S2 hwpt is always there for use.
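
The attach rule above can be sketched roughly as follows. This is a toy
model only: the toy_* names are hypothetical and no real iommufd calls
are made, it just shows which hwpt the device ends up on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical translation types the guest can pick in this sketch. */
enum toy_xlate_type { TOY_XLATE_PT, TOY_XLATE_S1 };

/* Hypothetical hwpt: an S1 hwpt points at its parent S2 hwpt. */
struct toy_hwpt {
    const char *kind;
    const struct toy_hwpt *parent;
};

struct toy_device {
    const struct toy_hwpt *attached;
};

/*
 * PT: attach straight to the always-present S2 hwpt.
 * S1: attach to an S1 hwpt nested on top of that same S2 hwpt.
 */
static const struct toy_hwpt *toy_attach(struct toy_device *dev,
                                         enum toy_xlate_type type,
                                         const struct toy_hwpt *s2,
                                         struct toy_hwpt *s1)
{
    assert(dev != NULL && s2 != NULL && s1 != NULL);

    if (type == TOY_XLATE_PT) {
        dev->attached = s2;
    } else {
        s1->parent = s2; /* nested: S1 on top of the GPA-holding S2 */
        dev->attached = s1;
    }
    return dev->attached;
}
```

Either way the S2 hwpt has to hold the GPA mappings before the attach
happens.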
does vtd_host_dma_iommu() have to return the iommu address space
all the time?
Yes, all the time.
--
Regards,
Yi Liu