unbind guest page table to host

Yi Liu Mon, 16 Jun 2025 23:44:50 -0700

On 2025/6/17 11:22, Nicolin Chen wrote:

On Mon, Jun 16, 2025 at 03:38:26PM +0800, Yi Liu wrote:

On 2025/6/16 13:59, Nicolin Chen wrote:

On Thu, Jun 12, 2025 at 08:53:40PM +0800, Yi Liu wrote:

That being said, IOMMU_NOTIFIER_IOTLB_EVENTS should not be needed
for passthrough devices, right?


No, even if x-flts=on is configured on the QEMU cmdline, that only means the
virtual VT-d supports stage-1 translation; the guest can still choose to run
in legacy mode (stage-2), e.g., with the kernel cmdline intel_iommu=on,sm_off.

So before the guest runs, we don't know which kind of page table, stage-1 or
stage-2, the guest will use for this VFIO device. So we have to use the iommu
AS to catch stage-2's MAP events if the guest chooses stage-2.


@Zhenzhong, if the guest decides to use legacy mode then the vIOMMU should switch
the MRs of the device's AS, hence the IOAS created by VFIO container would
be switched to using the IOMMU_NOTIFIER_IOTLB_EVENTS since the MR is
switched to IOMMU MR. So it should be able to support shadowing the guest
IO page table. Hence, this should not be a problem.

@Nicolin, I think your major point is making the VFIO container IOAS a
GPA IOAS (always return the system AS in the get_address_space op) and reusing
it when setting up nested translation. Is that right? I think it should work if:
1) we can let the vfio memory listener filter out the RO pages per vIOMMU's
     request.


Yes.

But I don't want the get_address_space op to always return the system
AS, for the reason mentioned by Zhenzhong above.


So, you mean the VT-d model would need a runtime notification to
switch the address space of the VFIO ioas?


It's not a notification. It's done by switching the AS. Details can be found
in vtd_switch_address_space().


OK. I got confused about the "switch", thinking that was about
the get_address_space() call.


Yeah, not that call. All the magic is in the MR enable/disable. This
switches the AS to the iommu MR, so vfio_listener_region_add() sees that the
MR is an iommu MR and registers the iommu notifier.
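To make the MR enable/disable idea concrete, here is a minimal toy model in C. It is only a sketch of the mechanism described above and does not use QEMU's real types or APIs: ToyRegion, ToyDeviceAS and every toy_* function are hypothetical names invented for illustration. The assumption is that a device AS holds two mutually exclusive regions, and that a listener reacts to whichever region becomes visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are NOT the QEMU types. */
typedef struct ToyRegion {
    bool is_iommu;   /* true for the region that goes through the vIOMMU */
    bool enabled;    /* only one of the two regions is enabled at a time */
} ToyRegion;

typedef struct ToyDeviceAS {
    ToyRegion sys_alias;   /* direct (GPA) view of guest memory */
    ToyRegion iommu_mr;    /* translated view behind the vIOMMU */
    int notifiers;         /* IOTLB notifiers the listener has registered */
    int direct_maps;       /* plain GPA mappings the listener has set up */
} ToyDeviceAS;

/* Listener callback: the region type decides what the listener does. */
static void toy_listener_region_add(ToyDeviceAS *as, const ToyRegion *r)
{
    if (r->is_iommu) {
        as->notifiers++;      /* register a notifier, shadow guest mappings */
    } else {
        as->direct_maps++;    /* map guest RAM directly */
    }
}

static void toy_listener_region_del(ToyDeviceAS *as, const ToyRegion *r)
{
    if (r->is_iommu) {
        as->notifiers--;
    } else {
        as->direct_maps--;
    }
}

/* Start with the system alias enabled, as a device would before the
 * guest has programmed any translation. */
void toy_as_init(ToyDeviceAS *as)
{
    assert(as != NULL);
    as->sys_alias = (ToyRegion){ .is_iommu = false, .enabled = true };
    as->iommu_mr  = (ToyRegion){ .is_iommu = true,  .enabled = false };
    as->notifiers = 0;
    as->direct_maps = 0;
    toy_listener_region_add(as, &as->sys_alias);
}

/* The "switch": disable one region, enable the other. The listener
 * observes a del followed by an add; nobody sends it a notification. */
void toy_switch_address_space(ToyDeviceAS *as, bool use_iommu)
{
    assert(as != NULL);
    ToyRegion *on  = use_iommu ? &as->iommu_mr  : &as->sys_alias;
    ToyRegion *off = use_iommu ? &as->sys_alias : &as->iommu_mr;

    if (on->enabled) {
        return;               /* already in the requested mode */
    }
    off->enabled = false;
    toy_listener_region_del(as, off);
    on->enabled = true;
    toy_listener_region_add(as, on);
}
```

In this toy, calling toy_switch_address_space(&as, true) after toy_as_init(&as) moves the listener from one direct mapping to one registered notifier, which mirrors the legacy (stage-2) case where the guest IO page table has to be shadowed.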

TBH, I am still unclear how many cases the VT-d model would need to
support here :-/

2) we can disallow emulated/passthru devices behind the same pcie-pci
     bridge[1]. For emulated devices, the AS should switch to the iommu MR,
     while for passthru devices, the AS needs to stick with the system MR so
     that the VFIO container IOAS can stay a GPA IOAS. To support this
     topology, we would need to let the AS switch to the iommu MR and have a
     separate GPA IOAS. This separate GPA IOAS can be shared by all the
     passthru devices.
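The "separate GPA IOAS shared by all the passthru devices" idea can be sketched as a small refcounted object owned by the vIOMMU. This is a toy model under stated assumptions, not the code from the series: ToyIOAS, ToyVIOMMU and the toy_* functions are hypothetical, and a real implementation would allocate the IOAS through iommufd rather than hand out a counter value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an iommufd IOAS that maps GPA -> HPA. */
typedef struct ToyIOAS {
    int id;       /* 0 means "not allocated" in this toy */
    int users;    /* number of passthru devices attached */
} ToyIOAS;

/* Hypothetical vIOMMU state: one GPA IOAS for all passthru devices. */
typedef struct ToyVIOMMU {
    ToyIOAS gpa_ioas;
    int next_id;  /* toy replacement for a real IOAS allocation */
} ToyVIOMMU;

void toy_viommu_init(ToyVIOMMU *v)
{
    assert(v != NULL);
    v->gpa_ioas.id = 0;
    v->gpa_ioas.users = 0;
    v->next_id = 1;
}

/* The first passthru device allocates the GPA IOAS; later ones reuse it. */
ToyIOAS *toy_attach_passthru(ToyVIOMMU *v)
{
    assert(v != NULL);
    if (v->gpa_ioas.users == 0) {
        v->gpa_ioas.id = v->next_id++;
    }
    v->gpa_ioas.users++;
    return &v->gpa_ioas;
}

/* The last passthru device to detach releases the GPA IOAS. */
void toy_detach_passthru(ToyVIOMMU *v)
{
    assert(v != NULL && v->gpa_ioas.users > 0);
    if (--v->gpa_ioas.users == 0) {
        v->gpa_ioas.id = 0;
    }
}
```

Two passthru devices attached to the same ToyVIOMMU get the same ToyIOAS pointer, so the GPA mappings are built once regardless of whether the device AS has switched to the iommu MR.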


Yea, ARM is doing it in a similar way.

So basically, we are OK with your idea. But we should decide if it is
necessary to support the topology in 2). I think this is a general
question. TBH, I don't have enough information to judge whether it is valuable.
Perhaps let's hear from more people.


I would be okay if VT-d decides to move on with its own listener,
if that turns out to be the better option. But for ARM, I'd
like to see that we can reuse the VFIO container IOAS.


I didn't see a problem so far on this part. Have you seen any?


Probably no functional problem with that internal listener. ARM
could work using one like that as well. The only problem is code
duplication. It's not ideal for everybody to have an internal S2
listener while wasting the VFIO one.

But given that VT-d has more complicated use cases like runtime
guest-level configuration that switches between nesting and non-
nesting modes, perhaps having an internal listener is a better
idea?


I noticed there is quite a bit of duplication now (container/ioas/hwpt). Let's
see if anyone wants to put emulated and passthru devices under
the same pci bridge. If not, let's avoid duplicating code.

--
Regards,
Yi Liu

Re: [PATCH rfcv3 15/21] intel_iommu: Bind/unbind guest page table to host
