On Mon, Jun 16, 2025 at 03:38:26PM +0800, Yi Liu wrote:
> On 2025/6/16 13:59, Nicolin Chen wrote:
> > On Thu, Jun 12, 2025 at 08:53:40PM +0800, Yi Liu wrote:
> > > > > That being said, IOMMU_NOTIFIER_IOTLB_EVENTS should not be needed
> > > > > for passthrough devices, right?
> > > > 
> > > > No, even if x-flts=on is configured on the QEMU cmdline, that only
> > > > means the virtual VT-d supports stage-1 translation; the guest can
> > > > still choose to run in legacy mode (stage-2), e.g., with the kernel
> > > > cmdline intel_iommu=on,sm_off
> > > > 
> > > > So before the guest runs, we don't know which kind of page table
> > > > (stage-1 or stage-2) the guest will use for this VFIO device. So we
> > > > have to use the iommu AS to catch stage-2's MAP events if the guest
> > > > chooses stage-2.
> > > 
> > > @Zhenzhong, if guest decides to use legacy mode then vIOMMU should switch
> > > the MRs of the device's AS, hence the IOAS created by VFIO container would
> > > be switched to using the IOMMU_NOTIFIER_IOTLB_EVENTS since the MR is
> > > switched to IOMMU MR. So it should be able to support shadowing the guest
> > > IO page table. Hence, this should not be a problem.
> > > 
> > > @Nicolin, I think your major point is making the VFIO container IOAS as a
> > > GPA IOAS (always return system AS in get_address_space op) and reusing it
> > > when setting nested translation. Is it? I think it should work if:
> > > 1) we can let the vfio memory listener filter out the RO pages per
> > >     vIOMMU's request.
> > 
> > Yes.
> > 
> > > But I don't want the get_address_space op always return system
> > >     AS as the reason mentioned by Zhenzhong above.
> > 
> > So, you mean the VT-d model would need a runtime notification to
> > switch the address space of the VFIO ioas?
> 
> It's not a notification. It's done by switching AS. Detail can be found
> in vtd_switch_address_space().

OK. I got confused about the "switch", thinking that was about
the get_address_space() call.

> > TBH, I am still unclear how many cases the VT-d model would need
> > to support here :-/
> >
> > > 2) we can disallow emulated/passthru devices behind the same pcie-pci
> > >     bridge[1]. For emulated devices, AS should switch to iommu MR, while
> > >     for passthru devices, it needs the AS stick with the system MR hence
> > >     be able to keep the VFIO container IOAS as a GPA IOAS. To support
> > >     this, let AS switch to iommu MR and have a separate GPA IOAS is
> > >     needed. This separate GPA IOAS can be shared by all the passthru
> > >     devices.
> > 
> > Yea, ARM is doing in a similar way.
> > 
> > > So basically, we are ok with your idea. But we should decide if it is
> > > necessary to support the topology in 2). I think this is a general
> > > question. TBH. I don't have much information to judge if it is valuable.
> > > Perhaps, let's hear from more people.
> > 
> > I would be okay if VT-d decides to move on with its own listener,
> > if it turns out to be the relatively better case. But for ARM, I'd
> > like to see we can reuse the VFIO container IOAS.
> 
> I didn't see a problem so far on this part. Have you seen any?

Probably no functional problem with that internal listener. ARM
could work using one like that as well. The only problem is code
duplication. It's not ideal for everybody to have an internal S2
listener while wasting the VFIO one.

But given that VT-d has more complicated use cases like runtime
guest-level configuration that switches between nesting and non-
nesting modes, perhaps having an internal listener is a better
idea?

Thanks
Nicolin
