On 3/18/25 3:13 PM, Nicolin Chen wrote:
On Tue, Mar 18, 2025 at 07:31:36PM +0100, Eric Auger wrote:
On 3/17/25 9:19 PM, Nicolin Chen wrote:
On Mon, Mar 17, 2025 at 04:24:53PM -0300, Jason Gunthorpe wrote:
On Mon, Mar 17, 2025 at 12:10:19PM -0700, Nicolin Chen wrote:
Another question: how does an emulated device work with a vSMMUv3?
I could imagine that all the accel steps would be bypassed since
!sdev->idev. Yet the emulated iotlb still caches its translations,
so we will need to flush that iotlb too, which will increase complexity:
the TLBI command dispatching function will need to know which ASIDs
belong to an emulated device and which belong to a vfio device.
I think you should block it. We already expect different vSMMUs
depending on the physical SMMU under the PCI device, so it makes sense
that a SW VFIO device would have its own, non-accelerated, vSMMU
model in the guest.
Yea, I agree, and an implementation that separates them would be
cleaner.

In my mind, the general idea of "accel=on" is also to keep things
more efficient: passthrough devices go to HW-accelerated vSMMUs
(on separate PCIe buses), while emulated ones go to a bus that
bypasses the vSMMU (PCIE0).

Originally a specific SMMU device was needed to opt in to the ACPI IORT
description of the MSI reserved region, which is not needed if you don't
rely on S1+S2. And if we don't rely on this trick, it was not even needed
with the legacy integration
(https://patchwork.kernel.org/project/qemu-devel/cover/20180921081819.9203-1-eric.au...@redhat.com/).

Nevertheless, I don't think anything prevents the acceleration-granted
device from also working with virtio/vhost devices, for instance, unless
you unplug the existing infra. The translation and invalidation should
just use different control paths (explicit translation requests,
invalidation notifications towards vhost, ...).

smmuv3_translate() is per sdev, so it's easy.

Invalidation is done via commands, which could be tricky:
a) Broadcast command
b) ASID validation -- we'll need to keep track of a list of ASIDs
    for vfio devices to compare against the ASID in each per-ASID
    command, potentially by trapping all CFGI_CD(_ALL) commands? Note
    that each vfio device may have multiple ASIDs (for multiple CDs).
Either a or b above will have some validation efficiency impact.
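Option (b) above can be sketched as follows. This is a minimal, hypothetical illustration in C, not QEMU code: `struct vsmmu_asid_track`, `track_cd()`, `route_tlbi_asid()`, `route_tlbi_all()` and the `TLBI_TO_*` flags are invented names, and it assumes every CD the guest installs is trapped (via CFGI_CD/CFGI_CD_ALL) so that its ASID can be recorded against either a vfio or an emulated device. The 16-bit ASID space fits in an 8 KiB bitmap per device class, so the per-command check is a single bit test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical routing flags for a trapped per-ASID TLBI command. */
#define TLBI_TO_EMULATED (1u << 0) /* flush QEMU's emulated IOTLB */
#define TLBI_TO_HW       (1u << 1) /* forward to the host SMMU */

/* One bit per 16-bit ASID: 65536 bits = 8 KiB. */
struct asid_bitmap {
    uint64_t bits[(1u << 16) / 64];
};

static void asid_mark(struct asid_bitmap *m, uint16_t asid)
{
    m->bits[asid / 64] |= UINT64_C(1) << (asid % 64);
}

static bool asid_test(const struct asid_bitmap *m, uint16_t asid)
{
    return (m->bits[asid / 64] >> (asid % 64)) & 1;
}

/* Hypothetical per-vSMMU bookkeeping, filled while trapping CFGI_CD(_ALL). */
struct vsmmu_asid_track {
    struct asid_bitmap vfio; /* ASIDs seen in CDs of vfio devices */
    struct asid_bitmap emu;  /* ASIDs seen in CDs of emulated devices */
};

/* Record the ASID of a trapped CD. A vfio device with several CDs
 * simply records several ASIDs. */
static void track_cd(struct vsmmu_asid_track *t, uint16_t asid, bool is_vfio)
{
    asid_mark(is_vfio ? &t->vfio : &t->emu, asid);
}

/* Option (a): a broadcast command needs no lookup and goes everywhere. */
static unsigned route_tlbi_all(void)
{
    return TLBI_TO_EMULATED | TLBI_TO_HW;
}

/* Option (b): route a per-ASID command by bitmap lookup. An ASID used
 * by both kinds of device goes to both paths; an ASID never seen is
 * sent to the emulated path only. */
static unsigned route_tlbi_asid(const struct vsmmu_asid_track *t,
                                uint16_t asid)
{
    unsigned route = 0;

    if (asid_test(&t->vfio, asid)) {
        route |= TLBI_TO_HW;
    }
    if (asid_test(&t->emu, asid) || route == 0) {
        route |= TLBI_TO_EMULATED;
    }
    return route;
}
```

The sketch never clears bits; a real implementation would also have to handle CD teardown and ASID reuse, which is where much of the extra complexity mentioned above would live.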

Again, what justifies having different qemu devices for the same
IP? I understand that it simplifies the implementation, but I am not
sure this is a good reason. Nevertheless, it is worth challenging. What
is the plan for intel iommu? Will we have 2 devices, the legacy device
and one for nested?

Hmm, it seems that there are two different topics:
1. Use one SMMU device model (source code file; "iommu=" string)
    for both an emulated vSMMU and an HW-accelerated vSMMU.
2. Allow one vSMMU instance to work with both an emulated device
    and a passthrough device.
And I get that you want both 1 and 2.

I'm totally okay with 1, but see no compelling benefit from 2 that
would justify the increased complexity in the invalidation routine.

And another question about the mixed device attachment. Let's say
we have in the host:
   VFIO passthrough dev0 -> pSMMU0
   VFIO passthrough dev1 -> pSMMU1
Should we allow emulated devices to be flexibly plugged?
   dev0 -> vSMMU0 /* Hard requirement */
   dev1 -> vSMMU1 /* Hard requirement */
   emu0 -> vSMMU0 /* Soft requirement; can be vSMMU1 also */
   emu1 -> vSMMU1 /* Soft requirement; can be vSMMU0 also */
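The placement question can be framed as a small assignment rule. The sketch below is hypothetical (`struct vsmmu`, `struct dev` and `place_dev()` are invented for illustration): a passthrough device is pinned to the vSMMU backed by its own pSMMU, while an emulated device may go anywhere; "least loaded" is just one arbitrary tie-breaking policy, assumed here for concreteness.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the topology in the example above. */
struct vsmmu {
    int psmmu; /* index of the physical SMMU backing this vSMMU */
    int ndevs; /* number of devices already attached */
};

struct dev {
    bool passthrough; /* VFIO passthrough (true) or emulated (false) */
    int psmmu;        /* pSMMU behind a passthrough device; unused otherwise */
};

/* Returns the index of the chosen vSMMU, or -1 if there is none.
 * Hard requirement: a passthrough device must use the vSMMU backed by
 * its own pSMMU. Soft requirement: an emulated device may use any
 * vSMMU; this sketch picks the least loaded one (first on a tie). */
static int place_dev(struct vsmmu *v, size_t nv, const struct dev *d)
{
    int best = -1;

    for (size_t i = 0; i < nv; i++) {
        if (d->passthrough) {
            if (v[i].psmmu == d->psmmu) {
                best = (int)i;
                break;
            }
        } else if (best < 0 || v[i].ndevs < v[best].ndevs) {
            best = (int)i;
        }
    }
    if (best >= 0) {
        v[best].ndevs++;
    }
    return best;
}
```

With vSMMU0 backed by pSMMU0 and vSMMU1 backed by pSMMU1, attaching dev0, dev1, emu0 and emu1 in that order yields the mapping listed above, while a passthrough device behind a pSMMU with no matching vSMMU gets -1.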

Thanks
Nicolin

I agree w/Jason & Nicolin: different vSMMUs for pass-through devices than
for emulated ones, & vice-versa.
Not mixing... because... of the next agreement:

I agree with Eric that 'accel' isn't needed -- this should be ascertained from
the pSMMU that a physical device is attached to.
Now... how does the vfio (?; why not qemu?) layer determine that? -- where are
SMMUv3 'accel' features exposed: (a) in the device struct (for the
smmuv3) or (b) somewhere under sysfs? ... I couldn't find anything under either
on my g-h system, but would appreciate a ptr if there is one.
And like Eric, although 'accel' is better than the original 'nested', it's
non-obvious which accel feature(s) are being turned on, or not.
In fact, if broken accel hw occurs ('if' -> 'when'), how should it be turned
off? ... if the info is in the kernel, a kernel boot-param will be needed;
if it is in sysfs, writing 0 to an enable attribute (i.e., disabling it) may be
an alternative as well.
Bottom line: we need (a) a way to ascertain the accel feature and (b) a way to
disable it when it is broken,
so qemu's smmuv3 spec will 'just work'.
[This may also help when migrating from a machine that has accel working to one
that does not.]
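Point (a) of the bottom line could start from the architectural ID registers. As a baseline, nesting needs both stage 1 and stage 2, which SMMUv3 advertises as S1P (bit 1) and S2P (bit 0) of SMMU_IDR0. How userspace obtains the pSMMU's IDR0 is left open here; iommufd's IOMMU_GET_HW_INFO, whose ARM SMMUv3 payload carries the raw IDR registers, is one candidate, but treat that as an assumption. The `admin_disabled` input stands in for the hypothetical boot-param or sysfs switch from point (b), and `vsmmu_accel_usable()` is an invented name.

```c
#include <stdbool.h>
#include <stdint.h>

/* SMMU_IDR0 stage-support bits, as defined by the SMMUv3 architecture. */
#define IDR0_S2P (UINT32_C(1) << 0) /* stage-2 translation supported */
#define IDR0_S1P (UINT32_C(1) << 1) /* stage-1 translation supported */

/* Hypothetical policy: allow the accelerated (nested) vSMMU only if the
 * pSMMU supports both stages AND the administrator has not turned it
 * off through the (hypothetical) boot-param/sysfs knob. */
static bool vsmmu_accel_usable(uint32_t psmmu_idr0, bool admin_disabled)
{
    const uint32_t need = IDR0_S1P | IDR0_S2P;

    if (admin_disabled) {
        return false; /* broken HW: honour the off switch first */
    }
    return (psmmu_idr0 & need) == need;
}
```

This only covers baseline nesting; any additional, implementation-specific acceleration would need its own discovery and its own off switch.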

... and when an emulated device is assigned a vSMMU, there are no accel
features ... unless we have tunables like batched iotlb invalidation for perf
reasons, which can be viewed as an 'accel' option.
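One way to read "batched iotlb invalidation" for an emulated vSMMU is to defer per-ASID invalidations until the next CMD_SYNC, the point by which SMMUv3 requires earlier invalidation commands to have completed. The sketch below is hypothetical (`struct iotlb_ops`, `struct tlbi_batch`, `tlbi_batch_add()` and `tlbi_batch_sync()` are invented names): duplicate commands are coalesced, and if the batch overflows it degrades to a single full flush.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLBI_BATCH_MAX 16

/* Hypothetical hooks into the emulated IOTLB. */
struct iotlb_ops {
    void (*flush_asid)(void *opaque, uint16_t asid);
    void (*flush_all)(void *opaque);
    void *opaque;
};

/* Per-ASID invalidations deferred until the next CMD_SYNC. */
struct tlbi_batch {
    uint16_t asid[TLBI_BATCH_MAX];
    size_t n;
    bool overflow; /* too many distinct ASIDs: do one full flush instead */
};

static void tlbi_batch_add(struct tlbi_batch *b, uint16_t asid)
{
    if (b->overflow) {
        return; /* a full flush is already pending */
    }
    for (size_t i = 0; i < b->n; i++) {
        if (b->asid[i] == asid) {
            return; /* duplicate command: already pending */
        }
    }
    if (b->n == TLBI_BATCH_MAX) {
        b->overflow = true;
        return;
    }
    b->asid[b->n++] = asid;
}

/* Apply the whole batch before the CMD_SYNC is acknowledged, then reset. */
static void tlbi_batch_sync(struct tlbi_batch *b, const struct iotlb_ops *ops)
{
    if (b->overflow) {
        ops->flush_all(ops->opaque);
    } else {
        for (size_t i = 0; i < b->n; i++) {
            ops->flush_asid(ops->opaque, b->asid[i]);
        }
    }
    b->n = 0;
    b->overflow = false;
}
```

The trade-off is a little memory, plus a full flush in the worst case, in exchange for fewer calls into the emulated IOTLB when a guest issues many invalidations before one sync.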

