On 3/18/25 10:22 PM, Donald Dutile wrote:
>
>
> On 3/18/25 3:13 PM, Nicolin Chen wrote:
>> On Tue, Mar 18, 2025 at 07:31:36PM +0100, Eric Auger wrote:
>>> On 3/17/25 9:19 PM, Nicolin Chen wrote:
>>>> On Mon, Mar 17, 2025 at 04:24:53PM -0300, Jason Gunthorpe wrote:
>>>>> On Mon, Mar 17, 2025 at 12:10:19PM -0700, Nicolin Chen wrote:
>>>>>> Another question: how does an emulated device work with a vSMMUv3?
>>>>>> I could imagine that all the accel steps would be bypassed since
>>>>>> !sdev->idev. Yet, the emulated iotlb should cache its translation
>>>>>> so we will need to flush the iotlb, which will increase complexity
>>>>>> as the TLBI command dispatching function will need to be aware what
>>>>>> ASID is for emulated device and what is for vfio device..
>>>>> I think you should block it. We already expect different vSMMU's
>>>>> depending on the physical SMMU under the PCI device, it makes sense
>>>>> that a SW VFIO device would have its own, non-accelerated, vSMMU
>>>>> model in the guest.
>>>> Yea, I agree and it'd be cleaner for an implementation separating
>>>> them.
>>>>
>>>> In my mind, the general idea of "accel=on" is also to keep things
>>>> in a more efficient way: passthrough devices go to HW-accelerated
>>>> vSMMUs (separated PCIE buses), while emulated ones go to a vSMMU-
>>>> bypassed (PCIE0).
>>
>>> Originally a specific SMMU device was needed to opt in for the MSI reserved
>>> region ACPI IORT description, which is not needed if you don't rely on
>>> S1+S2. However, if we don't rely on this trick, this was not even needed
>>> with the legacy integration
>>> (https://patchwork.kernel.org/project/qemu-devel/cover/20180921081819.9203-1-eric.au...@redhat.com/).
>>>
>>>
>>> Nevertheless I don't think anything prevents the acceleration granted
>>> device from also working with virtio/vhost devices for instance unless
>>> you unplug the existing infra. The translation and invalidation just
>>> should use different control paths (explicit translation requests,
>>> invalidation notifications towards vhost, ...).
>>
>> smmuv3_translate() is per sdev, so it's easy.
>>
>> Invalidation is done via commands, which could be tricky:
>> a) Broadcast command
>> b) ASID validation -- we'll need to keep track of a list of ASIDs
>>     for vfio device to compare the ASID in each per-ASID command,
>>     potentially by trapping all CFGI_CD(_ALL) commands? Note that
>>     each vfio device may have multiple ASIDs (for multiple CDs).
>> Either a or b above will have some validation efficiency impact.
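
To make option (b) concrete, here is a rough sketch of what the ASID check
in the TLBI dispatch path could look like. All names (AsidSet, tlbi_route,
ROUTE_*) are made up for illustration and are not existing QEMU code. It
assumes an ASID belongs either to vfio devices or to emulated ones, never
both, which a real guest may not guarantee.

```c
#include <stdbool.h>
#include <stdint.h>

#define ASID_COUNT 65536u /* SMMUv3 ASIDs are at most 16 bits wide */

/* Hypothetical bitmap of ASIDs currently installed in CDs of vfio devices. */
typedef struct {
    uint64_t bits[ASID_COUNT / 64];
} AsidSet;

/* Would be called when a trapped CFGI_CD(_ALL) reveals a new ASID. */
static void asid_set_add(AsidSet *s, uint16_t asid)
{
    s->bits[asid / 64] |= UINT64_C(1) << (asid % 64);
}

/* Would be called when the CD using this ASID goes away. */
static void asid_set_del(AsidSet *s, uint16_t asid)
{
    s->bits[asid / 64] &= ~(UINT64_C(1) << (asid % 64));
}

static bool asid_set_has(const AsidSet *s, uint16_t asid)
{
    return (s->bits[asid / 64] >> (asid % 64)) & 1;
}

typedef enum {
    ROUTE_EMULATED, /* flush the emulated iotlb only */
    ROUTE_ACCEL,    /* forward to the host for the vfio device */
    ROUTE_BOTH,     /* broadcast command: both paths */
} TlbiRoute;

/* Decide where one invalidation command has to go. */
static TlbiRoute tlbi_route(const AsidSet *vfio_asids, bool per_asid,
                            uint16_t asid)
{
    if (!per_asid) {
        return ROUTE_BOTH;
    }
    return asid_set_has(vfio_asids, asid) ? ROUTE_ACCEL : ROUTE_EMULATED;
}
```

The lookup itself is cheap; the cost is in keeping the set in sync with the
guest's CDs, which is the trapping overhead mentioned above.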
>>
>>> Again, what makes it legitimate to have different qemu devices for the
>>> same IP? I understand that it simplifies the implementation but I am
>>> not sure this is a good reason. Nevertheless it is worth challenging.
>>> What is the plan for intel iommu? Will we have 2 devices, the legacy
>>> device and one for nested?
>>
>> Hmm, it seems that there are two different topics:
>> 1. Use one SMMU device model (source code file; "iommu=" string)
>>     for both an emulated vSMMU and an HW-accelerated vSMMU.
>> 2. Allow one vSMMU instance to work with both an emulated device
>>     and a passthrough device.
>> And I get that you want both 1 and 2.
>>
>> I'm totally okay with 1, yet see no compelling benefit from 2 for
>> the increased complexity in the invalidation routine.
>>
>> And another question about the mixed device attachment. Let's say
>> we have in the host:
>>    VFIO passthrough dev0 -> pSMMU0
>>    VFIO passthrough dev1 -> pSMMU1
>> Should we allow emulated devices to be flexibly plugged?
>>    dev0 -> vSMMU0 /* Hard requirement */
>>    dev1 -> vSMMU1 /* Hard requirement */
>>    emu0 -> vSMMU0 /* Soft requirement; can be vSMMU1 also */
>>    emu1 -> vSMMU1 /* Soft requirement; can be vSMMU0 also */
>>
>> Thanks
>> Nicolin
>>
> I agree w/Jason & Nicolin: different vSMMUs for pass-through devices
> than emulated, & vice-versa.
> Not mixing... because... of the next agreement:
you need to clarify what you mean by different vSMMUs: are you talking
about different instances or different qemu device types?
>
> I agree with Eric that 'accel' isn't needed -- this should be
> ascertained from the pSMMU that a physical device is attached to.
we can simply use an AUTO_ON_OFF property and choose the AUTO value by
default. That would close the debate ;-)
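
A minimal sketch of the on/off/auto semantics I have in mind. The names
(AccelMode, accel_resolve, host_capable) are hypothetical and only meant
to illustrate the behaviour, not the actual property code:

```c
#include <stdbool.h>

/* Hypothetical tri-state for illustration; not QEMU's actual type. */
typedef enum {
    ACCEL_AUTO = 0, /* default: follow what the host reports */
    ACCEL_ON,       /* user insists on acceleration */
    ACCEL_OFF,      /* user forbids acceleration */
} AccelMode;

/*
 * Resolve the requested mode against the host capability.
 * Returns 0 and sets *enabled on success, or -1 when "on" was requested
 * but the host cannot provide acceleration (should fail realize).
 */
static int accel_resolve(AccelMode mode, bool host_capable, bool *enabled)
{
    switch (mode) {
    case ACCEL_ON:
        if (!host_capable) {
            return -1;
        }
        *enabled = true;
        return 0;
    case ACCEL_OFF:
        *enabled = false;
        return 0;
    case ACCEL_AUTO:
    default:
        *enabled = host_capable;
        return 0;
    }
}
```

With AUTO as the default, nobody has to pass the option explicitly, and
OFF still gives a way to opt out on a host where acceleration misbehaves.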

Eric
> Now... how does the vfio (?; why not qemu?) layer determine that? -- where
> are SMMUv3 'accel' features exposed: either (a) in the device struct
> (for the smmuv3) or (b) somewhere under sysfs? ... I couldn't find
> anything under either on my g-h system, but would appreciate a ptr if
> there is one.
> and like Eric, although 'accel' is better than the original 'nested',
> it's non-obvious what accel feature(s) are being turned on, or not.
> In fact, if broken accel hw occurs ('if' -> 'when'), how should it be
> turned off? ... if the info is in the kernel, a kernel boot-param will
> be needed;
> if in sysfs, writing 0 to an enable (or disable) entry may be an
> alternative as well.
> Bottom line: we need a way to (a) ascertain the accel feature and (b)
> disable it when it is broken,
> so qemu's smmuv3 spec will 'just work'.
> [This may also help when migrating from a machine that has accel
> working to one that does not.]
>
> ... and when an emulated device is assigned a vSMMU, there are no
> accel features ... unless we have tunables like batch iotlb
> invalidation for perf reasons, which can be viewed as an 'accel' option.
>

