Hi Nicolin,
On 3/19/25 6:14 PM, Nicolin Chen wrote:
> On Wed, Mar 19, 2025 at 05:45:51PM +0100, Eric Auger wrote:
>>
>>
>> On 3/17/25 8:10 PM, Nicolin Chen wrote:
>>> On Mon, Mar 17, 2025 at 07:07:52PM +0100, Eric Auger wrote:
>>>> On 3/17/25 6:54 PM, Nicolin Chen wrote:
>>>>> On Wed, Mar 12, 2025 at 04:15:10PM +0100, Eric Auger wrote:
>>>>>> On 3/11/25 3:10 PM, Shameer Kolothum wrote:
>>>>>>> Based on SMMUv3 as a parent device, add a user-creatable smmuv3-accel
>>>>>>> device. In order to support vfio-pci dev assignment with a Guest
>>>>>> guest
>>>>>>> SMMUv3, the physical SMMUv3 has to be configured in nested(S1+s2)
>>>>>> nested (s1+s2)
>>>>>>> mode, with Guest owning the S1 page tables. Subsequent patches will
>>>>>> the guest
>>>>>>> add support for smmuv3-accel to provide this.
>>>>>> Can't this -accel smmu also works with emulated devices? Do we want an
>>>>>> exclusive usage?
>>>>> Is there any benefit from emulated devices working in the HW-
>>>>> accelerated nested translation mode?
>>>> Not really but do we have any justification for using different device
>>>> name in accel mode? I am not even sure that accel option is really
>>>> needed. Ideally the qemu device should be able to detect it is
>>>> protecting a VFIO device, in which case it shall check whether nested is
>>>> supported by host SMMU and then automatically turn accel mode?
>>>>
>>>> I gave the example of the vfio device which has different class
>>>> implementation depending on the iommufd option being set or not.
>>> Do you mean that we should just create a regular smmuv3 device and
>>> let a VFIO device to turn on this smmuv3's accel mode depending on
>>> its LEGACY/IOMMUFD class?
>> no this is not what I meant. I gave an example where depending on an
>> option passed to the VFIO device you choose one class implement or the
>> other.
> Option means something like this:
> -device smmuv3,accel=on
> instead of
> -device "smmuv3-accel"
> ?
>
> Yea, I think that's good.

Yeah, actually that's a big debate over not much. From an implementation point of view, it should not change much. My only doubt is that if we need to conditionally expose the MSI RESV regions, it is easier to do so if we detect that we have an smmuv3-accel device. What the option allows is the auto mode.
>
>>> Another question: how does an emulated device work with a vSMMUv3?
>> I don't get your question. vSMMUv3 currently only works with emulated
>> devices. Did you mean accelerated SMMUv3?
> Yea. If "accel=on", how does an emulated device work with that?
>
>>> I could imagine that all the accel steps would be bypassed since
>>> !sdev->idev. Yet, the emulated iotlb should cache its translation
>>> so we will need to flush the iotlb, which will increase complexity
>>> as the TLBI command dispatching function will need to be aware what
>>> ASID is for emulated device and what is for vfio device..
>> I don't get the issue. For emulated device you go through the usual
>> translate path which indeed caches configs and translations. In case the
>> guest invalidates something, you know the SID and you find the entries
>> in the cache that are tagged by this SID.
>>
>> In case you have an accelerated device (indeed if sdev->idev) you don't
>> exercise that path. On invalidation you detect the SID matches a VFIO
>> device, propagate the invalidations to the host instead. on the
>> invalidation you should be able to detect pretty easily if you need to
>> flush the emulated caches or propagate the invalidations. Do I miss some
>> extra problematic?
>>
>> I do not say we should support emulated devices and VFIO devices in the
>> same guest iommu group. But I don't see why we couldn't easily plug the
>> accelerated logic in the current logical for emulation/vhost and do not
>> require a different qemu device.
> Hmm, feels like I fundamentally misunderstood your point.
> a) We implement the device model with the same piece of code but
> only provide an option "accel=on/off" to switch mode. And both
> passthrough devices and emulated devices can attach to the same
> "accel=on" device.

I think we all agree we don't want that use case in general. However, I was indeed questioning why it couldn't work, maybe at the expense of some performance degradation.

> b) We implement the device model with the same piece of code but
> only provide an option "accel=on/off" to switch mode. Then, a
> passthrough device can attach to an "accel=on" device, but an
> emulated device can only attach to an "accel=off" SMMU device.
>
> I was thinking that you want case (a). But actually you were just
> talking about case (b)? I think (b) is totally fine.
>
> We certainly can't do case (a): not all TLBI commands gives an "SID"
> field (so would have to broadcast, i.e. underlying SMMU HW would run
> commands that were supposed for emulated devices only); in case of
> vCMDQ, commands for emulated devices would be issued to real HW and

I am still confused about that. For instance, if the guest sends an NH_ASID or NH_VA invalidation and it happens that both the emulated device and the VFIO device share the same cd.asid (same guest IOMMU domain, which in practice should not happen), why shouldn't we propagate the invalidation to the host? Does the problem come from the use of vCMDQ, or would you foresee the same problem with a generic physical SMMU?

Thanks

Eric
> trigger HW errors.
>
> Thanks
> Nicolin
>