Hi Will,

On 23/08/2017 18:42, Will Deacon wrote:
> Hi Eric,
>
> On Wed, Aug 23, 2017 at 02:36:53PM +0200, Auger Eric wrote:
>> On 23/08/2017 12:25, Will Deacon wrote:
>>> On Tue, Aug 22, 2017 at 10:09:15PM +0300, Michael S. Tsirkin wrote:
>>>> On Fri, Aug 18, 2017 at 05:49:42AM +0300, Michael S. Tsirkin wrote:
>>>>> On Thu, Aug 17, 2017 at 05:34:25PM +0100, Will Deacon wrote:
>>>>>> On Fri, Aug 11, 2017 at 03:45:28PM +0200, Eric Auger wrote:
>>>>>>> When running a virtual SMMU on a guest we sometimes need to trap
>>>>>>> all changes to the translation structures. This is especially useful
>>>>>>> to integrate with VFIO. This patch adds a new option that forces
>>>>>>> the IO_PGTABLE_QUIRK_TLBI_ON_MAP to be applied on LPAE page tables.
>>>>>>>
>>>>>>> TLBI commands then can be trapped.
>>>>>>>
>>>>>>> Signed-off-by: Eric Auger <eric.au...@redhat.com>
>>>>>>>
>>>>>>> ---
>>>>>>> v1 -> v2:
>>>>>>> - rebase on v4.13-rc2
>>>>>>> ---
>>>>>>>  Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt | 4 ++++
>>>>>>>  drivers/iommu/arm-smmu-v3.c                             | 5 +++++
>>>>>>>  2 files changed, 9 insertions(+)
>>>>>>>
>>>>>>> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
>>>>>>> index c9abbf3..ebb85e9 100644
>>>>>>> --- a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
>>>>>>> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
>>>>>>> @@ -52,6 +52,10 @@ the PCIe specification.
>>>>>>>                     devicetree/bindings/interrupt-controller/msi.txt
>>>>>>>                     for a description of the msi-parent property.
>>>>>>>
>>>>>>> +- tlbi-on-map     : invalidate caches whenever there is an update of
>>>>>>> +                    any remapping structure (updates to not-present or
>>>>>>> +                    present entries).
>>>>>>> +
>>>>>>
>>>>>> My position on this hasn't changed, so NAK for this patch. If you want to
>>>>>> emulate something outside of the SMMUv3 architecture, please do so, but
>>>>>> don't pretend that it's an SMMUv3.
>>>>>>
>>>>>> Will
>>>>>
>>>>> What if the emulated device does not list arm,smmu-v3, listing
>>>>> qemu,smmu-v3 as compatible? Would that address the concern?
>>>>
>>>> Will, can you comment on this please? Are you open to reusing the code
>>>> in drivers/iommu/arm-smmu-v3.c to support a paravirtual device that does
>>>> not claim to be compatible with smmuv3 but does try to behave very close to
>>>> it except it can cache non-present structures? Or would you rather
>>>> the code to support this is forked to qemu-smmu-v3.c?
>>>
>>> I still don't understand why this is preferable to a PV IOMMU
>>> implementation. Not only is this proposing to issue TLB maintenance on
>>> map, but the maintenance command itself is entirely made up. Why not just
>>> have a map command? Anyway, I'm reluctant to add this hack to the driver
>>> until:
>>>
>>>   1. There is a compelling reason to pursue this approach instead of a
>>>      PV approach (including performance measurements).
>>>
>>>   2. There is a specification for the QEMU fork of the ARM SMMUv3
>>>      architecture, including the semantics of the new command being proposed
>>>      and what exactly the TLB maintenance requirements are on map (for
>>>      example, what if I change an STE or a CD -- are they cached too?).
>>
>> I am not sure I catch this last point. At the moment whenever the smmuv3
>> driver issues data structure invalidation commands (CMD_CFGI_*), those
>> are trapped and I replay the mappings on host side. I have not changed
>> anything on that side.
>
> But STEs and CDs have very similar rules to TLBs: you don't need to issue
> invalidation if the data structure is transitioning from invalid to valid.
While looking at chapter "4.8 Virtualisation" of the smmuv3 spec, I
understand that if we were to use the two stages we would need to trap STE
updates, since those are owned by the hyp. The spec says "updates to a guest
STE are accompanied by a CMD_CFGI_STE (or similar) issued from the guest".
So I understand invalidation of CDs is not mandated by the spec, but
invalidation of STEs would be required even when the data structure
transitions from invalid to valid. Is that correct? I fail to see whether
this is currently done by the smmuv3 driver, though.

> If you're caching those in QEMU, how do you keep them up-to-date? I can
> also guarantee you that there will be additional data structures added
> in future versions of the architecture, so you'll need to consider how
> you want to operate when running on newer hardware.
>
>> I introduced a new map implementation defined command because the per
>> page CMD_TLBI_NH_VA IOVA invalidation command was not efficient/usable
>> with use cases such as DPDK on guest. I understood the spec provisions
>> for such implementation defined commands.

Also, if we were to use dual stage, command queue accesses would still be
trapped. So if a guest invalidates a hugepage, it would send a storm of
granule-sized invalidations, each of which would be trapped. Maybe this does
not happen often, but I guess it would be pretty inefficient.

On Intel, I understand the IOTLB Invalidation Descriptor has an AM
(address-mask) field which encodes, as a power of two, the number of
contiguous second-level 4KB pages to invalidate. When invalidating a large
page, the driver can use the appropriate mask value (0 for 4KB, 9 for 2MB,
18 for 1GB); a short sketch of this encoding is appended at the end of this
mail.

Thanks

Eric

>
> Whilst there is a space for IMP DEF commands, this doesn't generally mean
> that they can be repurposed by software. What if the underlying hardware
> has an IMP DEF command that you want to export? Besides, my main points
> here are that your command isn't well-specified and if you have to add
> a command, why not just add a "map" command (i.e. implement a PV interface
> instead)?
>
>>>   3. The ACPI IORT spec is updated to recognise this implementation
>>>
>>>   4. There is an implementation that can use the guest page tables directly,
>>>      because that may well make all of this moot.
>>
>> Most probably I will come back to you with questions on stage 1 + stage2
>> enablement and "4.8 Virtualisation" chapter of smmuv3 spec. Besides I
>> also need to get access to some HW with smmuv3 ;-)
>
> Ok.
>
> Will

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
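
[Appended sketch] A minimal illustration of the AM encoding discussed above.
It is not taken from any patch in this thread; the helper name is made up,
and it assumes the invalidation size is a power-of-two multiple of 4KB:

#include <linux/log2.h>		/* ilog2() */

/*
 * Illustration only: VT-d's IOTLB invalidation descriptor encodes the
 * invalidation range in its AM (address-mask) field as the log2 of the
 * number of contiguous 4KB pages, so a single descriptor can cover a
 * whole hugepage. 'size' is assumed to be a power-of-two multiple of 4KB.
 */
static unsigned int iotlb_addr_mask(unsigned long size)
{
	return ilog2(size >> 12);	/* 4KB -> 0, 2MB -> 9, 1GB -> 18 */
}

The per-page CMD_TLBI_NH_VA command has no comparable range field in the
SMMUv3 architecture version discussed here, which is why a guest hugepage
invalidation turns into the per-granule invalidation storm described above.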