On 2016-03-01 15:30, Michael S. Tsirkin wrote:
> On Tue, Mar 01, 2016 at 03:18:05PM +0100, Jan Kiszka wrote:
>> On 2016-03-01 15:12, Jan Kiszka wrote:
>>> On 2016-03-01 14:55, Michael S. Tsirkin wrote:
>>>> On Tue, Mar 01, 2016 at 02:48:19PM +0100, Jan Kiszka wrote:
>>>>> On 2016-03-01 14:07, Michael S. Tsirkin wrote:
>>>>>> On Sun, Feb 21, 2016 at 09:10:56PM +0300, David Kiarie wrote:
>>>>>>> Hello there,
>>>>>>>
>>>>>>> Repost, AMD IOMMU patches version 6.
>>>>>>>
>>>>>>> Changes since version 5
>>>>>>> -Fixed macro formatting issues
>>>>>>> -changed occurrences of IO MMU to IOMMU for consistency
>>>>>>> -Fixed capability registers duplication
>>>>>>> -Rebased to current master
>>>>>>>
>>>>>>> David Kiarie (4):
>>>>>>> hw/i386: Introduce AMD IOMMU
>>>>>>> hw/core: Add AMD IOMMU to machine properties
>>>>>>> hw/i386: ACPI table for AMD IOMMU
>>>>>>> hw/pci-host: Emulate AMD IOMMU
>>>>>>
>>>>>> I went over AMD IOMMU spec.
>>>>>> I'm concerned that it appears that there's no chance for it to
>>>>>> work correctly if host caches invalid PTE entries.
>>>>>>
>>>>>> The spec vaguely discusses write-protecting such PTEs but
>>>>>> that would be very complex if it can be made to work at all.
>>>>>>
>>>>>> This means that this can't work with e.g. VFIO.
>>>>>> It can only work with emulated devices.
>>>>>
>>>>> You mean it can't work if we program a real IOMMU (for VFIO) with
>>>>> translated data from the emulated one but cannot track any updates of
>>>>> the related page tables because the guest is not required to issue
>>>>> traceable flush requests? Hmm, too bad.
>>>>>
>>>>>>
>>>>>> OTOH VTD can easily support PTE shadowing by setting a flag.
>>>>>
>>>>> Do you mean RWBF=1 in the CAP register? Given that "Newer hardware
>>>>> implementations are expected to NOT require explicit software flushing
>>>>> of write buffers and report RWBF=0 in the Capability register", we may
>>>>> eventually run into guests that no longer check that flag if we expose
>>>>> something that looks like a "newer" implementation.
>>>>
>>>> Hopefully not, if that happens we'll have to do a PV IOMMU :)
>>>
>>> Please not.
>>>
>>>>
>>>>> However, this flag is not set right now in our VT-d model.
>>>>>>
>>>>>> I'd like us to find some way to avoid possibility
>>>>>> of user error creating a configuration mixing e.g.
>>>>>> vfio with the amd iommu.
>>>>>>
>>>>>> I'm not sure how to do this.
>>>>>>
>>>>>> Any idea?
>>>>>
>>>>> There is likely no way around write-protecting the IOMMU page tables (in
>>>>> KVM mode) once we evaluated and cached them somewhere.
>>>>
>>>> Well for one, it's possible to use vt-d and not amd iommu.
>>>
>>> That would lead to nice combos of AMD CPUs with VT-d IOMMU. While it may
>>> be possible, I wouldn't rely on guests having tested that combination
>>> very well.
>>
>> To make the concern more concrete: I'm playing with code that will reuse
>> the MMU page tables for the IOMMU - the AMD architecture is designed for
>> that optimization (in contrast to Intel's). So, if the guest is not
>> foreseeing that artificial combo above (ours will definitely not)
>> because it is designed around the reuse, it will at least fail to run.
>>
>> Jan
>
> So if you have an AMD iommu on the host and that is capable
> of 2-level translation, then the flushing problem
> can be fixed by a kind of iommu pass-through
> where you point the host's iommu to guest's page tables.

Yes, right, that could be another approach - provided the tables have
compatible entries. I haven't looked at the details of either so far,
but I wouldn't be overly optimistic. Usually, hardware is not very well
designed for interesting nesting purposes.

>
> So maybe what you need to do is make it possible
> for device to query iommu and ask whether it
> supports devices caching invalid PTEs.
> If not, vfio could fail.

Makes sense.

Jan

--
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux
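
To illustrate the query idea discussed above, here is a rough sketch in C. It is only a model under stated assumptions, not QEMU code: the type IommuModel, the field tolerates_invalid_pte_caching, and the functions iommu_supports_invalid_pte_caching() and passthrough_attach() are all hypothetical names made up for this example. The point is just the control flow: a passthrough device asks the emulated IOMMU whether cached invalid PTEs are safe, and attaching fails if they are not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of an emulated IOMMU; these are not QEMU's types. */
typedef struct IommuModel {
    const char *name;
    /* Assumption: true if the guest is required to issue an invalidation
     * even after updating a previously invalid PTE, so a host that
     * shadows (caches) the tables can observe every change. */
    bool tolerates_invalid_pte_caching;
} IommuModel;

/* Hypothetical capability query a device would make before attaching. */
static bool iommu_supports_invalid_pte_caching(const IommuModel *iommu)
{
    return iommu != NULL && iommu->tolerates_invalid_pte_caching;
}

/* Hypothetical attach hook for a passthrough (VFIO-like) device.
 * Returns 0 on success, -1 if the configuration must be refused. */
static int passthrough_attach(const IommuModel *iommu, const char *dev)
{
    assert(dev != NULL);

    if (iommu == NULL) {
        /* No emulated IOMMU in the machine: nothing to shadow. */
        return 0;
    }
    if (!iommu_supports_invalid_pte_caching(iommu)) {
        fprintf(stderr,
                "%s: refusing to attach behind %s: guest updates to "
                "invalid PTEs cannot be tracked\n", dev, iommu->name);
        return -1;
    }
    return 0;
}
```

With a model whose flag is false, passthrough_attach() returns -1 and prints the reason; with the flag set, or with no emulated IOMMU at all, it returns 0. Failing at attach time means the user error is caught when the machine is configured rather than showing up later as stale translations.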