> From: Peter Xu > Sent: Wednesday, November 30, 2016 5:24 PM > > On Mon, Nov 28, 2016 at 05:51:50PM +0200, Aviv B.D wrote: > > * intel_iommu's replay op is not implemented yet (May come in different > > patch > > set). > > The replay function is required for hotplug vfio device and to move > > devices > > between existing domains. > > I am thinking about this replay thing recently and now I start to > doubt whether the whole vt-d vIOMMU framework suites this... > > Generally speaking, current work is throwing away the IOMMU "domain" > layer here. We maintain the mapping only per device, and we don't care > too much about which domain it belongs. This seems problematic. > > A simplest wrong case for this is (let's assume cache-mode is > enabled): if we have two assigned devices A and B, both belong to the > same domain 1. Meanwhile, in domain 1 assume we have one mapping which > is the first page (iova range 0-0xfff). Then, if guest wants to > invalidate the page, it'll notify VT-d vIOMMU with an invalidation > message. If we do this invalidation per-device, we'll need to UNMAP > the region twice - once for A, once for B (if we have more devices, we > will unmap more times), and we can never know we have done duplicated > work since we don't keep domain info, so we don't know they are using > the same address space. The first unmap will work, and then we'll > possibly get some errors on the rest of dma unmap failures.
Tianyu and I discussed there is a bigger problem: today VFIO assumes only one address space per container, which is fine w/o vIOMMU (all devices in same container share same GPA->HPA translation). However it's not the case when vIOMMU is enabled, because guest Linux implements per-device IOVA space. If a VFIO container includes multiple devices, it means multiple address spaces required per container... > > Looks like we just cannot live without knowing this domain layer. > Because the address space is binded to the domain. If we want to sync > the address space (here to setup a correct shadow page table), we need > to do it per-domain. > > What I can think of as a solution is that we introduce this "domain" > layer - like a memory region per domain. When invalidation happens, > it's per-domain, not per-device any more (actually I guess that's what > current vt-d iommu driver in kernel is doing, we just ignored it - we > fetch the devices that matches the domain ID). We can/need to maintain > something different, like sid <-> domain mappings (we can do this as > long as we are notified when context entries changed), per-domain > mappings (just like per-device mappings that we are trying to build in > this series, but what we really need is IMHO per domain one), etc. > When device switches domain, we switch the IOMMU memory region > accordingly. > > Does this make any sense? Comments are greatly welcomed (especially > from AlexW and DavidG). > > Thanks, > > -- peterx