On Fri, 2 Dec 2016 14:08:59 +0800 Peter Xu <pet...@redhat.com> wrote:

> On Thu, Dec 01, 2016 at 04:27:52PM +0800, Lan Tianyu wrote:
> > On 2016-11-30 17:23, Peter Xu wrote:
> > > On Mon, Nov 28, 2016 at 05:51:50PM +0200, Aviv B.D wrote:
> > >> * intel_iommu's replay op is not implemented yet (May come in
> > >>   different patch set).
> > >>   The replay function is required for hotplug vfio device and to
> > >>   move devices between existing domains.
> > >
> > > I am thinking about this replay thing recently and now I start to
> > > doubt whether the whole vt-d vIOMMU framework suits this...
> > >
> > > Generally speaking, current work is throwing away the IOMMU "domain"
> > > layer here. We maintain the mapping only per device, and we don't care
> > > too much about which domain it belongs. This seems problematic.
> > >
> > > A simplest wrong case for this is (let's assume cache-mode is
> > > enabled): if we have two assigned devices A and B, both belong to the
> > > same domain 1. Meanwhile, in domain 1 assume we have one mapping which
> > > is the first page (iova range 0-0xfff). Then, if guest wants to
> > > invalidate the page, it'll notify VT-d vIOMMU with an invalidation
> > > message. If we do this invalidation per-device, we'll need to UNMAP
> > > the region twice - once for A, once for B (if we have more devices, we
> > > will unmap more times), and we can never know we have done duplicated
> > > work since we don't keep domain info, so we don't know they are using
> > > the same address space. The first unmap will work, and then we'll
> > > possibly get some errors on the rest of dma unmap failures.
> >
> >
> > Hi Peter:
>
> Hi, Tianyu,
>
> > According to VT-d spec 6.2.2.1, "Software must ensure that, if multiple
> > context-entries (or extended-context-entries) are programmed
> > with the same Domain-id (DID), such entries must be programmed with same
> > value for the second-level page-table pointer (SLPTPTR) field, and same
> > value for the PASID Table Pointer (PASIDTPTR) field.".
> >
> > So if two assigned devices may have different IO page table, they should
> > be put into different domains.
>
>
> By default, devices will be put into different domains. However it
> should be legal that we put two assigned devices into the same IOMMU
> domain (in the guest), right? And we should handle both cases well
> IMHO.
>
> Actually I just wrote a tool to do it based on vfio-pci:
>
> https://github.com/xzpeter/clibs/blob/master/gpl/userspace/vfio-bind-group/vfio-bind-group.c
>
> If we run this tool in the guest with parameter like:
>
>   ./vfio-bind-groups 00:03.0 00:04.0
>
> It'll create one single domain, and put PCI device 00:03.0, 00:04.0
> into the same IOMMU domain.

On the host though, I'd expect we still have separate IOMMU domains,
one for each device and we do DMA_{UN}MAP ioctls separately per
container.  Thanks,

Alex
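
To make the two cases concrete, here is a rough toy model (Python, not
QEMU or VFIO code). The Container class and invalidate_per_device()
are hypothetical names invented for this sketch; the model only
reports whether an unmap found something to remove and makes no claim
about what the real unmap ioctl returns in that situation.

```python
# Toy model only: Container is a hypothetical stand-in for a VFIO container,
# not the real VFIO or QEMU API.

class Container:
    """One host IOMMU address space: a set of mapped IOVA page numbers."""

    def __init__(self):
        self.mapped = set()

    def dma_map(self, page):
        self.mapped.add(page)

    def dma_unmap(self, page):
        # True if the page was mapped, False if there was nothing to unmap.
        if page in self.mapped:
            self.mapped.remove(page)
            return True
        return False


def invalidate_per_device(device_to_container, page):
    """Replay one guest invalidation once per device; collect each result."""
    return {dev: c.dma_unmap(page) for dev, c in device_to_container.items()}


# Case 1 (the duplicated-unmap concern): A and B are backed by one shared
# address space, so the second unmap of page 0 finds nothing left.
shared = Container()
shared.dma_map(0)
shared_result = invalidate_per_device({"A": shared, "B": shared}, 0)

# Case 2 (separate host domains): each device has its own container, so
# page 0 is mapped once per container and each unmap removes its own copy.
a, b = Container(), Container()
a.dma_map(0)
b.dma_map(0)
separate_result = invalidate_per_device({"A": a, "B": b}, 0)

print(shared_result)    # {'A': True, 'B': False}
print(separate_result)  # {'A': True, 'B': True}
```

In the second case the per-device unmaps are not duplicated work,
because each one targets a different host address space.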