On Tue, 18 Oct 2016 16:52:04 +1100 David Gibson <da...@gibson.dropbear.id.au> wrote:
> On Mon, Oct 17, 2016 at 10:47:02PM -0600, Alex Williamson wrote:
> > On Tue, 18 Oct 2016 15:06:55 +1100
> > David Gibson <da...@gibson.dropbear.id.au> wrote:
> >
> > > On Mon, Oct 17, 2016 at 10:07:36AM -0600, Alex Williamson wrote:
> > > > On Mon, 17 Oct 2016 18:44:21 +0300
> > > > "Aviv B.D" <bd.a...@gmail.com> wrote:
> > > >
> > > > > From: "Aviv Ben-David" <bd.a...@gmail.com>
> > > > >
> > > > > * Advertise Cache Mode capability in iommu cap register.
> > > > >   This capability is controlled by "cache-mode" property of
> > > > >   intel-iommu device.
> > > > >   To enable this option call QEMU with
> > > > >   "-device intel-iommu,cache-mode=true".
> > > > >
> > > > > * On page cache invalidation in intel vIOMMU, check if the domain
> > > > >   belong to registered notifier, and notify accordingly.
> > > > >
> > > > > Currently this patch still doesn't enabling VFIO devices support
> > > > > with vIOMMU present. Current problems:
> > > > > * vfio_iommu_map_notify is not aware about memory range belong to
> > > > >   specific VFIOGuestIOMMU.
> > > >
> > > > Could you elaborate on why this is an issue?
> > > >
> > > > > * memory_region_iommu_replay hangs QEMU on start up while it
> > > > >   iterate over 64bit address space. Commenting out the call to
> > > > >   this function enables workable VFIO device while vIOMMU present.
> > > >
> > > > This has been discussed previously, it would be incorrect for vfio not
> > > > to call the replay function. The solution is to add an iommu driver
> > > > callback to efficiently walk the mappings within a MemoryRegion.
> > >
> > > Right, replay is a bit of a hack. There are a couple of other
> > > approaches that might be adequate without a new callback:
> > > - Make the VFIOGuestIOMMU aware of the guest address range mapped
> > >   by the vIOMMU. Intel currently advertises that as a full 64-bit
> > >   address space, but I bet that's not actually true in practice.
> > > - Have the IOMMU MR advertise a (minimum) page size for vIOMMU
> > >   mappings. That may let you step through the range with greater
> > >   strides
> >
> > Hmm, VT-d supports at least a 39-bit address width and always supports
> > a minimum 4k page size, so yes that does reduce us from 2^52 steps down
> > to 2^27,
>
> Right, which is probably doable, if not ideal
>
> > but it's still absurd to walk through the raw address space.
>
> Well.. it depends on the internal structure of the IOMMU. For Power,
> it's traditionally just a 1-level page table, so we can't actually do
> any better than stepping through each IOMMU page.

Intel always has at least a 3-level page table AIUI.

> > It does however seem correct to create the MemoryRegion with a width
> > that actually matches the IOMMU capability, but I don't think that's a
> > sufficient fix by itself. Thanks,
>
> I suspect it would actually make it workable in the short term.
>
> But I don't disagree that a "traverse" or "replay" callback of some
> sort in the iommu_ops is a better idea long term. Having a fallback
> to the current replay implementation if the callback isn't supplied
> seems pretty reasonable though.

Exactly, the callback could be optional, where IOMMUs that supply a
relatively small IOVA window could fall back to the code we have today.
Thanks,

Alex
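
For illustration only, here is a rough sketch of the "optional callback
with fallback" idea and of the step-count arithmetic discussed above. All
names here (ToyIOMMU, toy_replay, replay_steps and so on) are made up for
this sketch and are assumptions; none of this is QEMU's actual iommu_ops
or memory API. If the IOMMU supplies a replay hook it is used, otherwise
the code steps through the whole IOVA window one minimum-size page at a
time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types for illustration; NOT QEMU's real API. */
typedef void (*toy_map_fn)(uint64_t iova, void *opaque);

typedef struct ToyIOMMU ToyIOMMU;
struct ToyIOMMU {
    unsigned addr_bits;   /* IOVA width in bits, e.g. 39 or 64 */
    unsigned page_shift;  /* log2 of the minimum page size, 12 for 4k */
    /* Required: returns nonzero if the page at iova is mapped. */
    int (*translate)(const ToyIOMMU *iommu, uint64_t iova);
    /* Optional: visits only existing mappings; NULL selects the fallback. */
    void (*replay)(const ToyIOMMU *iommu, toy_map_fn fn, void *opaque);
};

/* Number of page-sized steps a raw walk of the address space takes. */
static uint64_t replay_steps(unsigned addr_bits, unsigned page_shift)
{
    assert(addr_bits > page_shift && addr_bits - page_shift < 64);
    return UINT64_C(1) << (addr_bits - page_shift);
}

/* Use the IOMMU's own replay hook if present, else walk every page. */
static void toy_replay(const ToyIOMMU *iommu, toy_map_fn fn, void *opaque)
{
    if (iommu->replay) {
        iommu->replay(iommu, fn, opaque);
        return;
    }

    /* Fallback: only sensible when the IOVA window is small. */
    assert(iommu->translate != NULL);
    uint64_t steps = replay_steps(iommu->addr_bits, iommu->page_shift);
    for (uint64_t i = 0; i < steps; i++) {
        uint64_t iova = i << iommu->page_shift;
        if (iommu->translate(iommu, iova)) {
            fn(iova, opaque);
        }
    }
}

/* Demo helpers: a toy IOMMU with exactly two mapped pages (3 and 10). */
static int toy_translate(const ToyIOMMU *iommu, uint64_t iova)
{
    uint64_t page = iova >> iommu->page_shift;
    return page == 3 || page == 10;
}

/* Demo replay hook: reports the two mappings without scanning. */
static void toy_sparse_replay(const ToyIOMMU *iommu, toy_map_fn fn,
                              void *opaque)
{
    fn(UINT64_C(3) << iommu->page_shift, opaque);
    fn(UINT64_C(10) << iommu->page_shift, opaque);
}

/* Demo notifier: counts how many mappings were reported. */
static void count_mapping(uint64_t iova, void *opaque)
{
    (void)iova;
    *(uint64_t *)opaque += 1;
}
```

With these definitions replay_steps(64, 12) is 2^52 and
replay_steps(39, 12) is 2^27, matching the numbers quoted above. A toy
IOMMU with a 16-bit window reports the same two mappings whether or not
the replay hook is set; the hook only changes how many pages get probed
to find them.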