On Fri, 2014-03-28 at 16:12 +1100, Alexey Kardashevskiy wrote: > On 03/20/2014 04:25 PM, David Gibson wrote: > > On Wed, Mar 19, 2014 at 01:57:41PM -0600, Alex Williamson wrote: > >> On Wed, 2014-03-12 at 16:52 +1100, Alexey Kardashevskiy wrote: > >>> From: David Gibson <da...@gibson.dropbear.id.au> > > [snip] > >>> + if (!memory_region_is_ram(mr)) { > >>> + DPRINTF("iommu map to non memory area %"HWADDR_PRIx"\n", > >>> + xlat); > >>> + return; > >>> + } > >>> + if (len & iotlb->addr_mask) { > >>> + DPRINTF("iommu has granularity incompatible with target AS\n"); > >> > >> Is this possible? Assuming len is initially a power-of-2, would the > >> translate function change it? Maybe worth a comment to explain. > > > > translate can absolutely change the length. It will generally > > truncate it to the IOMMU page size, in fact. > > > > [snip] > >>> + DPRINTF("region_add [iommu] %"HWADDR_PRIx" - %"HWADDR_PRIx"\n", > >>> + iova, int128_get64(int128_sub(llend, int128_one()))); > >>> + /* > >>> + * FIXME: We should do some checking to see if the > >>> + * capabilities of the host VFIO IOMMU are adequate to model > >>> + * the guest IOMMU > >>> + * > >>> + * FIXME: This assumes that the guest IOMMU is empty of > >>> + * mappings at this point - we should either enforce this, or > >>> + * loop through existing mappings to map them into VFIO. > >>> + * > >>> + * FIXME: For VFIO iommu types which have KVM acceleration to > >>> + * avoid bouncing all map/unmaps through qemu this way, this > >>> + * would be the right place to wire that up (tell the KVM > >>> + * device emulation the VFIO iommu handles to use). > >>> + */ > >> > >> That's a lot of FIXMEs... The second one in particular looks like it > >> needs to expand a bit on why this is likely a valid assumption. The > >> last one is more of a TODO than a FIXME. > > > > I think #2 isn't a valid assumption in general. It was true for the > > situation I was testing at the time, due to the order of pseries > > initialization, so I left it to get a proof of concept reasonably > > quickly. > > > > But I think that one's a FIXME that actually needs to be fixed. > > > But how? > > At the moment, for SPAPR, the table gets cleaned when group is attached to > a container (VFIO_GROUP_SET_CONTAINER ioctl) which happens right before > registering the memory listener so we are fine (at least for SPAPR). > Is that true for x86 or we need something more? > > "loop through existing mappings to map them into VFIO" - this I do not > really understand. We do not track mapping in QEMU so we cannot really loop > through them.
Making registration of a guest IOMMU more like a MemoryListener would solve this, the infrastructure should replay existing mappings on startups and clear them on shutdown, then we could also get rid of the FIXME on the region_del path. > > [snip] > >>> + /* > >>> + * FIXME: We assume the one big unmap below is adequate to > >>> + * remove any individual page mappings in the IOMMU which > >>> + * might have been copied into VFIO. That may not be true for > >>> + * all IOMMU types > >>> + */ > >> > >> We assume this because the IOVA that gets unmapped is the same > >> regardless of whether a guest IOMMU is present? > > > > Uh.. no. This assumption works for a page table based IOMMU where a > > big unmap just flattens a large range of IO-PTEs. > > > Sorry for my poor english but how exactly is that different from what Alex > said? IOVA is a PCI bus address between dma_window_start and > dma_window_start+dma_window_size. I think David is trying to describe that the IOMMU must be able to unmap a sparse sub-region of a larger unmap, while I'm trying to make sure there's no IOVA translation that might interfere with using the "standard" unmap path rather than the guest IOMMU unmap path. Thanks, Alex > > It might not work > > for some kind of extent or TLB based IOMMU, where operations are > > expected to exactly match the addresses of map operations. > > > > I don't know if IOMMUs that have trouble with this are a realistic > > prospect, but they're at least a theoretical possibility, hence the > > comment. > > > >> > >>> + } > >>> + > >>> iova = TARGET_PAGE_ALIGN(section->offset_within_address_space); > >>> end = (section->offset_within_address_space + > >>> int128_get64(section->size)) & > >>> TARGET_PAGE_MASK; > >> > >> > >> > > > >