On Wed, Apr 20, 2016 at 07:43:41PM +1000, Alexey Kardashevskiy wrote: > On 04/07/2016 11:10 AM, David Gibson wrote: > >Subject doesn't seem quite right, since you added at least minimal > >support for the SPAPRv2 IOMMU in the prereg patch. > > > >On Mon, Apr 04, 2016 at 07:33:45PM +1000, Alexey Kardashevskiy wrote: > >>New VFIO_SPAPR_TCE_v2_IOMMU type supports dynamic DMA window management. > >>This adds ability to VFIO common code to dynamically allocate/remove > >>DMA windows in the host kernel when new VFIO container is added/removed. > >> > >>This adds VFIO_IOMMU_SPAPR_TCE_CREATE ioctl to vfio_listener_region_add > >>and adds just created IOMMU into the host IOMMU list; the opposite > >>action is taken in vfio_listener_region_del. > >> > >>When creating a new window, this uses euristic to decide on the TCE table > >>levels number. > > > >"heuristic" has an 'h' (yes, English spelling is stupid[0]). > > > >[0] The historical reasons for that are kind of fascinating, though. > > Tried googling, could not spot quickly the reasoning, any hints what to > google for? Or just a link with an explanation? :)
That wasn't a comment about heuristic specifically, but english spelling in general. I got most of the fascinating stuff I've seen from: http://www.amazon.com/Spell-Out-Enthralling-Extraordinary-Spelling/dp/1250056128/ref=la_B000AP940C_1_5/189-2461131-2789630?s=books&ie=UTF8&qid=1461211129&sr=1-5 > >>This should cause no guest visible change in behavior. > >> > >>Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru> > >>--- > >>Changes: > >>v14: > >>* new to the series > >> > >>--- > >>TODO: > >>* export levels to PHB > >>--- > >> hw/vfio/common.c | 115 > >> ++++++++++++++++++++++++++++++++++++++++++++++++++----- > >> trace-events | 2 + > >> 2 files changed, 107 insertions(+), 10 deletions(-) > >> > >>diff --git a/hw/vfio/common.c b/hw/vfio/common.c > >>index 5e5b77c..57a51df 100644 > >>--- a/hw/vfio/common.c > >>+++ b/hw/vfio/common.c > >>@@ -279,6 +279,14 @@ static int vfio_host_win_add(VFIOContainer *container, > >> return 0; > >> } > >> > >>+static void vfio_host_win_del(VFIOContainer *container, hwaddr min_iova) > >>+{ > >>+ VFIOHostDMAWindow *hostwin = vfio_host_win_lookup(container, min_iova, > >>1); > >>+ > >>+ g_assert(hostwin); > >>+ QLIST_REMOVE(hostwin, hiommu_next); > >>+} > >>+ > >> static bool vfio_listener_skipped_section(MemoryRegionSection *section) > >> { > >> return (!memory_region_is_ram(section->mr) && > >>@@ -392,6 +400,63 @@ static void vfio_listener_region_add(MemoryListener > >>*listener, > >> } > >> end = int128_get64(llend); > >> > >>+ if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) { > > > >I think the "add region" path could do with being split out into a > >different function - vfio_listener_region_add() is getting pretty > >huge. > > It is big but not huge and I am trying to avoid having functions with > "spapr" in their names in common.c as once they appear, we will start having > a discussion if they should move to a separate file and if they do, then may > be some other code should too, etc... I don't think it needs to be a "spapr" named function. The spapr backend is the only one to support it for now, but other iommus could support dynamic windows in future (for example, if I ever get time to implement the "type2" converged interface I was thinking about, I'd look to include that). > >>+ unsigned entries, pages, pagesize = qemu_real_host_page_size; > >>+ struct vfio_iommu_spapr_tce_create create = { .argsz = > >>sizeof(create) }; > >>+ > >>+ trace_vfio_listener_region_add_iommu(iova, end - 1); > >>+ if (section->mr->iommu_ops) { > >>+ pagesize = section->mr->iommu_ops->get_page_sizes(section->mr); > > > >Since you're querying the guest IOMMU here, I assume pagesize is > >supposed to represent *guest* IOMMU pagesizes, in which case it should > >default to TARGET_PAGE_SIZE, instead of qemu_real_host_page_size. > >(didn't you already have a function which implemented that fallback?) > > Yes, will use the helper. > > > >>+ } > >>+ /* > >>+ * FIXME: For VFIO iommu types which have KVM acceleration to > >>+ * avoid bouncing all map/unmaps through qemu this way, this > >>+ * would be the right place to wire that up (tell the KVM > >>+ * device emulation the VFIO iommu handles to use). > >>+ */ > >>+ create.window_size = int128_get64(section->size); > >>+ create.page_shift = ctz64(pagesize); > >>+ /* > >>+ * SPAPR host supports multilevel TCE tables, there is some > >>+ * euristic to decide how many levels we want for our table: > > > >s/some euristic/a heuristic/ > > > >>+ * 0..64 = 1; 65..4096 = 2; 4097..262144 = 3; 262145.. = 4 > >>+ */ > >>+ entries = create.window_size >> create.page_shift; > >>+ pages = (entries * sizeof(uint64_t)) / getpagesize(); > >>+ create.levels = ctz64(pow2ceil(pages) - 1) / 6 + 1; > >>+ > >>+ if (vfio_host_win_lookup(container, create.start_addr, > >>+ create.start_addr + create.window_size - > >>1)) { > >>+ goto fail; > > > >Hmm.. if you successfully look up a host window, it seems to me you > >shouldn't fail, but in fact don't even need to create a new window > >(the removal path gets harder though, because you need to check if any > >guest window requires that host window). > > > At the moment if the window is there, it is failure in the environment I am > testing it in. And, having a host kernel which cannot allocate and map > windows randomly, it is unlikely that I'll have a setup where this spot > won't mean that something went wrong. x86 case needs lot more than this > anyway. Hm, I suppose so. > > > >Requiring that the host windows exactly match the guest windows is > >probably ok for a first version - except that in that case any overlap > >should cause a failure, not just a complete inclusion. > > Yes, for now this should fail too, I'll fix it. > > > > > >>+ } > >>+ > >>+ ret = ioctl(container->fd, VFIO_IOMMU_SPAPR_TCE_CREATE, &create); > >>+ if (ret) { > >>+ error_report("Failed to create a window, ret = %d (%m)", ret); > >>+ goto fail; > >>+ } > >>+ > >>+ if (create.start_addr != section->offset_within_address_space) { > >>+ struct vfio_iommu_spapr_tce_remove remove = { > >>+ .argsz = sizeof(remove), > >>+ .start_addr = create.start_addr > >>+ }; > >>+ error_report("Host doesn't support DMA window at > >>%"HWADDR_PRIx", must be %"PRIx64, > >>+ section->offset_within_address_space, > >>+ create.start_addr); > >>+ ioctl(container->fd, VFIO_IOMMU_SPAPR_TCE_REMOVE, &remove); > >>+ ret = -EINVAL; > >>+ goto fail; > >>+ } > >>+ trace_vfio_spapr_create_window(create.page_shift, > >>+ create.window_size, > >>+ create.start_addr); > >>+ > >>+ vfio_host_win_add(container, create.start_addr, > >>+ create.start_addr + create.window_size - 1, > >>+ 1ULL << create.page_shift); > >>+ } > >>+ > >> if (!vfio_host_win_lookup(container, iova, end - 1)) { > >> error_report("vfio: IOMMU container %p can't map guest IOVA > >> region" > >> " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, > >>@@ -525,6 +590,22 @@ static void vfio_listener_region_del(MemoryListener > >>*listener, > >> container, iova, end - iova, ret); > >> } > >> > >>+ if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) { > >>+ struct vfio_iommu_spapr_tce_remove remove = { > >>+ .argsz = sizeof(remove), > >>+ .start_addr = section->offset_within_address_space, > >>+ }; > >>+ ret = ioctl(container->fd, VFIO_IOMMU_SPAPR_TCE_REMOVE, &remove); > >>+ if (ret) { > >>+ error_report("Failed to remove window at %"PRIx64, > >>+ remove.start_addr); > >>+ } > >>+ > >>+ vfio_host_win_del(container, section->offset_within_address_space); > >>+ > >>+ trace_vfio_spapr_remove_window(remove.start_addr); > >>+ } > >>+ > >> if (iommu && iommu->iommu_ops && iommu->iommu_ops->vfio_stop) { > >> iommu->iommu_ops->vfio_stop(section->mr); > >> } > >>@@ -915,11 +996,6 @@ static int vfio_connect_container(VFIOGroup *group, > >>AddressSpace *as) > >> } > >> } > >> > >>- /* > >>- * This only considers the host IOMMU's 32-bit window. At > >>- * some point we need to add support for the optional 64-bit > >>- * window and dynamic windows > >>- */ > >> info.argsz = sizeof(info); > >> ret = ioctl(fd, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info); > >> if (ret) { > >>@@ -928,11 +1004,30 @@ static int vfio_connect_container(VFIOGroup *group, > >>AddressSpace *as) > >> goto listener_release_exit; > >> } > >> > >>- /* The default table uses 4K pages */ > >>- vfio_host_win_add(container, info.dma32_window_start, > >>- info.dma32_window_start + > >>- info.dma32_window_size - 1, > >>- 0x1000); > >>+ if (v2) { > >>+ /* > >>+ * There is a default window in just created container. > >>+ * To make region_add/del simpler, we better remove this > >>+ * window now and let those iommu_listener callbacks > >>+ * create/remove them when needed. > >>+ */ > >>+ struct vfio_iommu_spapr_tce_remove remove = { > >>+ .argsz = sizeof(remove), > >>+ .start_addr = info.dma32_window_start, > >>+ }; > >>+ ret = ioctl(fd, VFIO_IOMMU_SPAPR_TCE_REMOVE, &remove); > >>+ if (ret) { > >>+ error_report("vfio: VFIO_IOMMU_SPAPR_TCE_REMOVE failed: > >>%m"); > >>+ ret = -errno; > >>+ goto free_container_exit; > >>+ } > >>+ } else { > >>+ /* The default table uses 4K pages */ > >>+ vfio_host_win_add(container, info.dma32_window_start, > >>+ info.dma32_window_start + > >>+ info.dma32_window_size - 1, > >>+ 0x1000); > >>+ } > >> } else { > >> error_report("vfio: No available IOMMU models"); > >> ret = -EINVAL; > >>diff --git a/trace-events b/trace-events > >>index 23ca0b9..5c651fa 100644 > >>--- a/trace-events > >>+++ b/trace-events > >>@@ -1738,6 +1738,8 @@ vfio_region_finalize(const char *name, int index) > >>"Device %s, region %d" > >> vfio_region_mmaps_set_enabled(const char *name, bool enabled) "Region %s > >> mmaps enabled: %d" > >> vfio_ram_register(uint64_t va, uint64_t size, int ret) "va=%"PRIx64" > >> size=%"PRIx64" ret=%d" > >> vfio_ram_unregister(uint64_t va, uint64_t size, int ret) "va=%"PRIx64" > >> size=%"PRIx64" ret=%d" > >>+vfio_spapr_create_window(int ps, uint64_t ws, uint64_t off) > >>"pageshift=0x%x winsize=0x%"PRIx64" offset=0x%"PRIx64 > >>+vfio_spapr_remove_window(uint64_t off) "offset=%"PRIx64 > >> > >> # hw/vfio/platform.c > >> vfio_platform_base_device_init(char *name, int groupid) "%s belongs to > >> group #%d" > > > > -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
signature.asc
Description: PGP signature