Hi Shameer,

On 06/12/17 15:38, Shameerali Kolothum Thodi wrote:
> Hi Eric,
>
>> -----Original Message-----
>> From: Auger Eric [mailto:eric.au...@redhat.com]
>> Sent: Wednesday, December 06, 2017 2:01 PM
>> To: Shameerali Kolothum Thodi <shameerali.kolothum.th...@huawei.com>;
>> Alex Williamson <alex.william...@redhat.com>
>> Cc: peter.mayd...@linaro.org; qemu-devel@nongnu.org; Linuxarm
>> <linux...@huawei.com>; qemu-...@nongnu.org; Zhaoshenglong
>> <zhaoshengl...@huawei.com>; Zhuyijun <zhuyi...@huawei.com>
>> Subject: Re: [Qemu-devel] [RFC 1/5] hw/vfio: Add function for getting
>> reserved_region of device iommu group
>>
>> Hi Shameer,
>>
>> On 06/12/17 11:30, Shameerali Kolothum Thodi wrote:
>>> Hi Alex,
>>>
>>>> -----Original Message-----
>>>> From: Shameerali Kolothum Thodi
>>>> Sent: Monday, November 20, 2017 4:31 PM
>>>> To: 'Alex Williamson' <alex.william...@redhat.com>
>>>> Cc: eric.au...@redhat.com; Zhuyijun <zhuyi...@huawei.com>;
>>>> qemu-a...@nongnu.org; qemu-devel@nongnu.org; peter.mayd...@linaro.org;
>>>> Zhaoshenglong <zhaoshengl...@huawei.com>; Linuxarm
>>>> <linux...@huawei.com>
>>>> Subject: RE: [Qemu-devel] [RFC 1/5] hw/vfio: Add function for getting
>>>> reserved_region of device iommu group
>>> [...]
>>>>>>> And sysfs is a good interface if the user wants to use it to
>>>>>>> configure the VM in a way that's compatible with a device. For
>>>>>>> instance, in your case, a user could evaluate these reserved
>>>>>>> regions across all devices in a system, or even across an entire
>>>>>>> cluster, and instantiate the VM with a memory map compatible with
>>>>>>> hotplugging any of those evaluated devices (QEMU implementation of
>>>>>>> allowing the user to do this TBD).
>>>>>>> Having the vfio device evaluate these reserved regions only helps
>>>>>>> in the cold-plug case. So the proposed solution is limited in
>>>>>>> scope and doesn't address similar needs on other platforms. There
>>>>>>> is value in verifying that a device's IOVA space is compatible
>>>>>>> with a VM memory map and modifying the memory map on cold-plug or
>>>>>>> rejecting the device on hot-plug, but isn't that why we have an
>>>>>>> ioctl within vfio to expose information about the IOMMU? Why take
>>>>>>> the path of allowing QEMU to rummage through sysfs files outside
>>>>>>> of vfio, implying additional security and access concerns, rather
>>>>>>> than filling the gap within the vfio API?
>>>>>>
>>>>>> Thanks Alex for the explanation.
>>>>>>
>>>>>> I came across this patch[1] from Eric where he introduced the IOCTL
>>>>>> interface to retrieve the reserved regions. It looks like this can
>>>>>> be reworked to accommodate the above requirement.
>>>>>
>>>>> I don't think we need a new ioctl for this, nor do I think that
>>>>> describing the holes is the correct approach. The existing
>>>>> VFIO_IOMMU_GET_INFO ioctl can be extended to support capability
>>>>> chains, as we've done for VFIO_DEVICE_GET_REGION_INFO.
>>>>
>>>> Right, as far as I can see the above mentioned patch is doing exactly
>>>> the same, extending the VFIO_IOMMU_GET_INFO ioctl with a capability
>>>> chain.
>>>>
>>>>> IMO, we should try to describe the fixed IOVA regions which are
>>>>> available for mapping rather than describing the various holes
>>>>> within the address space which are unavailable. The latter method
>>>>> always fails to describe the end of the mappable IOVA space and gets
>>>>> bogged down in trying to classify the types of holes that might
>>>>> exist.
>>>>>
>>>>> Thanks,
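For illustration, an entry in such a capability chain could look like the
sketch below. The names and layout here are hypothetical, modeled on
struct vfio_info_cap_header from include/uapi/linux/vfio.h (which the
region-info capabilities already chain through); treat it as a shape,
not a proposed UAPI:

/*
 * Hypothetical capability describing the IOVA ranges that are available
 * for mapping.  Needs <linux/types.h> and <linux/vfio.h> for __u32/__u64
 * and struct vfio_info_cap_header.
 */
#define VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE   1   /* id value: assumed */

struct vfio_iova_range {
        __u64   start;  /* first usable IOVA */
        __u64   end;    /* last usable IOVA, inclusive */
};

struct vfio_iommu_type1_info_cap_iova_range {
        struct vfio_info_cap_header     header; /* id / version / next offset */
        __u32                           nr_iovas;
        __u32                           reserved;
        struct vfio_iova_range          iova_ranges[];
};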
>>>>
>>>
>>> I was going through this and noticed that it is possible to have
>>> multiple iommu domains associated with a container. If that's true, is
>>> it safe to assume that all the domains will have the same iova
>>> geometry or not?
>>
>> To me the answer is no.
>>
>> There are several iommu domains "underneath" a single container. You
>> attach vfio groups to a container. Each of them is associated with an
>> iommu group and an iommu domain. See vfio_iommu_type1_attach_group().
>>
>> Besides, the reserved regions are per iommu group.
>>
>
> Thanks for your reply. Yes, a container can have multiple groups (hence
> multiple iommu domains) and reserved regions are per group. Hence, while
> deciding the default supported iova geometry, we have to go through all
> the domains in the container and select the smallest aperture as the
> supported default iova range.
>
> Please find below a snippet from a patch I am working on, and please
> let me know your thoughts on it.
>
> Thanks,
> Shameer
>
> -- >8 --
> +static int vfio_build_iommu_iova_caps(struct vfio_iommu *iommu,
> +                                      struct vfio_info_cap *caps)
> +{
> +       struct iommu_resv_region *resv, *resv_next;
> +       struct vfio_iommu_iova *iova, *iova_next;
> +       struct list_head group_resv_regions, vfio_iova_regions;
> +       struct vfio_domain *domain;
> +       struct vfio_group *g;
> +       phys_addr_t start, end;
> +       int ret = 0;
> +
> +       domain = list_first_entry(&iommu->domain_list,
> +                                 struct vfio_domain, next);
> +       /* Get the default iova range supported */
> +       start = domain->domain->geometry.aperture_start;
> +       end = domain->domain->geometry.aperture_end;
>
> This is where I am confused. I think instead I should go over the
> domain_list and select the smallest aperture as the default iova range.

Yes, that's correct.
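A minimal sketch of that loop (assuming the vfio_iommu and vfio_domain
layouts in drivers/vfio/vfio_iommu_type1.c, and leaving locking aside)
might be:

/*
 * Sketch: intersect the apertures of all domains in the container so
 * that the advertised default range is mappable in every domain.
 */
static void vfio_iommu_aperture(struct vfio_iommu *iommu,
                                phys_addr_t *start, phys_addr_t *end)
{
        struct vfio_domain *d;

        *start = 0;
        *end = ~(phys_addr_t)0;

        list_for_each_entry(d, &iommu->domain_list, next) {
                *start = max(*start,
                             (phys_addr_t)d->domain->geometry.aperture_start);
                *end = min(*end,
                           (phys_addr_t)d->domain->geometry.aperture_end);
        }
}

One open question for the series is what to do when the intersection is
empty (*start > *end): fail the ioctl or advertise no usable ranges.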
I just want to warn you that Pierre is working on the same topic, so it
may be worth syncing up with him:

[PATCH] vfio/iommu_type1: report the IOMMU aperture info
(https://patchwork.kernel.org/patch/10084655/)

I think he plans to rework his series with a capability chain too.

Thanks

Eric

>
> +       INIT_LIST_HEAD(&vfio_iova_regions);
> +       vfio_insert_iova(start, end, &vfio_iova_regions);
> +
> +       /* Get reserved regions if any */
> +       INIT_LIST_HEAD(&group_resv_regions);
> +       list_for_each_entry(g, &domain->group_list, next)
> +               iommu_get_group_resv_regions(g->iommu_group,
> +                                            &group_resv_regions);
> +       list_sort(NULL, &group_resv_regions, vfio_resv_cmp);
> +
> +       /* Update iova range excluding reserved regions */
> ...
> -- >8 --
>
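FWIW, the elided exclusion step could look roughly like the sketch below.
This is not the actual patch: vfio_insert_iova() (assumed to kmalloc a
node and insert it in sorted order) and the start/end/list fields of
struct vfio_iommu_iova are taken on faith from the snippet above.

/*
 * Sketch: punch each reserved region out of the IOVA list.  A reserved
 * region that overlaps an existing range splits it into the parts that
 * survive before and after the hole.
 */
static void vfio_exclude_resv_regions(struct list_head *iova_regions,
                                      struct list_head *resv_regions)
{
        struct iommu_resv_region *resv;
        struct vfio_iommu_iova *n, *tmp;

        list_for_each_entry(resv, resv_regions, list) {
                phys_addr_t rstart = resv->start;
                phys_addr_t rend = resv->start + resv->length - 1;

                list_for_each_entry_safe(n, tmp, iova_regions, list) {
                        if (rstart > n->end || rend < n->start)
                                continue;       /* no overlap */
                        if (n->start < rstart)  /* keep the low part */
                                vfio_insert_iova(n->start, rstart - 1,
                                                 iova_regions);
                        if (rend < n->end)      /* keep the high part */
                                vfio_insert_iova(rend + 1, n->end,
                                                 iova_regions);
                        list_del(&n->list);
                        kfree(n);
                }
        }
}

Since the reserved regions were sorted with list_sort() above, the
surviving ranges stay ordered as long as vfio_insert_iova() preserves
the sort.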