> -----Original Message-----
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Monday, November 20, 2017 3:58 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.th...@huawei.com>
> Cc: eric.au...@redhat.com; Zhuyijun <zhuyi...@huawei.com>;
> qemu-a...@nongnu.org; qemu-devel@nongnu.org; peter.mayd...@linaro.org;
> Zhaoshenglong <zhaoshengl...@huawei.com>; Linuxarm
> <linux...@huawei.com>
> Subject: Re: [Qemu-devel] [RFC 1/5] hw/vfio: Add function for getting
> reserved_region of device iommu group
>
> On Mon, 20 Nov 2017 11:58:43 +0000
> Shameerali Kolothum Thodi <shameerali.kolothum.th...@huawei.com> wrote:
>
> > > -----Original Message-----
> > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > Sent: Wednesday, November 15, 2017 6:25 PM
> > > To: Shameerali Kolothum Thodi <shameerali.kolothum.th...@huawei.com>
> > > Cc: Zhuyijun <zhuyi...@huawei.com>; qemu-...@nongnu.org;
> > > qemu-de...@nongnu.org; eric.au...@redhat.com; peter.mayd...@linaro.org;
> > > Zhaoshenglong <zhaoshengl...@huawei.com>
> > > Subject: Re: [Qemu-devel] [RFC 1/5] hw/vfio: Add function for getting
> > > reserved_region of device iommu group
> > >
> > > On Wed, 15 Nov 2017 09:49:41 +0000
> > > Shameerali Kolothum Thodi <shameerali.kolothum.th...@huawei.com> wrote:
> > >
> > > > Hi Alex,
> > > >
> > > > > -----Original Message-----
> > > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > > Sent: Tuesday, November 14, 2017 3:48 PM
> > > > > To: Zhuyijun <zhuyi...@huawei.com>
> > > > > Cc: qemu-...@nongnu.org; qemu-devel@nongnu.org;
> > > > > eric.au...@redhat.com; peter.mayd...@linaro.org;
> > > > > Shameerali Kolothum Thodi <shameerali.kolothum.th...@huawei.com>;
> > > > > Zhaoshenglong <zhaoshengl...@huawei.com>
> > > > > Subject: Re: [Qemu-devel] [RFC 1/5] hw/vfio: Add function for getting
> > > > > reserved_region of device iommu group
> > > > >
> > > > > On Tue, 14 Nov 2017 09:15:50 +0800
> > > > > <zhuyi...@huawei.com> wrote:
> > > > >
> > > > > > From: Zhu Yijun <zhuyi...@huawei.com>
> > > > > >
> > > > > > With kernel 4.11, iommu/smmu will populate the MSI IOVA reserved
> > > > > > window and PCI reserved window which has to be excluded from Guest
> > > > > > iova allocations.
> > > > > >
> > > > > > However, if it falls within the Qemu default virtual memory address
> > > > > > space, then reserved regions may get allocated for a Guest VF DMA
> > > > > > iova and it will fail.
> > > > > >
> > > > > > So get those reserved regions in this patch and create some holes in
> > > > > > the Qemu ram address in next patchset.
> > > > > >
> > > > > > Signed-off-by: Zhu Yijun <zhuyi...@huawei.com>
> > > > > > ---
> > > > > >  hw/vfio/common.c              | 67 +++++++++++++++++++++++++++++++++++++++++++
> > > > > >  hw/vfio/pci.c                 |  2 ++
> > > > > >  hw/vfio/platform.c            |  2 ++
> > > > > >  include/exec/memory.h         |  7 +++++
> > > > > >  include/hw/vfio/vfio-common.h |  3 ++
> > > > > >  5 files changed, 81 insertions(+)
> > > > >
> > > > > I generally prefer the vfio interface to be more self-sufficient; if
> > > > > there are regions the IOMMU cannot map, we should be describing those
> > > > > via capabilities on the container through the vfio interface. If
> > > > > we're just scraping together things from sysfs, the user can just as
> > > > > easily do that and provide an explicit memory map for the VM taking
> > > > > the devices into account.
> > > >
> > > > Ok. I was under the impression that the purpose of introducing the
> > > > /sys/kernel/iommu_groups/reserved_regions was to get the IOVA regions
> > > > that are reserved (MSI or non-mappable) for Qemu or other apps to
> > > > make use of. I think this was introduced as part of the "KVM/MSI
> > > > passthrough support on ARM" patch series.
> > > > And if I remember correctly, Eric had an approach where the user
> > > > space can retrieve all the reserved regions through the
> > > > VFIO_IOMMU_GET_INFO ioctl and later this idea was replaced with the
> > > > sysfs interface.
> > > >
> > > > Maybe I am missing something here.
> > >
> > > And sysfs is a good interface if the user wants to use it to configure
> > > the VM in a way that's compatible with a device. For instance, in your
> > > case, a user could evaluate these reserved regions across all devices in
> > > a system, or even across an entire cluster, and instantiate the VM with
> > > a memory map compatible with hotplugging any of those evaluated
> > > devices (QEMU implementation of allowing the user to do this TBD).
> > > Having the vfio device evaluate these reserved regions only helps in
> > > the cold-plug case. So the proposed solution is limited in scope and
> > > doesn't address similar needs on other platforms. There is value to
> > > verifying that a device's IOVA space is compatible with a VM memory map
> > > and modifying the memory map on cold-plug or rejecting the device on
> > > hot-plug, but isn't that why we have an ioctl within vfio to expose
> > > information about the IOMMU? Why take the path of allowing QEMU to
> > > rummage through sysfs files outside of vfio, implying additional
> > > security and access concerns, rather than filling the gap within the
> > > vfio API?
> >
> > Thanks Alex for the explanation.
> >
> > I came across this patch[1] from Eric where he introduced the IOCTL
> > interface to retrieve the reserved regions. It looks like this can be
> > reworked to accommodate the above requirement.
>
> I don't think we need a new ioctl for this, nor do I think that
> describing the holes is the correct approach. The existing
> VFIO_IOMMU_GET_INFO ioctl can be extended to support capability chains,
> as we've done for VFIO_DEVICE_GET_REGION_INFO.

Right, as far as I can see the above-mentioned patch is doing exactly
the same, extending the VFIO_IOMMU_GET_INFO ioctl with a capability chain.

> IMO, we should try to
> describe the available fixed IOVA regions which are available for
> mapping rather than describing various holes within the address space
> which are unavailable. The latter method always fails to describe the
> end of the mappable IOVA space and gets bogged down in trying to
> classify the types of holes that might exist. Thanks,

Ok. I guess that is to take care of the iommu max address width case.

Thanks,
Shameer
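
To make the "available regions vs. holes" point concrete, here is a minimal sketch in Python (not QEMU or kernel code). The helper name usable_iova_ranges, the reserved window and the address widths are all hypothetical, chosen only for illustration. It shows that a list of holes alone does not say where the mappable space ends: the same hole list yields different usable ranges once an IOMMU address width is assumed.

```python
def usable_iova_ranges(reserved, addr_bits):
    """Return inclusive (start, end) IOVA ranges that are free for mapping.

    reserved  -- iterable of inclusive (start, end) holes (hypothetical input)
    addr_bits -- assumed IOMMU address width; this is exactly the information
                 that a holes-only description does not carry
    """
    limit = (1 << addr_bits) - 1
    usable = []
    cursor = 0  # lowest address not yet classified
    for start, end in sorted(reserved):
        if start > limit:
            break  # hole lies entirely beyond the addressable space
        if start > cursor:
            usable.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor <= limit:
        usable.append((cursor, limit))  # tail up to the address-width limit
    return usable


# Illustrative hole only: a 1 MiB reserved window at 0x08000000.
reserved = [(0x08000000, 0x080FFFFF)]

ranges_40 = usable_iova_ranges(reserved, 40)  # assumed 40-bit IOMMU
ranges_32 = usable_iova_ranges(reserved, 32)  # assumed 32-bit IOMMU

for lo, hi in ranges_40:
    print(f"usable: 0x{lo:010x}-0x{hi:010x}")
```

With the same single hole, the last usable range ends at a different address for the 32-bit and 40-bit cases, which is why reporting the usable ranges directly carries more information than reporting the holes.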