Hi Zhenzhong,

On 11/14/23 11:09, Zhenzhong Duan wrote:
> Hi,
>
> Thanks all for giving guides and comments on previous series, this is
> the remaining part of the iommufd support.
>
> Based on Cédric's suggestion, replace old config method for IOMMUFD
> with Kconfig.
>
> Based on Jason's suggestion, drop the implementation of manually
> allocating hwpt and switch to IOAS attach/detach.
>
> Beside current test, we also tested mdev with mtty for better cover range.
>
> PATCH 1: Introduce iommufd object
> PATCH 2-9: add IOMMUFD container and cdev support
> PATCH 10-17: fd passing for cdev and linking to IOMMUFD
> PATCH 18: make VFIOContainerBase parameter const
> PATCH 19-21: Compile out for IOMMUFD for arm, s390x and x86
>
>
> We have done wide test with different combinations, e.g:
> - PCI device were tested
> - FD passing and hot reset with some trick.
> - device hotplug test with legacy and iommufd backends
> - with or without vIOMMU for legacy and iommufd backends
> - devices linked to different iommufds
> - VFIO migration with a E800 net card(no dirty sync support) passthrough
> - platform, ccw and ap were only compile-tested due to environment limit
> - test mdev pass through with mtty and mix with real device and different BE
>
> Given some iommufd kernel limitations, the iommufd backend is
> not yet fully on par with the legacy backend w.r.t. features like:
> - p2p mappings (you will see related error traces)
> - dirty page sync
> - and etc.

Feel free to add my T-b:
Tested-by: Eric Auger <eric.au...@redhat.com>

Thanks

Eric

>
>
> qemu code: https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_cdev_v6
> Based on vfio-next, commit id: 1a22fb936e
>
> --------------------------------------------------------------------------
>
> Below are some background and graph about the design:
>
> With the introduction of iommufd, the Linux kernel provides a generic
> interface for userspace drivers to propagate their DMA mappings to kernel
> for assigned devices. This series does the porting of the VFIO devices
> onto the /dev/iommu uapi and let it coexist with the legacy implementation.
>
> At QEMU level, interactions with the /dev/iommu are abstracted by a new
> iommufd object (compiled in with the CONFIG_IOMMUFD option).
>
> Any QEMU device (e.g. vfio device) wishing to use /dev/iommu must be
> linked with an iommufd object. In this series, the vfio-pci device is
> granted with such capability (other VFIO devices are not yet ready):
>
> It gets a new optional parameter named iommufd which allows to pass
> an iommufd object:
>
>     -object iommufd,id=iommufd0
>     -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0
>
> Note the /dev/iommu and vfio cdev can be externally opened by a
> management layer. In such a case the fd is passed:
>
>     -object iommufd,id=iommufd0,fd=22
>     -device vfio-pci,iommufd=iommufd0,fd=23
>
> If the fd parameter is not passed, the fd is opened by QEMU.
> See https://www.mail-archive.com/qemu-devel@nongnu.org/msg937155.html
> for detailed discuss on this requirement.
>
> If no iommufd option is passed to the vfio-pci device, iommufd is not
> used and the end-user gets the behavior based on the legacy vfio iommu
> interfaces:
>
>     -device vfio-pci,host=0000:02:00.0
>
> While the legacy kernel interface is group-centric, the new iommufd
> interface is device-centric, relying on device fd and iommufd.
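
For the fd-passing case quoted above, a management layer would open both character devices itself and hand the descriptors over to QEMU. What follows is only a minimal sketch of that side, not code from the series: the helper names (open_inheritable, format_iommufd_object, format_vfio_pci_device) are hypothetical, and the only things taken from the cover letter are the two option strings. The one point worth noting is that the descriptors must not be close-on-exec, otherwise they would not survive the exec() of QEMU.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: open a character device (for instance /dev/iommu
 * or /dev/vfio/devices/vfioX) so that the fd is inherited across exec().
 * O_CLOEXEC is deliberately not set.
 */
static int open_inheritable(const char *path)
{
    return open(path, O_RDWR);
}

/*
 * Hypothetical helper: build the value of "-object" for a pre-opened
 * iommufd, e.g. "iommufd,id=iommufd0,fd=22".
 */
static int format_iommufd_object(char *buf, size_t len,
                                 const char *id, int fd)
{
    return snprintf(buf, len, "iommufd,id=%s,fd=%d", id, fd);
}

/*
 * Hypothetical helper: build the value of "-device" for a pre-opened
 * vfio cdev linked to the iommufd object named iommufd_id,
 * e.g. "vfio-pci,iommufd=iommufd0,fd=23".
 */
static int format_vfio_pci_device(char *buf, size_t len,
                                  const char *iommufd_id, int fd)
{
    return snprintf(buf, len, "vfio-pci,iommufd=%s,fd=%d", iommufd_id, fd);
}
```

A caller would open the two devices with open_inheritable(), check both return values, format the two strings and then exec QEMU with them after "-object" and "-device" respectively.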
>
> To support both interfaces in the QEMU VFIO device we reworked the vfio
> container abstraction so that the generic VFIO code can use either
> backend.
>
> The VFIOContainer object becomes a base object derived into
> a) the legacy VFIO container and
> b) the new iommufd based container.
>
> The base object implements generic code such as code related to
> memory_listener and address space management whereas the derived
> objects implement callbacks specific to either BE, legacy and
> iommufd. Indeed each backend has its own way to setup secure context
> and dma management interface. The below diagram shows how it looks
> like with both BEs.
>
>                     VFIO                           AddressSpace/Memory
>     +-------+  +----------+  +-----+  +-----+
>     |  pci  |  | platform |  |  ap |  | ccw |
>     +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
>         |           |           |        |        |   AddressSpace       |
>         |           |           |        |        +------------+---------+
>     +---V-----------V-----------V--------V----+               /
>     |           VFIOAddressSpace              | <------------+
>     |                  |                      |  MemoryListener
>     |        VFIOContainer list               |
>     +-------+----------------------------+----+
>             |                            |
>             |                            |
>     +-------V------+            +--------V----------+
>     |   iommufd    |            |    vfio legacy    |
>     |  container   |            |     container     |
>     +-------+------+            +--------+----------+
>             |                            |
>             | /dev/iommu                 | /dev/vfio/vfio
>             | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
> Userspace   |                            |
> ============+============================+===========================
> Kernel      |  device fd                 |
>             +---------------+            | group/container fd
>             | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
>             |  ATTACH_IOAS) |            | device fd
>             |               |            |
>             |       +-------V------------V-----------------+
>     iommufd |       |                 vfio                 |
> (map/unmap  |       +---------+--------------------+-------+
> ioas_copy)  |                 |                    | map/unmap
>             |                 |                    |
>      +------V-------+   +-----V------+      +------V--------+
>      | iommufd core |   |  device    |      |  vfio iommu   |
>      +--------------+   +------------+      +---------------+
>
> [Secure Context setup]
> - iommufd BE: uses device fd and iommufd to setup secure context
>               (bind_iommufd, attach_ioas)
> - vfio legacy BE: uses group fd and container fd to setup secure context
>                   (set_container, set_iommu)
> [Device access]
> - iommufd BE: device fd is opened through /dev/vfio/devices/vfioX
> - vfio legacy BE: device fd is retrieved from group fd ioctl
> [DMA Mapping flow]
> 1. VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
> 2. VFIO populates DMA map/unmap via the container BEs
>    *) iommufd BE: uses iommufd
>    *) vfio legacy BE: uses container fd
>
>
> Changelog:
> v6:
> - simplify CONFIG_IOMMUFD checking code further (Cédric)
> - check iommufd_cdev_kvm_device_add return value (Cédric)
> - dirrectory -> directory (Cédric)
> - propagate iommufd_cdev_get_info_iova_range err and print as warning (Cédric)
> - introduce a helper vfio_device_set_fd (Cédric)
> - Move #include "sysemu/iommufd.h" in platform.c (Cédric)
> - simplify iommufd backend uAPI, remove alloc_hwpt, get/put_ioas
> - Dare to keep Matthew's RB as related change is minor
>
> v5:
> - Change to use Kconfig for CONFIG_IOMMUFD and drop stub file (Cédric)
> - Add (uintptr_t) to info->allowed_iovas (Cédric)
> - Switch to IOAS attach/detach and hide hwpt (Jason)
> - move chardev_open.[h|c] under the IOMMUFD entry (Cédric)
> - Move vfio_legacy_pci_hot_reset into container.c (Cédric)
> - Add missed pgsizes initialization in vfio_get_info_iova_range
> - split linking iommufd patch into three to be cleaner
> - Fix comments on PCI BAR unmap
>
> v4:
> - add CONFIG_IOMMUFD check for IOMMUFDProperties (Markus)
> - add doc for default case without fd (Markus)
> - Fix build issue reported by Markus and Cédric
> - Simply use SPDX identifier in new file (Cédric)
> - make vfio_container_init/destroy helper a separate patch (Cédric)
> - make vrdl_list movement a separate patch (Cédric)
> - add const for some callback parameters (Cédric)
> - add g_assert in VFIOIOMMUOps callback (Cédric)
> - introduce pci_hot_reset callback (Cédric)
> - remove VFIOIOMMUSpaprOps (Cédric)
> - initialize g_autofree to NULL (Cédric)
> - adjust func name prefix and trace event in iommufd.c (Cédric)
> - add RB
>
> v3:
> - Rename base container as VFIOContainerBase and legacy container as
>   container (Cédric)
> - Drop VFIO_IOMMU_BACKEND_OPS class and use struct instead (Cédric)
> - Cleanup container.c by introducing spapr backend and move spapr code out
>   (Cédric)
> - Introduce vfio_iommu_spapr_ops (Cédric)
> - Add doc of iommufd in qom.json and have iommufd member sorted (Markus)
> - patch19 and patch21 are split to two parts to facilitate review
>
> v2:
> - patch "vfio: Add base container" in v1 is split into patch1-15 per Cédric
> - add fd passing to platform/ap/ccw vfio device
> - add (uintptr_t) cast in iommufd_backend_map_dma() per Cédric
> - rename char_dev.h to chardev_open.h for same naming scheme per Daniel
> - add full copyright per Daniel and Jason
>
>
> Note changelog below are from full IOMMUFD series:
>
> v1:
> - Alloc hwpt instead of using auto hwpt
> - elaborate iommufd code per Nicolin
> - consolidate two patches and drop as.c
> - typo error fix and function rename
>
> rfcv4:
> - rebase on top of v8.0.3
> - Add one patch from Yi which is about vfio device add in kvm
> - Remove IOAS_COPY optimization and focus on functions in this patchset
> - Fix wrong name issue reported and fix suggested by Matthew
> - Fix compilation issue reported and fix suggested by Nicolin
> - Use query_dirty_bitmap callback to replace get_dirty_bitmap for better
>   granularity
> - Add dev_iter_next() callback to avoid adding so many callback
>   at container scope, add VFIODevice.hwpt to support that
> - Restore all functions back to common from container whenever possible,
>   mainly migration and reset related functions
> - Add --enable/disable-iommufd config option, enabled by default in linux
> - Remove VFIODevice.hwpt_next as it's redundant with VFIODevice.next
> - Adapt new VFIO_DEVICE_PCI_HOT_RESET uAPI for IOMMUFD backed device
> - vfio_kvm_device_add/del_group call vfio_kvm_device_add/del_fd to remove
>   redundant code
> - Add FD passing support for vfio device backed by IOMMUFD
> - Fix hot unplug resource leak issue in vfio_legacy_detach_device()
> - Fix FD leak in vfio_get_devicefd()
>
> rfcv3:
> - rebase on top of v7.2.0
> - Fix the compilation with CONFIG_IOMMUFD unset by using true classes for
>   VFIO backends
> - Fix use after free in error path, reported by Alister
> - Split common.c in several steps to ease the review
>
> rfcv2:
> - remove the first three patches of rfcv1
> - add open cdev helper suggested by Jason
> - remove the QOMification of the VFIOContainer and simply use standard ops
>   (David)
> - add "-object iommufd" suggested by Alex
>
> Thanks
> Zhenzhong
>
>
> Cédric Le Goater (3):
>   hw/arm: Activate IOMMUFD for virt machines
>   kconfig: Activate IOMMUFD for s390x machines
>   hw/i386: Activate IOMMUFD for q35 machines
>
> Eric Auger (2):
>   backends/iommufd: Introduce the iommufd object
>   vfio/pci: Allow the selection of a given iommu backend
>
> Yi Liu (2):
>   util/char_dev: Add open_cdev()
>   vfio/iommufd: Implement the iommufd backend
>
> Zhenzhong Duan (14):
>   vfio/common: return early if space isn't empty
>   vfio/iommufd: Relax assert check for iommufd backend
>   vfio/iommufd: Add support for iova_ranges and pgsizes
>   vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_info
>   vfio/pci: Introduce a vfio pci hot reset interface
>   vfio/iommufd: Enable pci hot reset through iommufd cdev interface
>   vfio/pci: Make vfio cdev pre-openable by passing a file handle
>   vfio/platform: Allow the selection of a given iommu backend
>   vfio/platform: Make vfio cdev pre-openable by passing a file handle
>   vfio/ap: Allow the selection of a given iommu backend
>   vfio/ap: Make vfio cdev pre-openable by passing a file handle
>   vfio/ccw: Allow the selection of a given iommu backend
>   vfio/ccw: Make vfio cdev pre-openable by passing a file handle
>   vfio: Make VFIOContainerBase poiner parameter const in VFIOIOMMUOps
>     callbacks
>
>  MAINTAINERS                           |  10 +
>  qapi/qom.json                         |  19 +
>  hw/vfio/pci.h                         |   6 +
>  include/hw/vfio/vfio-common.h         |  26 +-
>  include/hw/vfio/vfio-container-base.h |  15 +-
>  include/qemu/chardev_open.h           |  16 +
>  include/sysemu/iommufd.h              |  44 ++
>  backends/iommufd.c                    | 228 ++++++++++
>  hw/vfio/ap.c                          |  29 +-
>  hw/vfio/ccw.c                         |  31 +-
>  hw/vfio/common.c                      |  24 +-
>  hw/vfio/container-base.c              |   6 +-
>  hw/vfio/container.c                   | 208 ++++++++-
>  hw/vfio/helpers.c                     |  44 ++
>  hw/vfio/iommufd.c                     | 630 ++++++++++++++++++++++++++
>  hw/vfio/pci.c                         | 212 ++-------
>  hw/vfio/platform.c                    |  38 +-
>  util/chardev_open.c                   |  81 ++++
>  backends/Kconfig                      |   4 +
>  backends/meson.build                  |   1 +
>  backends/trace-events                 |  10 +
>  hw/arm/Kconfig                        |   1 +
>  hw/i386/Kconfig                       |   1 +
>  hw/s390x/Kconfig                      |   1 +
>  hw/vfio/meson.build                   |   3 +
>  hw/vfio/trace-events                  |  11 +
>  qemu-options.hx                       |  12 +
>  util/meson.build                      |   1 +
>  28 files changed, 1493 insertions(+), 219 deletions(-)
>  create mode 100644 include/qemu/chardev_open.h
>  create mode 100644 include/sysemu/iommufd.h
>  create mode 100644 backends/iommufd.c
>  create mode 100644 hw/vfio/iommufd.c
>  create mode 100644 util/chardev_open.c
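
As an aside on the design described in the cover letter, the split between generic container code and per-backend callbacks can be pictured with a small standalone C sketch. This is an illustration under stated assumptions, not the series' code: every identifier starting with Sketch or sketch_ is made up for the example, and the callbacks only count mappings where a real backend would call into the kernel (VFIO_IOMMU_MAP_DMA / VFIO_IOMMU_UNMAP_DMA on the container fd for legacy, IOMMU_IOAS_MAP / IOMMU_IOAS_UNMAP on the iommufd for the new backend).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef struct SketchContainer SketchContainer;

/* Hypothetical per-backend callback table, in the spirit of an ops struct. */
typedef struct SketchIOMMUOps {
    const char *name;
    int (*dma_map)(SketchContainer *c, uint64_t iova, uint64_t size);
    int (*dma_unmap)(SketchContainer *c, uint64_t iova, uint64_t size);
} SketchIOMMUOps;

/* Hypothetical base container: generic state plus a pointer to the backend. */
struct SketchContainer {
    const SketchIOMMUOps *ops;
    int fd;              /* container fd (legacy) or iommufd; unused here */
    unsigned mappings;   /* stand-in for real mapping state */
};

/* Shared bookkeeping; a real backend would issue its own ioctl here. */
static int sketch_record_map(SketchContainer *c, uint64_t iova, uint64_t size)
{
    printf("%s: map iova=0x%" PRIx64 " size=0x%" PRIx64 "\n",
           c->ops->name, iova, size);
    c->mappings++;
    return 0;
}

static int sketch_record_unmap(SketchContainer *c, uint64_t iova, uint64_t size)
{
    if (c->mappings == 0) {
        return -1;       /* nothing mapped, report an error */
    }
    printf("%s: unmap iova=0x%" PRIx64 " size=0x%" PRIx64 "\n",
           c->ops->name, iova, size);
    c->mappings--;
    return 0;
}

/* Legacy backend: would go through the container fd. */
static const SketchIOMMUOps sketch_legacy_ops = {
    .name = "legacy",
    .dma_map = sketch_record_map,
    .dma_unmap = sketch_record_unmap,
};

/* iommufd backend: would go through the iommufd. */
static const SketchIOMMUOps sketch_iommufd_ops = {
    .name = "iommufd",
    .dma_map = sketch_record_map,
    .dma_unmap = sketch_record_unmap,
};

/* Generic code: what a memory listener would call, backend-agnostic. */
static int sketch_container_dma_map(SketchContainer *c,
                                    uint64_t iova, uint64_t size)
{
    assert(c->ops && c->ops->dma_map);
    return c->ops->dma_map(c, iova, size);
}

static int sketch_container_dma_unmap(SketchContainer *c,
                                      uint64_t iova, uint64_t size)
{
    assert(c->ops && c->ops->dma_unmap);
    return c->ops->dma_unmap(c, iova, size);
}
```

The generic functions never look at which backend they are talking to, which is the property that lets the listener and address space code stay common while the legacy and iommufd containers differ only in their callback tables.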