Hi Zhenzhong,

On 11/14/23 11:09, Zhenzhong Duan wrote:
> Hi,
>
> Thanks all for giving guides and comments on previous series, this is
> the remaining part of the iommufd support.
>
> Based on Cédric's suggestion, replace old config method for IOMMUFD
> with Kconfig.
>
> Based on Jason's suggestion, drop the implementation of manually
> allocating hwpt and switch to IOAS attach/detach.
>
> Beside current test, we also tested mdev with mtty for better cover range.
>
> PATCH 1: Introduce iommufd object
> PATCH 2-9: add IOMMUFD container and cdev support
> PATCH 10-17: fd passing for cdev and linking to IOMMUFD
> PATCH 18: make VFIOContainerBase parameter const
> PATCH 19-21: Compile out for IOMMUFD for arm, s390x and x86
>
>
> We have done wide test with different combinations, e.g:
> - PCI device were tested
> - FD passing and hot reset with some trick.
> - device hotplug test with legacy and iommufd backends
> - with or without vIOMMU for legacy and iommufd backends
> - divices linked to different iommufds
> - VFIO migration with a E800 net card(no dirty sync support) passthrough
> - platform, ccw and ap were only compile-tested due to environment limit
> - test mdev pass through with mtty and mix with real device and different BE
>
> Given some iommufd kernel limitations, the iommufd backend is
> not yet fully on par with the legacy backend w.r.t. features like:
> - p2p mappings (you will see related error traces)
> - dirty page sync
> - and etc.

Feel free to add my T-b:
Tested-by: Eric Auger <eric.au...@redhat.com>

Thanks

Eric
>
>
> qemu code: https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_cdev_v6
> Based on vfio-next, commit id: 1a22fb936e
>
> --------------------------------------------------------------------------
>
> Below are some background and graph about the design:
>
> With the introduction of iommufd, the Linux kernel provides a generic
> interface for userspace drivers to propagate their DMA mappings to kernel
> for assigned devices. This series does the porting of the VFIO devices
> onto the /dev/iommu uapi and let it coexist with the legacy implementation.
>
> At QEMU level, interactions with the /dev/iommu are abstracted by a new
> iommufd object (compiled in with the CONFIG_IOMMUFD option).
>
> Any QEMU device (e.g. vfio device) wishing to use /dev/iommu must be
> linked with an iommufd object. In this series, the vfio-pci device is
> granted with such capability (other VFIO devices are not yet ready):
>
> It gets a new optional parameter named iommufd which allows to pass
> an iommufd object:
>
>     -object iommufd,id=iommufd0
>     -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0
>
> Note the /dev/iommu and vfio cdev can be externally opened by a
> management layer. In such a case the fd is passed:
>
>     -object iommufd,id=iommufd0,fd=22
>     -device vfio-pci,iommufd=iommufd0,fd=23
>
> If the fd parameter is not passed, the fd is opened by QEMU.
> See https://www.mail-archive.com/qemu-devel@nongnu.org/msg937155.html
> for detailed discuss on this requirement.
>
> If no iommufd option is passed to the vfio-pci device, iommufd is not
> used and the end-user gets the behavior based on the legacy vfio iommu
> interfaces:
>
>     -device vfio-pci,host=0000:02:00.0
>
> While the legacy kernel interface is group-centric, the new iommufd
> interface is device-centric, relying on device fd and iommufd.
>
> To support both interfaces in the QEMU VFIO device we reworked the vfio
> container abstraction so that the generic VFIO code can use either
> backend.
>
> The VFIOContainer object becomes a base object derived into
> a) the legacy VFIO container and
> b) the new iommufd based container.
>
> The base object implements generic code such as code related to
> memory_listener and address space management whereas the derived
> objects implement callbacks specific to either BE, legacy and
> iommufd. Indeed each backend has its own way to setup secure context
> and dma management interface. The below diagram shows how it looks
> like with both BEs.
>
>                     VFIO                           AddressSpace/Memory
>     +-------+  +----------+  +-----+  +-----+
>     |  pci  |  | platform |  |  ap |  | ccw |
>     +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
>         |           |           |        |        |   AddressSpace       |
>         |           |           |        |        +------------+---------+
>     +---V-----------V-----------V--------V----+               /
>     |           VFIOAddressSpace              | <------------+
>     |                  |                      |  MemoryListener
>     |          VFIOContainer list             |
>     +-------+----------------------------+----+
>             |                            |
>             |                            |
>     +-------V------+            +--------V----------+
>     |   iommufd    |            |    vfio legacy    |
>     |  container   |            |     container     |
>     +-------+------+            +--------+----------+
>             |                            |
>             | /dev/iommu                 | /dev/vfio/vfio
>             | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
> Userspace   |                            |
> ============+============================+===========================
> Kernel      |  device fd                 |
>             +---------------+            | group/container fd
>             | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
>             |  ATTACH_IOAS) |            | device fd
>             |               |            |
>             |       +-------V------------V-----------------+
>     iommufd |       |                vfio                  |
> (map/unmap  |       +---------+--------------------+-------+
> ioas_copy)  |                 |                    | map/unmap
>             |                 |                    |
>      +------V------+    +-----V------+      +------V--------+
>      | iommfd core |    |  device    |      |  vfio iommu   |
>      +-------------+    +------------+      +---------------+
>
> [Secure Context setup]
> - iommufd BE: uses device fd and iommufd to setup secure context
>               (bind_iommufd, attach_ioas)
> - vfio legacy BE: uses group fd and container fd to setup secure context
>                   (set_container, set_iommu)
> [Device access]
> - iommufd BE: device fd is opened through /dev/vfio/devices/vfioX
> - vfio legacy BE: device fd is retrieved from group fd ioctl
> [DMA Mapping flow]
> 1. VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
> 2. VFIO populates DMA map/unmap via the container BEs
>    *) iommufd BE: uses iommufd
>    *) vfio legacy BE: uses container fd
>
>
> Changelog:
> v6:
> - simplify CONFIG_IOMMUFD checking code further (Cédric)
> - check iommufd_cdev_kvm_device_add return value (Cédric)
> - dirrectory -> directory (Cédric)
> - propagate iommufd_cdev_get_info_iova_range err and print as warning (Cédric)
> - introduce a helper vfio_device_set_fd (Cédric)
> - Move #include "sysemu/iommufd.h" in platform.c (Cédric)
> - simplify iommufd backend uAPI, remove alloc_hwpt, get/put_ioas
> - Dare to keep Matthew's RB as related change is minor
>
> v5:
> - Change to use Kconfig for CONFIG_IOMMUFD and drop stub file (Cédric)
> - Add (uintptr_t) to info->allowed_iovas (Cédric)
> - Switch to IOAS attach/detach and hide hwpt (Jason)
> - move chardev_open.[h|c] under the IOMMUFD entry (Cédric)
> - Move vfio_legacy_pci_hot_reset into container.c (Cédric)
> - Add missed pgsizes initialization in vfio_get_info_iova_range
> - split linking iommufd patch into three to be cleaner
> - Fix comments on PCI BAR unmap
>
> v4:
> - add CONFIG_IOMMUFD check for IOMMUFDProperties (Markus)
> - add doc for default case without fd (Markus)
> - Fix build issue reported by Markus and Cédric
> - Simply use SPDX identifier in new file (Cédric)
> - make vfio_container_init/destroy helper a seperate patch (Cédric)
> - make vrdl_list movement a seperate patch (Cédric)
> - add const for some callback parameters (Cédric)
> - add g_assert in VFIOIOMMUOps callback (Cédric)
> - introduce pci_hot_reset callback (Cédric)
> - remove VFIOIOMMUSpaprOps (Cédric)
> - initialize g_autofree to NULL (Cédric)
> - adjust func name prefix and trace event in iommufd.c (Cédric)
> - add RB
>
> v3:
> - Rename base container as VFIOContainerBase and legacy container as 
> container (Cédric)
> - Drop VFIO_IOMMU_BACKEND_OPS class and use struct instead (Cédric)
> - Cleanup container.c by introducing spapr backend and move spapr code out 
> (Cédric)
> - Introduce vfio_iommu_spapr_ops (Cédric)
> - Add doc of iommufd in qom.json and have iommufd member sorted (Markus)
> - patch19 and patch21 are splitted to two parts to facilitate review
>
> v2:
> - patch "vfio: Add base container" in v1 is split into patch1-15 per Cédric
> - add fd passing to platform/ap/ccw vfio device
> - add (uintptr_t) cast in iommufd_backend_map_dma() per Cédric
> - rename char_dev.h to chardev_open.h for same naming scheme per Daniel
> - add full copyright per Daniel and Jason
>
>
> Note changelog below are from full IOMMUFD series:
>
> v1:
> - Alloc hwpt instead of using auto hwpt
> - elaborate iommufd code per Nicolin
> - consolidate two patches and drop as.c
> - typo error fix and function rename
>
> rfcv4:
> - rebase on top of v8.0.3
> - Add one patch from Yi which is about vfio device add in kvm
> - Remove IOAS_COPY optimization and focus on functions in this patchset
> - Fix wrong name issue reported and fix suggested by Matthew
> - Fix compilation issue reported and fix sugggsted by Nicolin
> - Use query_dirty_bitmap callback to replace get_dirty_bitmap for better
> granularity
> - Add dev_iter_next() callback to avoid adding so many callback
>   at container scope, add VFIODevice.hwpt to support that
> - Restore all functions back to common from container whenever possible,
>   mainly migration and reset related functions
> - Add --enable/disable-iommufd config option, enabled by default in linux
> - Remove VFIODevice.hwpt_next as it's redundant with VFIODevice.next
> - Adapt new VFIO_DEVICE_PCI_HOT_RESET uAPI for IOMMUFD backed device
> - vfio_kvm_device_add/del_group call vfio_kvm_device_add/del_fd to remove
> redundant code
> - Add FD passing support for vfio device backed by IOMMUFD
> - Fix hot unplug resource leak issue in vfio_legacy_detach_device()
> - Fix FD leak in vfio_get_devicefd()
>
> rfcv3:
> - rebase on top of v7.2.0
> - Fix the compilation with CONFIG_IOMMUFD unset by using true classes for
>   VFIO backends
> - Fix use after free in error path, reported by Alister
> - Split common.c in several steps to ease the review
>
> rfcv2:
> - remove the first three patches of rfcv1
> - add open cdev helper suggested by Jason
> - remove the QOMification of the VFIOContainer and simply use standard ops
> (David)
> - add "-object iommufd" suggested by Alex
>
> Thanks
> Zhenzhong
>
>
> Cédric Le Goater (3):
>   hw/arm: Activate IOMMUFD for virt machines
>   kconfig: Activate IOMMUFD for s390x machines
>   hw/i386: Activate IOMMUFD for q35 machines
>
> Eric Auger (2):
>   backends/iommufd: Introduce the iommufd object
>   vfio/pci: Allow the selection of a given iommu backend
>
> Yi Liu (2):
>   util/char_dev: Add open_cdev()
>   vfio/iommufd: Implement the iommufd backend
>
> Zhenzhong Duan (14):
>   vfio/common: return early if space isn't empty
>   vfio/iommufd: Relax assert check for iommufd backend
>   vfio/iommufd: Add support for iova_ranges and pgsizes
>   vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_info
>   vfio/pci: Introduce a vfio pci hot reset interface
>   vfio/iommufd: Enable pci hot reset through iommufd cdev interface
>   vfio/pci: Make vfio cdev pre-openable by passing a file handle
>   vfio/platform: Allow the selection of a given iommu backend
>   vfio/platform: Make vfio cdev pre-openable by passing a file handle
>   vfio/ap: Allow the selection of a given iommu backend
>   vfio/ap: Make vfio cdev pre-openable by passing a file handle
>   vfio/ccw: Allow the selection of a given iommu backend
>   vfio/ccw: Make vfio cdev pre-openable by passing a file handle
>   vfio: Make VFIOContainerBase poiner parameter const in VFIOIOMMUOps
>     callbacks
>
>  MAINTAINERS                           |  10 +
>  qapi/qom.json                         |  19 +
>  hw/vfio/pci.h                         |   6 +
>  include/hw/vfio/vfio-common.h         |  26 +-
>  include/hw/vfio/vfio-container-base.h |  15 +-
>  include/qemu/chardev_open.h           |  16 +
>  include/sysemu/iommufd.h              |  44 ++
>  backends/iommufd.c                    | 228 ++++++++++
>  hw/vfio/ap.c                          |  29 +-
>  hw/vfio/ccw.c                         |  31 +-
>  hw/vfio/common.c                      |  24 +-
>  hw/vfio/container-base.c              |   6 +-
>  hw/vfio/container.c                   | 208 ++++++++-
>  hw/vfio/helpers.c                     |  44 ++
>  hw/vfio/iommufd.c                     | 630 ++++++++++++++++++++++++++
>  hw/vfio/pci.c                         | 212 ++-------
>  hw/vfio/platform.c                    |  38 +-
>  util/chardev_open.c                   |  81 ++++
>  backends/Kconfig                      |   4 +
>  backends/meson.build                  |   1 +
>  backends/trace-events                 |  10 +
>  hw/arm/Kconfig                        |   1 +
>  hw/i386/Kconfig                       |   1 +
>  hw/s390x/Kconfig                      |   1 +
>  hw/vfio/meson.build                   |   3 +
>  hw/vfio/trace-events                  |  11 +
>  qemu-options.hx                       |  12 +
>  util/meson.build                      |   1 +
>  28 files changed, 1493 insertions(+), 219 deletions(-)
>  create mode 100644 include/qemu/chardev_open.h
>  create mode 100644 include/sysemu/iommufd.h
>  create mode 100644 backends/iommufd.c
>  create mode 100644 hw/vfio/iommufd.c
>  create mode 100644 util/chardev_open.c
>


Reply via email to