> -----Original Message-----
> From: Yi Liu [mailto:yi.l....@intel.com]
> Sent: 14 April 2022 11:47
> To: alex.william...@redhat.com; coh...@redhat.com;
> qemu-devel@nongnu.org
> Cc: da...@gibson.dropbear.id.au; th...@redhat.com; far...@linux.ibm.com;
> mjros...@linux.ibm.com; akrow...@linux.ibm.com; pa...@linux.ibm.com;
> jjhe...@linux.ibm.com; jasow...@redhat.com; k...@vger.kernel.org;
> j...@nvidia.com; nicol...@nvidia.com; eric.au...@redhat.com;
> eric.auger....@gmail.com; kevin.t...@intel.com; yi.l....@intel.com;
> chao.p.p...@intel.com; yi.y....@intel.com; pet...@redhat.com
> Subject: [RFC 00/18] vfio: Adopt iommufd
>
> With the introduction of iommufd[1], the Linux kernel provides a generic
> interface for userspace drivers to propagate their DMA mappings to the kernel
> for assigned devices. This series ports the VFIO devices onto the /dev/iommu
> uAPI and lets it coexist with the legacy implementation. Other devices, such
> as vDPA and VFIO mdev, are not considered yet.
>
> For VFIO devices, the new interface is tied to the device fd and iommufd,
> as the iommufd solution is device-centric. This is different from legacy
> VFIO, which is group-centric. To support both interfaces in QEMU, this
> series introduces the iommu backend concept in the form of different
> container classes. The existing VFIO container is named the legacy container
> (equivalent to the legacy iommu backend in this series), while the new
> iommufd-based container is named the iommufd container (also referred to
> as the iommufd backend in this series). The two backend (BE) types each have
> their own way to set up the secure context and their own DMA management
> interface. The diagram below shows how this looks with both BEs.
>
>                         VFIO                        AddressSpace/Memory
>     +-------+  +----------+  +-----+  +-----+
>     |  pci  |  | platform |  |  ap |  | ccw |
>     +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
>         |           |           |        |        |   AddressSpace       |
>         |           |           |        |        +------------+---------+
>     +---V-----------V-----------V--------V----+               /
>     |           VFIOAddressSpace              | <------------+
>     |                  |                      |  MemoryListener
>     |          VFIOContainer list             |
>     +-------+----------------------------+----+
>             |                            |
>             |                            |
>     +-------V------+            +--------V----------+
>     |   iommufd    |            |    vfio legacy    |
>     |  container   |            |     container     |
>     +-------+------+            +--------+----------+
>             |                            |
>             | /dev/iommu                 | /dev/vfio/vfio
>             | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
> Userspace   |                            |
>  ===========+============================+===============================
> Kernel      |  device fd                 |
>             +---------------+            | group/container fd
>             | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
>             |  ATTACH_IOAS) |            | device fd
>             |               |            |
>             |       +-------V------------V-----------------+
>     iommufd |       |                vfio                  |
> (map/unmap  |       +---------+--------------------+-------+
>  ioas_copy) |                 |                    | map/unmap
>             |                 |                    |
>      +------V-------+   +-----V------+      +------V--------+
>      | iommufd core |   |  device    |      |  vfio iommu   |
>      +--------------+   +------------+      +---------------+
>
> [Secure Context setup]
> - iommufd BE: uses the device fd and iommufd to set up the secure context
>   (bind_iommufd, attach_ioas)
> - vfio legacy BE: uses the group fd and container fd to set up the secure
>   context (set_container, set_iommu)
> [Device access]
> - iommufd BE: the device fd is opened through /dev/vfio/devices/vfioX
> - vfio legacy BE: the device fd is retrieved via an ioctl on the group fd
> [DMA Mapping flow]
> - VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
> - VFIO issues the DMA map/unmap operations via the container BEs
>   *) iommufd BE: uses iommufd
>   *) vfio legacy BE: uses the container fd
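To make the legacy BE steps above concrete, here is a minimal C sketch of the group-centric sequence using the long-standing <linux/vfio.h> uAPI: open the container, check that the group is viable, attach the group (VFIO_GROUP_SET_CONTAINER), select an IOMMU model (VFIO_SET_IOMMU, type1v2 assumed here), fetch the device fd from the group fd, and map DMA through the container fd. The function names legacy_open_device() and legacy_dma_map() are hypothetical, error handling is reduced to returning -1, and this is not QEMU's code. The iommufd BE path is not sketched, since its kernel uAPI was still at an early stage in the branch this series is based on.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper: legacy (group-centric) secure-context setup.
 * group_path is e.g. "/dev/vfio/<group_id>", dev_name a PCI address.
 * Returns the device fd; the container and group fds are handed back
 * because they must stay open while the device is in use.
 */
int legacy_open_device(const char *group_path, const char *dev_name,
                       int *container_out, int *group_out)
{
    struct vfio_group_status status = { .argsz = sizeof(status) };
    int container, group = -1, device;

    container = open("/dev/vfio/vfio", O_RDWR);
    if (container < 0)
        return -1;
    if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION ||
        !ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU))
        goto fail;

    group = open(group_path, O_RDWR);
    if (group < 0)
        goto fail;
    /* A group is only viable if all of its devices are bound to VFIO. */
    if (ioctl(group, VFIO_GROUP_GET_STATUS, &status) < 0 ||
        !(status.flags & VFIO_GROUP_FLAGS_VIABLE))
        goto fail;

    /* Secure context: attach the group to the container, pick the IOMMU model. */
    if (ioctl(group, VFIO_GROUP_SET_CONTAINER, &container) < 0 ||
        ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1v2_IOMMU) < 0)
        goto fail;

    /* Device access: the device fd comes from an ioctl on the group fd. */
    device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, dev_name);
    if (device < 0)
        goto fail;

    *container_out = container;
    *group_out = group;
    return device;

fail:
    if (group >= 0)
        close(group);
    close(container);
    return -1;
}

/* Hypothetical helper: DMA mapping through the container fd (legacy BE). */
int legacy_dma_map(int container, __u64 iova, __u64 size, void *vaddr)
{
    struct vfio_iommu_type1_dma_map map = {
        .argsz = sizeof(map),
        .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
        .vaddr = (__u64)(uintptr_t)vaddr,
        .iova = iova,
        .size = size,
    };
    return ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
}
```

On success the caller owns three fds (container, group, device), which is exactly the group-centric state the iommufd BE replaces with a single device fd bound to iommufd.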
>
> This series QOMifies the VFIOContainer object, which acts as a base class
> for a container. This base class is derived into the legacy VFIO container
> and the new iommufd-based container. The base class implements generic
> code, such as the memory_listener and address space management code,
> whereas the derived classes implement callbacks that depend on the kernel
> userspace interface being used.
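A rough sketch of the base/derived split in plain C, assuming the common "base struct as first member plus a table of callbacks" pattern rather than QOM itself. All type and function names (ContainerBase, ContainerOps, container_region_add(), and so on) are hypothetical. The backend callbacks are stubs that only check their fd; a real backend would issue VFIO_IOMMU_MAP_DMA (legacy) or IOMMU_IOAS_MAP (iommufd) at that point.

```c
#include <stdint.h>

/* Hypothetical names throughout; they only mirror the roles in the text. */
typedef struct ContainerBase ContainerBase;

/* Per-backend callbacks: the part a derived container provides. */
typedef struct ContainerOps {
    const char *name;
    int (*dma_map)(ContainerBase *c, uint64_t iova, uint64_t size, void *vaddr);
    int (*dma_unmap)(ContainerBase *c, uint64_t iova, uint64_t size);
} ContainerOps;

/* Base object: generic state shared by every backend. */
struct ContainerBase {
    const ContainerOps *ops;
    unsigned mappings;              /* bookkeeping owned by generic code */
};

/* Derived objects embed the base as their first member. */
typedef struct LegacyContainer {
    ContainerBase base;
    int container_fd;               /* opened from /dev/vfio/vfio */
} LegacyContainer;

typedef struct IommufdContainer {
    ContainerBase base;
    int iommufd;                    /* opened from /dev/iommu */
    uint32_t ioas_id;
} IommufdContainer;

/* Stub callbacks: real ones would issue the map/unmap ioctl on their fd. */
static int legacy_map(ContainerBase *c, uint64_t iova, uint64_t size, void *vaddr)
{
    LegacyContainer *lc = (LegacyContainer *)c;  /* valid: base is first member */
    (void)iova; (void)size; (void)vaddr;         /* VFIO_IOMMU_MAP_DMA goes here */
    return lc->container_fd >= 0 ? 0 : -1;
}

static int legacy_unmap(ContainerBase *c, uint64_t iova, uint64_t size)
{
    (void)iova; (void)size;
    return ((LegacyContainer *)c)->container_fd >= 0 ? 0 : -1;
}

static int iommufd_map(ContainerBase *c, uint64_t iova, uint64_t size, void *vaddr)
{
    IommufdContainer *ic = (IommufdContainer *)c;
    (void)iova; (void)size; (void)vaddr;         /* IOMMU_IOAS_MAP on ioas_id goes here */
    return ic->iommufd >= 0 ? 0 : -1;
}

static int iommufd_unmap(ContainerBase *c, uint64_t iova, uint64_t size)
{
    (void)iova; (void)size;
    return ((IommufdContainer *)c)->iommufd >= 0 ? 0 : -1;
}

static const ContainerOps legacy_ops = { "legacy", legacy_map, legacy_unmap };
static const ContainerOps iommufd_ops = { "iommufd", iommufd_map, iommufd_unmap };

/* Generic code, e.g. what MemoryListener region add/del end up calling. */
int container_region_add(ContainerBase *c, uint64_t iova, uint64_t size, void *vaddr)
{
    int ret = c->ops->dma_map(c, iova, size, vaddr);
    if (ret == 0)
        c->mappings++;
    return ret;
}

int container_region_del(ContainerBase *c, uint64_t iova, uint64_t size)
{
    int ret = c->ops->dma_unmap(c, iova, size);
    if (ret == 0 && c->mappings > 0)
        c->mappings--;
    return ret;
}
```

The generic functions never look inside LegacyContainer or IommufdContainer; they only call through c->ops. That indirection is what lets both backends share the listener and address space handling.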
>
> The backend is selected per device using the new iommufd option
> (on/off/auto). By default, the iommufd backend is selected if it is
> supported by both the host and QEMU (iommufd Kconfig). This option is
> currently available only for the vfio-pci device. For other device types,
> the option does not exist yet and the legacy BE is used by default.
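The on/off/auto rule can be read as a small decision function. This is a sketch under assumptions: the enum and function names are hypothetical, and the behaviour for iommufd=on when iommufd is unavailable (fail instead of silently falling back to legacy) is assumed rather than taken from the series.

```c
#include <stdbool.h>

/* Hypothetical names mirroring the per-device iommufd=on/off/auto option. */
typedef enum { IOMMUFD_OPT_OFF, IOMMUFD_OPT_ON, IOMMUFD_OPT_AUTO } IommufdOpt;
typedef enum { BE_LEGACY, BE_IOMMUFD, BE_ERROR } Backend;

/*
 * host_ok:  the host kernel provides iommufd (/dev/iommu)
 * build_ok: QEMU was built with iommufd support (Kconfig)
 * is_pci:   only vfio-pci exposes the option in this series
 */
Backend select_backend(IommufdOpt opt, bool host_ok, bool build_ok, bool is_pci)
{
    bool supported = host_ok && build_ok;

    if (!is_pci)
        return BE_LEGACY;      /* no option on this device type: legacy BE */

    switch (opt) {
    case IOMMUFD_OPT_OFF:
        return BE_LEGACY;
    case IOMMUFD_OPT_ON:
        /* Assumption: an explicit request is an error if unsupported. */
        return supported ? BE_IOMMUFD : BE_ERROR;
    case IOMMUFD_OPT_AUTO:
    default:
        /* Default: prefer iommufd when both host and QEMU support it. */
        return supported ? BE_IOMMUFD : BE_LEGACY;
    }
}
```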
>
> Tests done:
> - PCI and platform devices were tested
> - ccw and ap were only compile-tested
> - limited device hotplug testing
> - vIOMMU tests run for both legacy and iommufd backends (limited)
>
> This series was co-developed by Eric Auger and me, based on the exploratory
> iommufd kernel branch [2]; the complete code of this series is available
> at [3]. As the iommufd kernel work is at an early stage (only the generic
> iommufd interface has been posted to the mailing list), this series does
> not yet bring the iommufd backend fully on par with the legacy backend
> w.r.t. features like p2p mappings, coherency tracking, live migration, etc.
> Nor does it support PCI devices without FLR, as the kernel does not support
> VFIO_DEVICE_PCI_HOT_RESET when userspace is using iommufd. The kernel needs
> to be updated to accept a device fd list for reset when userspace is using
> iommufd. Related work is in progress by Jason [4].
>
> TODOs:
> - Add DMA alias check for iommufd BE (group level)
> - Make pci.c BE-agnostic. This needs a kernel change as well to fix the
>   VFIO_DEVICE_PCI_HOT_RESET gap
> - Clean up the VFIODevice fields, as VFIODevice is used by both BEs
> - Add locks
> - Replace list with g_tree
> - More tests
>
> Patch Overview:
>
> - Preparation:
> 0001-scripts-update-linux-headers-Add-iommufd.h.patch
> 0002-linux-headers-Import-latest-vfio.h-and-iommufd.h.patch
> 0003-hw-vfio-pci-fix-vfio_pci_hot_reset_result-trace-poin.patch
> 0004-vfio-pci-Use-vbasedev-local-variable-in-vfio_realize.patch
>
> 0005-vfio-common-Rename-VFIOGuestIOMMU-iommu-into-iommu_m.patch
> 0006-vfio-common-Split-common.c-into-common.c-container.c.patch
>
> - Introduce container object and convert existing vfio to use it:
> 0007-vfio-Add-base-object-for-VFIOContainer.patch
> 0008-vfio-container-Introduce-vfio_attach-detach_device.patch
> 0009-vfio-platform-Use-vfio_-attach-detach-_device.patch
> 0010-vfio-ap-Use-vfio_-attach-detach-_device.patch
> 0011-vfio-ccw-Use-vfio_-attach-detach-_device.patch
> 0012-vfio-container-obj-Introduce-attach-detach-_device-c.patch
> 0013-vfio-container-obj-Introduce-VFIOContainer-reset-cal.patch
>
> - Introduce iommufd based container:
> 0014-hw-iommufd-Creation.patch
> 0015-vfio-iommufd-Implement-iommufd-backend.patch
> 0016-vfio-iommufd-Add-IOAS_COPY_DMA-support.patch
>
> - Add backend selection for vfio-pci:
> 0017-vfio-as-Allow-the-selection-of-a-given-iommu-backend.patch
> 0018-vfio-pci-Add-an-iommufd-option.patch
>
> [1] https://lore.kernel.org/kvm/0-v1-e79cd8d168e8+6-iommufd_...@nvidia.com/
> [2] https://github.com/luxis1999/iommufd/tree/iommufd-v5.17-rc6
> [3] https://github.com/luxis1999/qemu/tree/qemu-for-5.17-rc6-vm-rfcv1
Hi,
I had a go with the above branches on our ARM64 platform, trying to pass through
a VF device, but QEMU reports the following error:
[ 0.444728] hisi_sec2 0000:00:01.0: enabling device (0000 -> 0002)
qemu-system-aarch64-iommufd: IOMMU_IOAS_MAP failed: Bad address
qemu-system-aarch64-iommufd: vfio_container_dma_map(0xaaaafeb40ce0, 0x8000000000, 0x10000, 0xffffb40ef000) = -14 (Bad address)
I think this happens for the device BAR address range. I haven't debugged the
kernel yet to see where exactly the error is reported.
Maybe I am missing something. Please let me know.
Thanks,
Shameer