On 3/12/26 19:45, Matt Evans wrote:
> Hi all,
>
> There were various suggestions in the September 2025 thread "[TECH
> TOPIC] vfio, iommufd: Enabling user space drivers to vend more
> granular access to client processes" [0], and LPC discussions, around
> improving the situation for multi-process userspace driver designs.
> This RFC series implements some of these ideas.
>
> (Thanks for feedback on v1! Revised series, with changes noted
> inline.)
>
> Background: Multi-process USDs
> ==============================
>
> The userspace driver scenario discussed in that thread involves a
> primary process driving a PCIe function through VFIO/iommufd, which
> manages the function-wide ownership/lifecycle. The function is
> designed to provide multiple distinct programming interfaces (for
> example, several independent MMIO register frames in one function),
> and the primary process delegates control of these interfaces to
> multiple independent client processes (which do the actual work).
> This scenario clearly relies on a HW design that provides appropriate
> isolation between the programming interfaces.
>
> The two key needs are:
>
> 1. Mechanisms to safely delegate a subset of the device MMIO
>    resources to a client process without over-sharing wider access
>    (or influence over whole-device activities, such as reset).
>
> 2. Mechanisms to allow a client process to do its own iommufd
>    management w.r.t. its address space, in a way that's isolated
>    from DMA relating to other clients.
>
>
> mmap() of VFIO DMABUFs
> ======================
>
> This RFC addresses #1 in "vfio/pci: Support mmap() of a VFIO DMABUF",
> implementing the proposals in [0] to add mmap() support to the
> existing VFIO DMABUF exporter.
>
> This enables a userspace driver to define DMABUF ranges corresponding
> to sub-ranges of a BAR, and grant a given client (via a shared fd)
> the capability to access (only) those sub-ranges. The VFIO device fds
> would be kept private to the primary process.
> All the client can do with that fd is map (or iomap via iommufd)
> that specific subset of resources, and the impact of bugs/malice is
> contained.
>
> (We'll follow up on #2 separately, as a related-but-distinct problem.
> PASIDs are one way to achieve per-client isolation of DMA; another
> could be sharing of a single IOVA space via 'constrained' iommufds.)
>
>
> New in v2: To achieve this, the existing VFIO BAR mmap() path is
> converted to use DMABUFs behind the scenes, in "vfio/pci: Convert BAR
> mmap() to use a DMABUF" plus new helper functions, as Jason/Christian
> suggested in the v1 discussion [3].
>
> This means:
>
> - Both regular and new DMABUF BAR mappings share the same vm_ops,
>   i.e. mmap()ing DMABUFs is a smaller change on top of the existing
>   mmap().
>
> - The zapping of mappings occurs via vfio_pci_dma_buf_move(), and the
>   vfio_pci_zap_bars() originally paired with the _move()s can go
>   away. Each DMABUF has a unique address_space.
>
> - It's a step towards future iommufd VFIO Type1 emulation
>   implementing P2P, since iommufd can now get a DMABUF from a VA that
>   it's mapping for IO; the VMAs' vm_file is that of the backing
>   DMABUF.
>
>
> Revocation/reclaim
> ==================
>
> Mapping a BAR subset is useful, but the lifetime of access granted to
> a client needs to be managed well. For example, a protocol between
> the primary process and the client can indicate when the client is
> done and when it's safe to reuse the resources elsewhere, but cleanup
> can't practically be cooperative.
>
> For robustness, we enable the driver to make the resources
> guaranteed-inaccessible when it chooses, so that it can re-assign
> them to other uses in future.
>
> "vfio/pci: Permanently revoke a DMABUF on request" adds a new VFIO
> device fd ioctl, VFIO_DEVICE_PCI_DMABUF_REVOKE. This takes a DMABUF
> fd parameter previously exported (from that device!) and permanently
> revokes the DMABUF.
> This notifies/detaches importers, zaps PTEs for any mappings, and
> guarantees no future attachment/import/map/access is possible by any
> means.
>
> A primary driver process would use this operation when the client's
> tenure ends to reclaim "loaned-out" MMIO interfaces, at which point
> the interfaces could be safely re-used.
>
> New in v2: the ioctl() is on the VFIO device fd, rather than the
> DMABUF fd. A DMABUF is revoked using code common to
> vfio_pci_dma_buf_move(), selectively zapping mappings (after waiting
> for completion on the dma_buf_invalidate_mappings() request).
>
>
> BAR mapping access attributes
> =============================
>
> Inspired by Alex [Mastro] and Jason's comments in [0] and Mahmoud's
> work in [1], with the goal of controlling CPU access attributes for
> VFIO BAR mappings (e.g. WC), we can decorate DMABUFs with access
> attributes that are then used by a mapping's PTEs.
>
> I've proposed reserving a field in struct
> vfio_device_feature_dma_buf's flags to specify an attribute for its
> ranges. Although that keeps the (UAPI) struct unchanged, it means
> all ranges in a DMABUF share the same attribute. I feel a single
> attribute-per-mmap() relation is logical/reasonable. An application
> can also create multiple DMABUFs to describe any BAR layout and mix
> of attributes.
>
>
> Tests
> =====
>
> (Still sharing the [RFC ONLY] userspace test/demo program for
> context, not for merge.)
>
> It illustrates & tests various map/revoke cases, but doesn't use the
> existing VFIO selftests and relies on a (tweaked) QEMU EDU function.
> I'm (still) working on integrating the scenarios into the existing
> VFIO selftests.
>
> This code has been tested with mapping DMABUFs of single/multiple
> ranges, aliasing mmap()s, aliasing ranges across DMABUFs,
> vm_pgoff > 0, revocation, and shutdown/cleanup scenarios; hugepage
> mappings also seem to work correctly. I've lightly tested WC
> mappings too (by observing that the resulting PTEs have the correct
> attributes).
>
>
> Fin
> ===
>
> v2 is based on next-20260310 (to build on Leon's recent series
> "vfio: Wait for dma-buf invalidation to complete" [2]).
>
>
> Please share your thoughts! I'd like to de-RFC if we feel this
> approach is now reasonable.
I only skimmed over it, but at least offhand I couldn't find anything
fundamentally wrong.

The locking order seems to change in patch #6. In general I strongly
recommend enabling lockdep while testing anyway, but especially when I
see such changes.

In addition to that, it might also be a good idea to have a lockdep
initcall function which defines the locking order that all the VFIO
code should follow. See the function dma_resv_lockdep() for an example
of how to do that.

Especially with mmap support, and all the locks involved with that, it
has proven to be good practice to have something like that.

Regards,
Christian.

>
> Many thanks,
>
> Matt
>
>
> References:
>
> [0]: https://lore.kernel.org/linux-iommu/[email protected]/
> [1]: https://lore.kernel.org/all/[email protected]/
> [2]: https://lore.kernel.org/linux-iommu/20260205-nocturnal-poetic-chamois-f566ad@houat/T/#m310cd07011e3a1461b6fda45e3f9b886ba76571a
> [3]: https://lore.kernel.org/all/[email protected]/
>
> --------------------------------------------------------------------------------
> Changelog:
>
> v2: Respin based on the feedback/suggestions:
>
> - Transform the existing VFIO BAR mmap path to also use DMABUFs
>   behind the scenes, and then simply share that code for
>   explicitly-mapped DMABUFs.
>
> - Refactor the export itself out of vfio_pci_core_feature_dma_buf,
>   sharing it via a new vfio_pci_core_mmap_prep_dmabuf helper used by
>   the regular VFIO mmap to create a DMABUF.
>
> - Revoke buffers using a VFIO device fd ioctl
>
> v1: https://lore.kernel.org/all/[email protected]/
>
>
> Matt Evans (10):
>   vfio/pci: Set up VFIO barmap before creating a DMABUF
>   vfio/pci: Clean up DMABUFs before disabling function
>   vfio/pci: Add helper to look up PFNs for DMABUFs
>   vfio/pci: Add a helper to create a DMABUF for a BAR-map VMA
>   vfio/pci: Convert BAR mmap() to use a DMABUF
>   vfio/pci: Remove vfio_pci_zap_bars()
>   vfio/pci: Support mmap() of a VFIO DMABUF
>   vfio/pci: Permanently revoke a DMABUF on request
>   vfio/pci: Add mmap() attributes to DMABUF feature
>   [RFC ONLY] selftests: vfio: Add standalone vfio_dmabuf_mmap_test
>
>  drivers/vfio/pci/Kconfig                          |    3 +-
>  drivers/vfio/pci/Makefile                         |    3 +-
>  drivers/vfio/pci/vfio_pci_config.c                |   18 +-
>  drivers/vfio/pci/vfio_pci_core.c                  |  123 +--
>  drivers/vfio/pci/vfio_pci_dmabuf.c                |  425 +++++++-
>  drivers/vfio/pci/vfio_pci_priv.h                  |   46 +-
>  include/uapi/linux/vfio.h                         |   42 +-
>  tools/testing/selftests/vfio/Makefile             |    1 +
>  .../vfio/standalone/vfio_dmabuf_mmap_test.c       |  837 ++++++++++++++++++
>  9 files changed, 1339 insertions(+), 159 deletions(-)
>  create mode 100644 tools/testing/selftests/vfio/standalone/vfio_dmabuf_mmap_test.c
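
[Editor's illustration of the lockdep initcall Christian suggests: a kernel-side sketch modeled on dma_resv_lockdep() in drivers/dma-buf/dma-resv.c. It takes the locks once, in the intended order, at boot, so lockdep flags any later path that inverts the order even if that path is never exercised under test. demo_memory_lock is a placeholder for whichever VFIO lock(s) the series settles on, and the chosen mmap_lock -> memory_lock -> dma_resv order is an assumption for illustration, not the series' actual ordering.]

```c
/* Placeholder for the VFIO-side lock (e.g. vdev->memory_lock). */
static DECLARE_RWSEM(demo_memory_lock);

static int __init vfio_pci_lockdep(void)
{
	struct ww_acquire_ctx ctx;
	struct dma_resv obj;
	struct mm_struct *mm = mm_alloc();
	int ret;

	if (!mm)
		return -ENOMEM;

	dma_resv_init(&obj);

	/* Declare the intended order exactly once:
	 * mmap_lock -> memory_lock -> dma_resv (illustrative). */
	mmap_read_lock(mm);
	down_read(&demo_memory_lock);
	ww_acquire_init(&ctx, &reservation_ww_class);
	ret = dma_resv_lock(&obj, &ctx);
	if (ret == -EDEADLK)
		dma_resv_lock_slow(&obj, &ctx);
	dma_resv_unlock(&obj);
	ww_acquire_fini(&ctx);
	up_read(&demo_memory_lock);
	mmap_read_unlock(mm);

	dma_resv_fini(&obj);
	mmput(mm);
	return 0;
}
subsys_initcall(vfio_pci_lockdep);
```

With CONFIG_LOCKDEP enabled, any code path that later takes these locks in a different order produces a splat at that point, which is the early warning Christian is after.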
