While running 3D apps inside a Guest VM with an SR-IOV enabled dGPU, it was noticed that migrating a BO to System RAM before exporting it as a dmabuf results in considerable performance degradation. For example, a simple 3D app such as weston-simple-egl runs at ~50 FPS instead of ~59 FPS with a 1920x1080@60 mode.
So, fix this issue by not migrating the BO and keeping it in LMEM during export. However, since the GPU running in PF mode on the Host cannot effectively access the PCI BAR addresses backing the imported dmabuf BO, these addresses need to be translated into LMEM addresses (DPAs) for this use case to work properly.

With this patch series applied, it becomes possible to display the Guest VM compositor's framebuffer (created in its LMEM) on the Host via the Qemu GTK UI, without making any copies of it or a costly round trip through System RAM. As a result, weston-simple-egl now achieves ~59 FPS while running with Gnome Wayland in the Guest VM.

Changelog:

v2 -> v3:
- Rebased (and tested) on kernel 6.17.0-rc4 with B60
- Updated the commit message in the P2PDMA patch and other patches

v1 -> v2:
- Use a dma_addr array instead of an SG table to store translated DMA addresses (Matt)
- Use a cursor to iterate over the entries in the dma_addr array instead of relying on the SG iterator (Matt)
- Rebased and tested this series on top of the one that introduces drm_pagemap_dma_addr and xe_res_first_dma/__xe_res_dma_next, which this version relies on

Patchset overview:

Patch 1: PCI driver patch to unblock P2P DMA between VF and PF
Patch 2: Prevent BO migration to System RAM while running in a VM
Patch 3: Helper function to get a VF's backing object in LMEM
Patch 4-5: Create and use a new dma_addr array for LMEM-based dmabuf BOs to store translated addresses (DPAs)

Associated Qemu patch series:
https://lore.kernel.org/qemu-devel/20250903054438.1179384-1-vivek.kasire...@intel.com/

Associated vfio-pci patch series:
https://lore.kernel.org/linux-mm/cover.1754311439.git.l...@kernel.org/

This series was tested using the following method:
- Run Qemu with the following relevant options:
  qemu-system-x86_64 -m 4096m ....
  -device vfio-pci,host=0000:03:00.1
  -device virtio-vga,max_outputs=1,blob=true,xres=1920,yres=1080
  -display gtk,gl=on
  -object memory-backend-memfd,id=mem1,size=4096M
  -machine memory-backend=mem1 ...
- Run Gnome Wayland with the following options in the Guest VM:
  # cat /usr/lib/udev/rules.d/61-mutter-primary-gpu.rules
  ENV{DEVNAME}=="/dev/dri/card1", TAG+="mutter-device-preferred-primary", TAG+="mutter-device-disable-kms-modifiers"

  # XDG_SESSION_TYPE=wayland dbus-run-session -- /usr/bin/gnome-shell --wayland --no-x11 &

Cc: Lucas De Marchi <lucas.demar...@intel.com>
Cc: Thomas Hellström <thomas.hellst...@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.v...@intel.com>
Cc: Michal Wajdeczko <michal.wajdec...@intel.com>
Cc: Matthew Brost <matthew.br...@intel.com>
Cc: Matthew Auld <matthew.a...@intel.com>
Cc: Dongwon Kim <dongwon....@intel.com>

Vivek Kasireddy (5):
  PCI/P2PDMA: Don't enforce ACS check for device functions of Intel GPUs
  drm/xe/dmabuf: Don't migrate BO to System RAM while running in VF mode
  drm/xe/pf: Add a helper function to get a VF's backing object in LMEM
  drm/xe/bo: Create new dma_addr array for dmabuf BOs associated with VFs
  drm/xe/pt: Add an additional check for dmabuf BOs while doing bind

 drivers/gpu/drm/xe/xe_bo.c                 | 98 +++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_bo_types.h           | 12 +++
 drivers/gpu/drm/xe/xe_dma_buf.c            |  9 +-
 drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c | 23 +++++
 drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h |  1 +
 drivers/gpu/drm/xe/xe_pt.c                 |  7 ++
 drivers/pci/p2pdma.c                       | 18 +++-
 7 files changed, 163 insertions(+), 5 deletions(-)

-- 
2.50.1
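To illustrate the idea behind patches 4-5 (translating the PCI BAR addresses of an imported dmabuf BO into DPAs, and walking a dma_addr array with a cursor instead of an SG table), here is a minimal user-space sketch. It is not the series' implementation: the structures (lmem_block, vf_lmem, dma_entry, dma_cursor) and functions (bar_to_dpa, cursor_next_dpa) are hypothetical, and it assumes the VF's LMEM BAR maps linearly onto its backing object, modeled here as an ordered list of contiguous DPA chunks. It does not reproduce drm_pagemap_dma_addr or xe_res_first_dma/__xe_res_dma_next.

```c
/* Hypothetical user-space sketch of BAR -> DPA translation; not xe driver code.
 * Assumption: the VF's LMEM BAR maps linearly onto its backing object, which
 * is described as an ordered list of DPA-contiguous chunks. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one contiguous chunk of the VF's LMEM backing object. */
struct lmem_block {
	uint64_t dpa;  /* device physical address where the chunk starts */
	uint64_t size; /* chunk size in bytes */
};

/* Hypothetical: how the VF's LMEM is exposed through its PCI BAR. */
struct vf_lmem {
	uint64_t bar_start;              /* bus address where the VF's LMEM BAR begins */
	const struct lmem_block *blocks; /* backing chunks, in BAR offset order */
	size_t nblocks;
};

/* Hypothetical: one entry of the imported BO's dma_addr array. */
struct dma_entry {
	uint64_t addr; /* bus address inside the VF's LMEM BAR */
	uint64_t len;  /* length in bytes */
};

/* Translate a BAR bus address into a DPA. On success, *contig holds how many
 * bytes stay contiguous in DPA space from *dpa (up to the end of the chunk).
 * Returns 0 on success, -1 if the address is not backed by the VF's LMEM. */
int bar_to_dpa(const struct vf_lmem *vf, uint64_t addr,
	       uint64_t *dpa, uint64_t *contig)
{
	uint64_t offset;
	size_t i;

	if (addr < vf->bar_start)
		return -1;

	offset = addr - vf->bar_start;
	for (i = 0; i < vf->nblocks; i++) {
		if (offset < vf->blocks[i].size) {
			*dpa = vf->blocks[i].dpa + offset;
			*contig = vf->blocks[i].size - offset;
			return 0;
		}
		offset -= vf->blocks[i].size;
	}
	return -1; /* past the end of the backing object */
}

/* Hypothetical cursor over the dma_addr array that yields DPA segments,
 * splitting an entry wherever it crosses a chunk boundary. */
struct dma_cursor {
	const struct dma_entry *entries;
	size_t nentries;
	size_t idx;    /* index of the current entry */
	uint64_t done; /* bytes of the current entry already emitted */
};

/* Returns 1 and fills *dpa/*len with the next segment, 0 once the array is
 * exhausted, or -1 if an address cannot be translated. */
int cursor_next_dpa(struct dma_cursor *cur, const struct vf_lmem *vf,
		    uint64_t *dpa, uint64_t *len)
{
	const struct dma_entry *e;
	uint64_t contig, remaining;

	/* Skip entries that are finished (or have zero length). */
	while (cur->idx < cur->nentries &&
	       cur->done == cur->entries[cur->idx].len) {
		cur->idx++;
		cur->done = 0;
	}
	if (cur->idx == cur->nentries)
		return 0;

	e = &cur->entries[cur->idx];
	assert(cur->done < e->len);
	if (bar_to_dpa(vf, e->addr + cur->done, dpa, &contig))
		return -1;

	/* Emit as much of the entry as stays contiguous in DPA space. */
	remaining = e->len - cur->done;
	*len = remaining < contig ? remaining : contig;
	cur->done += *len;
	return 1;
}
```

For example, with a VF BAR at 0x80000000 backed by two 2 MiB chunks at DPAs 0x10000000 and 0x30000000, an entry covering BAR addresses 0x80180000..0x8027ffff comes out as two segments: 0x80000 bytes at DPA 0x10180000 and 0x80000 bytes at DPA 0x30000000. Keeping a flat array plus a cursor, rather than an SG table, means a translated entry can be split lazily at chunk boundaries while iterating, instead of being pre-split when the table is built.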