Patches submitted to the kernel-team mailing list:
https://lists.ubuntu.com/archives/kernel-team/2024-December/155853.html

SRU Justification

[Impact]

The patch "vfio/pci: Use unmap_mapping_range()" rewrote the way VFIO tracks
mapped regions to use the "vmf_insert_pfn" function instead of tracking them
itself and using "io_remap_pfn_range". The implementation using
"vmf_insert_pfn" is significantly slower. To mitigate this slowdown, "vfio/pci:
Insert full vma on mmap'd MMIO fault" was introduced to prefault the entirety
of areas mapped by vfio_pci, resulting in soft lockup warnings on the host for
devices with large BAR regions. Reverting this prefaulting behavior alone does not fully
resolve the slowness, as a VM still experiences extremely slow accesses to the
passthrough devices as VMAs get faulted in, causing soft lockup warnings in the
guest during boot. Thus, "vfio/pci: Use unmap_mapping_range()" must also be
reverted to restore performance to that of versions prior to 6.8.0-48-generic.
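
Since the impact depends on how large a device's BAR regions are, it can help
to check the region sizes of a passthrough device on the host. The following
is a minimal sketch, not part of the submitted patches. The helper name
bar_sizes is hypothetical, and it assumes the usual sysfs "resource" file
layout of one "start end flags" line per region, in hex, with unpopulated
regions shown as all zeros.

```shell
# Hypothetical helper: print the size in bytes of each populated region
# listed in a PCI device's sysfs "resource" file (path given as $1).
# Assumes each line has the form "<start> <end> <flags>" in hex.
bar_sizes() {
    idx=0
    while read -r start end flags; do
        # Unpopulated regions are reported with an end address of zero.
        if [ $(( $end )) -ne 0 ]; then
            echo "region $idx: $(( $end - $start + 1 )) bytes"
        fi
        idx=$(( idx + 1 ))
    done < "$1"
}
```

For example, for the first GPU in the test plan's hostdev list:
bar_sizes /sys/bus/pci/devices/0000:1b:00.0/resource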

[Fix]

Both of these performance issues are resolved upstream by patchset [1], but
this would be a complex backport to 6.8 and 6.11, with significant changes to
core parts of the kernel.

Reverting the following commits resolves the issue, with a much reduced
potential for regression:
- "mm: use rwsem assertion macros for mmap_lock" (revert needed in Oracular,
  not present in Noble)
- "vfio/pci: Insert full vma on mmap'd MMIO fault"
- "vfio/pci: Use unmap_mapping_range()"

[Test Plan]

Tested on a DGX H100 system. With the reverts applied, start time for a VM
with 8 passthrough H100 GPUs drops from 45 minutes back down to 5 minutes, and
the soft lockup warnings are eliminated.

Reproduced using a libvirt VM, created with:
        $ sudo virt-install --connect qemu:///system -v --name gpu-pt-test \
                --memory 16384 --vcpus 16 --cpu host --cdrom \
                /ubuntu-24.04.1-live-server-amd64.iso --os-variant ubuntu24.04 \
                --disk size=512 -w bridge=virbr0 --graphics none \
                --console pty,target.type=virtio \
                --hostdev pci_0000_1b_00_0 --hostdev pci_0000_43_00_0 \
                --hostdev pci_0000_52_00_0 --hostdev pci_0000_61_00_0 \
                --hostdev pci_0000_9d_00_0 --hostdev pci_0000_d1_00_0 \
                --hostdev pci_0000_df_00_0 --hostdev pci_0000_c3_00_0
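
To check for the soft lockup warnings after starting the VM, the kernel log
can be searched on both the host and the guest. This is a minimal sketch under
the assumption that the warnings contain the usual "soft lockup" text. The
helper name count_soft_lockups is hypothetical and not part of the original
test procedure.

```shell
# Hypothetical helper: count kernel log lines on stdin that report a
# soft lockup. Prints 0 when there are none.
count_soft_lockups() {
    # grep -c exits non-zero when the count is 0, so mask the exit status.
    grep -c 'soft lockup' || true
}
```

Usage, on the host or inside the guest:
sudo dmesg | count_soft_lockups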

[Where problems could occur]

The reverts here primarily affect the vfio_pci driver, so they could result in
misbehavior of that driver. In Oracular, "mm: use rwsem assertion macros for
mmap_lock" is also reverted, which could result in mmap locking bugs going
undetected unless testing is done with lockdep enabled.

[1] https://patchwork.kernel.org/project/linux-mm/list/?series=883517

** Changed in: linux-nvidia (Ubuntu Noble)
       Status: In Progress => Fix Committed

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2089306

Title:
  vfio_pci soft lockup on VM start while using PCIe passthrough

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2089306/+subscriptions


