[Kernel-packages] [Bug 2089306] Re: vfio_pci soft lockup on VM start while using PCIe passthrough

Jacob Martin Thu, 21 Nov 2024 13:51:17 -0800

Soft lockup warnings appear in the guest VM during boot if only
"vfio/pci: Insert full vma on mmap'd MMIO fault" is reverted on the host
kernel:
[  157.283187] watchdog: BUG: soft lockup - CPU#10 stuck for 135s! [swapper/0:1]
[  157.283187] Modules linked in:
[  157.283187] CPU: 10 PID: 1 Comm: swapper/0 Tainted: G             L     6.8.0-49-generic #49-Ubuntu
[  157.283187] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[  157.283187] RIP: 0010:_raw_spin_unlock_irqrestore+0x21/0x60
[  157.283187] Code: 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 49 89 f0 48 89 e5 e8 9f 09 00 00 90 41 f7 c0 00 02 00 00 74 06 fb 0f 1f 44 00 00 <65> ff 0d 50 dd 7e 59 74 13 5d 31 c0 31 d2 31 c9 31 f6 31 ff 45 31
[  157.283187] RSP: 0000:ff47fff940013740 EFLAGS: 00000206
[  157.283187] RAX: 0000000000000001 RBX: 0000000000000143 RCX: 0000000000000000
[  157.283187] RDX: 0000000000000cfc RSI: 0000000000000297 RDI: ffffffffa858a138
[  157.283187] RBP: ff47fff940013740 R08: 0000000000000297 R09: 0000000000000000
[  157.283187] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000004
[  157.283187] R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000007
[  157.283187] FS:  0000000000000000(0000) GS:ff455320f7700000(0000) knlGS:0000000000000000
[  157.283187] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  157.283187] CR2: 0000000000000000 CR3: 000000017a63c001 CR4: 0000000000771ef0
[  157.283187] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  157.283187] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[  157.283187] PKRU: 55555554
[  157.283187] Call Trace:
[  157.283187]  <IRQ>
[  157.283187]  ? show_regs+0x6d/0x80
[  157.283187]  ? watchdog_timer_fn+0x206/0x290
[  157.283187]  ? __pfx_watchdog_timer_fn+0x10/0x10
[  157.283187]  ? __hrtimer_run_queues+0x10f/0x2a0
[  157.283187]  ? hrtimer_run_queues+0xf1/0x170
[  157.283187]  ? update_process_times+0x36/0xb0
[  157.283187]  ? clockevents_program_event+0xbe/0x150
[  157.283187]  ? tick_periodic+0x2b/0x90
[  157.283187]  ? tick_handle_periodic+0x25/0x80
[  157.283187]  ? __sysvec_apic_timer_interrupt+0x4e/0x150
[  157.283187]  ? sysvec_apic_timer_interrupt+0x8d/0xd0
[  157.283187]  </IRQ>
[  157.283187]  <TASK>
[  157.283187]  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[  157.283187]  ? _raw_spin_unlock_irqrestore+0x21/0x60
[  157.283187]  pci_conf1_write+0xad/0xf0
[  157.283187]  pci_write+0x53/0x90
[  157.284176]  pci_bus_write_config_word+0x24/0x50
[  157.284176]  pci_write_config_word+0x27/0x50
[  157.284176]  __pci_read_base+0x409/0x420
[  157.284176]  pci_read_bases+0x58/0xf0
[  157.284176]  pci_setup_device+0x3de/0x780
[  157.284176]  pci_scan_single_device+0xc2/0x110
[  157.284176]  pci_scan_slot+0x7a/0x230
[  157.284176]  pci_scan_child_bus_extend+0x3b/0x2c0
[  157.284176]  pci_scan_bridge_extend+0x675/0x790
[  157.284176]  ? pci_scan_single_device+0x9b/0x110
[  157.284176]  pci_scan_child_bus_extend+0xf4/0x2c0
[  157.284176]  pci_scan_child_bus+0x10/0x20
[  157.284176]  acpi_pci_root_create+0x289/0x310
[  157.284176]  pci_acpi_scan_root+0x216/0x260
[  157.284176]  acpi_pci_root_add+0x231/0x3e0
[  157.284176]  ? acpi_pnp_match+0x31/0x50
[  157.284176]  acpi_bus_attach+0x148/0x260
[  157.284176]  ? __pfx_acpi_dev_for_one_check+0x10/0x10
[  157.284176]  acpi_dev_for_one_check+0x33/0x50
[  157.284176]  device_for_each_child+0x6b/0xb0
[  157.284176]  acpi_dev_for_each_child+0x3b/0x60
[  157.284176]  ? __pfx_acpi_bus_attach+0x10/0x10
[  157.284176]  acpi_bus_attach+0x227/0x260
[  157.284176]  ? __pfx_acpi_dev_for_one_check+0x10/0x10
[  157.284176]  acpi_dev_for_one_check+0x33/0x50
[  157.284176]  device_for_each_child+0x6b/0xb0
[  157.284176]  acpi_dev_for_each_child+0x3b/0x60
[  157.284176]  ? __pfx_acpi_bus_attach+0x10/0x10
[  157.284176]  acpi_bus_attach+0x227/0x260
[  157.284176]  acpi_bus_scan+0x7d/0x200
[  157.284176]  acpi_scan_init+0xe5/0x1c0
[  157.284176]  acpi_init+0x85/0x170
[  157.284176]  ? __pfx_acpi_init+0x10/0x10
[  157.284176]  do_one_initcall+0x5b/0x340
[  157.284176]  do_initcalls+0x107/0x230
[  157.284176]  ? __pfx_kernel_init+0x10/0x10
[  157.284176]  kernel_init_freeable+0x134/0x210
[  157.284176]  kernel_init+0x1b/0x200
[  157.284176]  ret_from_fork+0x44/0x70
[  157.284176]  ? __pfx_kernel_init+0x10/0x10
[  157.284176]  ret_from_fork_asm+0x1b/0x30
[  157.284176]  </TASK>
[  157.285445] pci 0000:07:00.0: BAR 4 [mem 0x382000000000-0x382001ffffff 64bit pref]
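
For reference, the size of the BAR reported in the last log line can be
read off the inclusive address range. The following is only a small
sketch for decoding such a line; the helper name bar_size_bytes is made
up for illustration and is not part of any kernel tooling.

```python
import re

# Log line copied from the guest trace above (joined onto one line).
LINE = "pci 0000:07:00.0: BAR 4 [mem 0x382000000000-0x382001ffffff 64bit pref]"


def bar_size_bytes(line: str) -> int:
    """Return the size in bytes of the [mem start-end] range in a PCI BAR log line."""
    m = re.search(r"\[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)", line)
    if m is None:
        raise ValueError("no [mem start-end] range found")
    start, end = (int(x, 16) for x in m.groups())
    return end - start + 1  # the printed range is inclusive


size = bar_size_bytes(LINE)
print(f"{size} bytes = {size // 2**20} MiB")  # prints "33554432 bytes = 32 MiB"
```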


-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2089306

Title:
  vfio_pci soft lockup on VM start while using PCIe passthrough

Status in linux package in Ubuntu:
  Invalid
Status in linux-nvidia package in Ubuntu:
  Invalid
Status in linux source package in Noble:
  In Progress
Status in linux-nvidia source package in Noble:
  In Progress

Bug description:
  When starting a VM with a passthrough PCIe device, the vfio_pci driver
  will block while its fault handler pre-faults the entire mapped area.
  For PCIe devices with large BAR regions this takes a very long time to
  complete, and thus causes soft lockup warnings on the host. With
  multiple large-BAR PCIe devices passed through, this process can take
  hours.
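
  To give a rough sense of scale, the sketch below counts how many
  page-sized insertions pre-faulting a whole BAR would need. It assumes
  4 KiB host pages and one insertion per page; the BAR sizes are
  hypothetical examples, not values from this report, and the per-call
  cost is not something the report states.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages on the host


def pages_to_prefault(bar_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages covered when a whole mapped BAR is inserted page by page."""
    return -(-bar_bytes // page_size)  # ceiling division


# Hypothetical BAR sizes, chosen only for illustration.
for gib in (1, 32, 128):
    n = pages_to_prefault(gib * 2**30)
    print(f"{gib:>4} GiB BAR -> {n:,} page insertions")
```

  The count grows linearly with BAR size, so any fixed per-page cost in
  the fault handler is multiplied by millions for large BARs.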

  This issue was introduced in kernel version 6.8.0-48-generic, with the
  addition of patches "vfio/pci: Use unmap_mapping_range()" and
  "vfio/pci: Insert full vma on mmap'd MMIO fault".

  The patch "vfio/pci: Use unmap_mapping_range()" rewrote the way VFIO
  tracks mapped regions to use the "vmf_insert_pfn" function instead of
  tracking them itself and using "io_remap_pfn_range". The
  implementation using "vmf_insert_pfn" is significantly slower.

  The patch "vfio/pci: Insert full vma on mmap'd MMIO fault" introduced
  this pre-faulting behavior, causing soft lockup warnings on the host
  while the VM launches.

  Without "vfio/pci: Insert full vma on mmap'd MMIO fault", a guest OS
  experiences significantly longer boot times as faults are generated
  while configuring the passthrough PCIe devices, but the host does not
  see soft lockup warnings.

  Both of these performance issues are resolved upstream by patchset
  [1], but this would be a complex backport to 6.8, with significant
  changes to core parts of the kernel.

  The "vfio/pci: Use unmap_mapping_range()" patch was introduced as part
  of patchset [2], and is intended to resolve a WARN_ON splat introduced
  by the upstream patch ba168b52bf8e ("mm: use rwsem assertion macros
  for mmap_lock"). However, this mmap_lock patch is not present in
  noble:linux, and hence noble:linux was never impacted by the WARN_ON
  issue.

  Thus, we can safely revert the following patches to resolve this VFIO
  slowdown:
  - "vfio/pci: Insert full vma on mmap'd MMIO fault"
  - "vfio/pci: Use unmap_mapping_range()"
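
  One way to check whether a given tree still carries these two patches
  is to scan its commit subjects (for example the output of
  git log --format=%s) and count applications minus reverts. This is a
  minimal sketch: the function name net_applied and the sample subjects
  are hypothetical, and real Ubuntu commits may carry subject prefixes
  that this simple match does not handle.

```python
PATCHES = (
    "vfio/pci: Insert full vma on mmap'd MMIO fault",
    "vfio/pci: Use unmap_mapping_range()",
)


def net_applied(subjects):
    """Return, per patch title, the number of applications minus reverts."""
    net = {p: 0 for p in PATCHES}
    for s in subjects:
        for p in PATCHES:
            if f'Revert "{p}"' in s:
                net[p] -= 1  # a revert cancels one application
            elif s.endswith(p):
                net[p] += 1
    return net


# Hypothetical history in which only the pre-faulting patch was reverted.
sample = [
    "vfio/pci: Use unmap_mapping_range()",
    "vfio/pci: Insert full vma on mmap'd MMIO fault",
    'Revert "vfio/pci: Insert full vma on mmap\'d MMIO fault"',
]
for title, count in net_applied(sample).items():
    print(f"{count:+d}  {title}")
```

  A net count of zero for both titles would indicate that both patches
  are effectively absent from the tree.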

  [1] https://patchwork.kernel.org/project/linux-mm/list/?series=883517
  [2] https://lore.kernel.org/all/20240530045236.1005864-3-alex.william...@redhat.com/

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2089306/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
