I'm no longer sure the problem is related to the amdgpu driver. After
reverting my changes and going back to the most recent firmware, I ran
the apport-collect command and it failed, hanging at the lspci command.
I rebooted and retried apport-collect, which succeeded (those are the
files posted above). The current amdgpu firmware wasn't the actual
problem, because the lspci command worked with it and I was able to run
a VM with GPU passthrough as well. The logs posted above from
apport-collect may not be that valuable, since everything was working
on that boot.

It must be an intermittent issue that I first noticed on 2025-04-12.
I've reviewed my logs for each boot and thought the issue was related
to timing, with the GPU at PCI address 10:00.0 being initialized before
the driverctl command applied the vfio-pci driver. On the most recent
reboot, however, I saw the amdgpu driver initialize the GPU, then
driverctl replace it but log that it had failed (which I had never seen
before in the 30 boots I reviewed), yet the lspci command succeeded and
the VM with GPU passthrough worked. Here's an example of what I thought
was the issue in the logs:
Apr 26 14:49:12 dark kernel: [drm] Initialized amdgpu 3.59.0 for 0000:10:00.0 on minor 2
Apr 26 14:49:12 dark kernel: amdgpu 0000:10:00.0: [drm] fb1: amdgpudrmfb frame buffer device
Apr 26 14:49:04 dark systemd[1]: Starting driverctl@pci-0000:10:00.1.service - Load the driverctl override for pci-0000:10:00.1...
Apr 26 14:49:04 dark (udev-worker)[880]: controlC1: /usr/lib/udev/rules.d/78-sound-card.rules:5 Failed to write ATTR{/sys/devices/pci0000:00/0000:00:03.1/0000:0e:00.0/0000:0f:00.0/0000:10:00.1/sound/card1/controlC1/../uevent}, ignoring: No such file or directory
Apr 26 14:49:12 dark driverctl[1940]: /usr/sbin/driverctl: line 72: /sys//devices/pci0000:00/0000:00:03.1/0000:0e:00.0/0000:0f:00.0/0000:10:00.0/driver/unbind: Permission denied
Apr 26 14:49:12 dark driverctl[1940]: driverctl: unbinding 0000:10:00.0 failed
Apr 26 14:49:12 dark kernel: amdgpu 0000:10:00.0: amdgpu: amdgpu: finishing device.
Apr 26 14:49:12 dark kernel: [drm] amdgpu: ttm finalized
Apr 26 14:49:12 dark kernel: vfio-pci 0000:10:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=none
Apr 26 14:49:06 dark systemd[1]: Starting qemu-kvm.service - QEMU KVM preparation - module, ksm, hugepages...
Apr 26 14:49:06 dark systemd[1]: Finished qemu-kvm.service - QEMU KVM preparation - module, ksm, hugepages.
Apr 26 14:49:06 dark systemd[1]: Finished driverctl@pci-0000:10:00.1.service - Load the driverctl override for pci-0000:10:00.1.
Apr 26 14:49:09 dark systemd[1]: Starting driverctl@pci-0000:10:00.0.service - Load the driverctl override for pci-0000:10:00.0...
Apr 26 14:49:09 dark systemd[1]: driverctl@pci-0000:10:00.0.service: Main process exited, code=exited, status=1/FAILURE
Apr 26 14:49:09 dark systemd[1]: driverctl@pci-0000:10:00.0.service: Failed with result 'exit-code'.
Apr 26 14:49:09 dark systemd[1]: Failed to start driverctl@pci-0000:10:00.0.service - Load the driverctl override for pci-0000:10:00.0.
Apr 26 14:49:12 dark systemd[1]: Starting driverctl@pci-0000:10:00.0.service - Load the driverctl override for pci-0000:10:00.0...
Apr 26 14:49:12 dark systemd[1]: Finished driverctl@pci-0000:10:00.0.service - Load the driverctl override for pci-0000:10:00.0.
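
The lines in the excerpt above are not in strict time order, which makes
the amdgpu/driverctl sequence harder to follow. A minimal sketch for
sorting such lines by timestamp, assuming every line starts with the
15-character "Mon DD HH:MM:SS" stamp and an English locale for month
names. The function name and the year argument are my own assumptions
(the log lines themselves carry no year), and only four lines from the
excerpt are used as sample input:

```python
from datetime import datetime


def sort_syslog_lines(lines, year=2025):
    """Return syslog-style lines sorted by their leading timestamp."""
    def key(line):
        # The first 15 characters hold the stamp, e.g. 'Apr 26 14:49:12'.
        # The year is an assumption supplied by the caller.
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")

    # sorted() is stable, so lines sharing the same second keep their
    # original relative order.
    return sorted(lines, key=key)


excerpt = [
    "Apr 26 14:49:12 dark kernel: [drm] Initialized amdgpu 3.59.0 for 0000:10:00.0 on minor 2",
    "Apr 26 14:49:04 dark systemd[1]: Starting driverctl@pci-0000:10:00.1.service - Load the driverctl override for pci-0000:10:00.1...",
    "Apr 26 14:49:09 dark systemd[1]: Failed to start driverctl@pci-0000:10:00.0.service - Load the driverctl override for pci-0000:10:00.0.",
    "Apr 26 14:49:12 dark driverctl[1940]: driverctl: unbinding 0000:10:00.0 failed",
]

for line in sort_syslog_lines(excerpt):
    print(line)
```

Because the stamps only have one-second resolution, this cannot settle
the order of events within the same second (for example the amdgpu
initialization and the failed unbind at 14:49:12).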
> lspci -nn
...
10:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] [1002:73bf] (rev c3) (prog-if 00 [VGA controller])
        Subsystem: Sapphire Technology Limited Pulse Radeon RX 6800 [1da2:e437]
        Flags: bus master, fast devsel, latency 0, IRQ 124, IOMMU group 34
        Memory at 1400000000 (64-bit, prefetchable) [size=16G]
        Memory at 1200000000 (64-bit, prefetchable) [size=2M]
        I/O ports at e000 [size=256]
        Memory at fcc00000 (32-bit, non-prefetchable) [size=1M]
        Expansion ROM at fcd00000 [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: vfio-pci
        Kernel modules: amdgpu
10:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller [1002:ab28]
        Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller [1002:ab28]
        Flags: bus master, fast devsel, latency 0, IRQ 125, IOMMU group 35
        Memory at fcd20000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: <access denied>
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel
...
...
12:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c] (prog-if 30 [XHCI])
        Subsystem: Gigabyte Technology Co., Ltd Matisse USB 3.0 Host Controller [1458:5007]
        Flags: bus master, fast devsel, latency 0, IRQ 122, IOMMU group 39
        Memory at fc900000 (64-bit, non-prefetchable) [size=1M]
        Capabilities: <access denied>
        Kernel driver in use: vfio-pci
        Kernel modules: xhci_pci
There's still an issue, but I can't confirm it's related to the firmware
update.
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-firmware in Ubuntu.
https://bugs.launchpad.net/bugs/2107285
Title:
KVM VM with GPU passthrough won't start
Status in linux-firmware package in Ubuntu:
New
Bug description:
Host OS:
Ubuntu 24.04.2 LTS
Kernel 6.11.0-21-generic
CPU: AMD Ryzen 9 5900X
BIOS firmware version: F2
GPU 1: AMD Radeon RX 6400 (Used by Host OS)
GPU 2: AMD Radeon RX 6800 (Used by VMs via GPU passthrough, at PCI
address 10:00.0)
$ apt-cache policy linux-firmware
linux-firmware:
  Installed: 20240318.git3b128b60-0ubuntu2.11
  Candidate: 20240318.git3b128b60-0ubuntu2.11
  Version table:
 *** 20240318.git3b128b60-0ubuntu2.11 500
        500 http://us.archive.ubuntu.com/ubuntu noble-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu noble-security/main amd64 Packages
        100 /var/lib/dpkg/status
     20240318.git3b128b60-0ubuntu2 500
        500 http://us.archive.ubuntu.com/ubuntu noble/main amd64 Packages
What should have happened:
VM with GPU passthrough should start
What happened instead:
The VM with GPU passthrough wouldn't start. I tried running 'lspci -nns
0000:10:00.0' but the command hung the terminal. Virtual Machine Manager
then reported that it couldn't connect to the KVM daemon. I rebooted the
Host OS, but 'lspci -nns 0000:10:00.0' hung again and I still couldn't
start the VM with GPU passthrough.
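
Since lspci itself hung here, a minimal sketch of checking which kernel
driver is bound to a device by reading the "driver" symlink under
/sys/bus/pci/devices instead. The function name and the sysfs_root
parameter are my own, and I have not confirmed that this read avoids the
hang on an affected boot:

```python
import os


def bound_driver(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to a PCI device, or None."""
    link = os.path.join(sysfs_root, bdf, "driver")
    try:
        # 'driver' is a symlink to the bound driver's sysfs directory;
        # its last path component is the driver name (e.g. vfio-pci).
        return os.path.basename(os.readlink(link))
    except FileNotFoundError:
        # No driver is bound, or the device does not exist.
        return None


if __name__ == "__main__":
    # Addresses taken from this report: the RX 6800 and its audio function.
    for bdf in ("0000:10:00.0", "0000:10:00.1"):
        print(bdf, bound_driver(bdf))
```

On a working boot this would be expected to report vfio-pci for both
functions, matching the "Kernel driver in use" lines from lspci.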
Extra info:
After installing updates to the Host OS on 2025-04-10, VMs without GPU
passthrough worked fine. On 2025-04-12 I tried to start a VM with GPU
passthrough but it wouldn't start.
On 2025-04-10 one of the Host OS updates was linux-firmware:amd64
(20240318.git3b128b60-0ubuntu2.10 ->
20240318.git3b128b60-0ubuntu2.11).
I wanted to test downgrading linux-firmware back to version 2.10, but
that version is no longer available. I was able to find, on Launchpad,
the files that were in the 2.10 and 2.11 versions of linux-firmware. I
compared the two versions and identified the amdgpu firmware files that
differed. I overwrote those files in /lib/firmware/amdgpu on my Host OS
with the files from 2.10 and rebooted. After that, the VM with GPU
passthrough was able to start (and the lspci command worked).
The list of amdgpu firmware files I overwrote was:
gc_11_5_1_imu.bin.zst
gc_11_5_1_me.bin.zst
gc_11_5_1_mec.bin.zst
gc_11_5_1_mes1.bin.zst
gc_11_5_1_mes_2.bin.zst
gc_11_5_1_pfp.bin.zst
gc_11_5_1_rlc.bin.zst
isp_4_1_1.bin.zst
psp_14_0_1_ta.bin.zst
psp_14_0_1_toc.bin.zst
sdma_6_1_1.bin.zst
vcn_4_0_6_1.bin.zst
vcn_4_0_6.bin.zst
vpe_6_1_1.bin.zst
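
For anyone who wants to repeat the comparison, a minimal sketch of
finding which firmware files differ between two extracted package
trees by comparing SHA-256 hashes. This is not the exact procedure I
used; the function names and the directory paths passed on the command
line are hypothetical:

```python
import hashlib
import sys
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def differing_files(old_dir, new_dir, pattern="*.bin.zst"):
    """List names of files in new_dir that differ from, or are missing in, old_dir."""
    changed = []
    for new_file in sorted(Path(new_dir).glob(pattern)):
        old_file = Path(old_dir) / new_file.name
        if not old_file.exists() or sha256_of(old_file) != sha256_of(new_file):
            changed.append(new_file.name)
    return changed


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical paths): script.py <2.10>/amdgpu <2.11>/amdgpu
    for name in differing_files(sys.argv[1], sys.argv[2]):
        print(name)
```

Files that exist only in the old directory are not reported by this
sketch, so it only covers files that changed or were added in the newer
version.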
---
ProblemType: Bug
ApportVersion: 2.28.1-0ubuntu3.5
Architecture: amd64
CRDA: N/A
CasperMD5CheckResult: pass
CurrentDesktop: ubuntu:GNOME
Dependencies: firmware-sof-signed 2023.12.1-1ubuntu1.4
DistroRelease: Ubuntu 24.04
InstallationDate: Installed on 2024-06-01 (326 days ago)
InstallationMedia: Ubuntu 24.04 LTS "Noble Numbat" - Release amd64 (20240424)
MachineType: Gigabyte Technology Co., Ltd. X570S AORUS PRO AX
Package: linux-firmware 20240318.git3b128b60-0ubuntu2.11
PackageArchitecture: amd64
ProcEnviron:
LANG=en_US.UTF-8
PATH=(custom, no user)
SHELL=/bin/bash
TERM=xterm-256color
XDG_RUNTIME_DIR=<set>
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-6.11.0-21-generic
root=/dev/mapper/ubuntu--vg-ubuntu--lv ro quiet splash amd_iommu=on iommu=pt
vt.handoff=7
ProcVersionSignature: Ubuntu 6.11.0-21.21~24.04.1-generic 6.11.11
RelatedPackageVersions:
linux-restricted-modules-6.11.0-21-generic N/A
linux-backports-modules-6.11.0-21-generic N/A
linux-firmware 20240318.git3b128b60-0ubuntu2.11
Tags: noble wayland-session
Uname: Linux 6.11.0-21-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip kvm libvirt libvirt-dnsmasq lpadmin plugdev storage
sudo users
_MarkForUpload: True
dmi.bios.date: 07/08/2021
dmi.bios.release: 5.17
dmi.bios.vendor: American Megatrends International, LLC.
dmi.bios.version: F2
dmi.board.asset.tag: Default string
dmi.board.name: X570S AORUS PRO AX
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: x.x
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias:
dmi:bvnAmericanMegatrendsInternational,LLC.:bvrF2:bd07/08/2021:br5.17:svnGigabyteTechnologyCo.,Ltd.:pnX570SAORUSPROAX:pvr-CF:rvnGigabyteTechnologyCo.,Ltd.:rnX570SAORUSPROAX:rvrx.x:cvnDefaultstring:ct3:cvrDefaultstring:skuDefaultstring:
dmi.product.family: X570 MB
dmi.product.name: X570S AORUS PRO AX
dmi.product.sku: Default string
dmi.product.version: -CF
dmi.sys.vendor: Gigabyte Technology Co., Ltd.