Hello everyone,
I have VGA passthrough working with a single GTX TITAN X card and a
Debian (sid) host. However, the card stops working after the VM shuts
down; the VM only works on the first launch after booting the host.
Once the card is dead, it stays dead across host reboots. As far as I can
tell, the most reliable way to bring the card back to life is to shut
down the machine and cycle the power supply.
Moreover, the card tends to switch the PCI bus it is assigned to (it
changes between 03:00 and 04:00), even though the hardware is not
changing. I'm not sure if this is related.
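Because the address moves, looking the card up by vendor and device ID
may be more robust than hard-coding 04:00.0. This is only a minimal
sketch, assuming the usual sysfs layout under /sys/bus/pci/devices. The
function name find_pci_by_id and the SYSFS_PCI override are hypothetical
names introduced for illustration, and the IDs must be given as
lowercase hex without the 0x prefix:

```shell
#!/bin/sh
# Print the PCI address of every device matching a vendor/device ID pair.
# SYSFS_PCI is a hypothetical override so the function can be pointed at
# a fake directory tree; by default it reads the real sysfs path.
find_pci_by_id() {
    want_vendor="0x$1"    # sysfs stores IDs as e.g. "0x10de"
    want_device="0x$2"
    root="${SYSFS_PCI:-/sys/bus/pci/devices}"
    for dev in "$root"/*; do
        # Skip entries that do not expose both ID files.
        [ -r "$dev/vendor" ] && [ -r "$dev/device" ] || continue
        vendor=$(cat "$dev/vendor")
        device=$(cat "$dev/device")
        if [ "$vendor" = "$want_vendor" ] && [ "$device" = "$want_device" ]; then
            basename "$dev"    # e.g. 0000:04:00.0
        fi
    done
}
```

Calling it with the NVIDIA vendor ID (10de) and the card's device ID as
reported by lspci -nn should print whichever address the card currently
has, one per line.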
I noticed a posting from "S B" in January about a similar problem with
an AMD card, but it didn't seem to have a resolution.
When the card is dead, launching the VM causes this error:
qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
qemu-system-x86_64: vfio-pci: Cannot read device rom at 0000:04:00.0
Device option ROM contents are probably invalid (check dmesg).
Skip option ROM probe with rombar=0, or load from file with romfile=
dmesg shows this (annotated):
# After switching from pci-stub to vfio:
[ 150.737259] vgaarb: device changed decodes: PCI:0000:04:00.0,olddecodes=io+mem,decodes=io+mem:owns=none
# When starting the VM:
[ 198.572550] vfio-pci 0000:04:00.0: enabling device (0000 -> 0003)
[ 198.572607] vfio_ecap_init: 0000:04:00.0 hiding ecap 0x1e@0x258
[ 198.572618] vfio_ecap_init: 0000:04:00.0 hiding ecap 0x19@0x900
[ 200.169203] vfio-pci 0000:04:00.0: Invalid ROM contents
Providing a romfile extracted from the card with GPU-Z on bare-metal
Windows does nothing except remove the romfile-related advice from the
error message.
Here's an lspci immediately after booting. In this case, the card was
pre-dead (i.e., even the first VM launch failed):
root@debian:~# lspci -s 04:00.0 -v
04:00.0 VGA compatible controller: NVIDIA Corporation GM200 [GeForce GTX TITAN X] (rev a1) (prog-if 00 [VGA controller])
Subsystem: eVga.com. Corp. GM200 [GeForce GTX TITAN X]
Flags: fast devsel, IRQ 65
Memory at f6000000 (32-bit, non-prefetchable) [disabled] [size=16M]
Memory at 90000000 (64-bit, prefetchable) [disabled] [size=256M]
Memory at a0000000 (64-bit, prefetchable) [disabled] [size=32M]
I/O ports at c000 [disabled] [size=128]
Expansion ROM at f7000000 [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Legacy Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [250] Latency Tolerance Reporting
Capabilities: [258] L1 PM Substates
Capabilities: [128] Power Budgeting <?>
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] #19
Kernel driver in use: vfio-pci
Kernel modules: nouveau
(If I run lspci with -vvv, it shows the card in the D0 power state.)
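For anyone who wants to compare power states, the D-state can be pulled
out of the -vvv output with a small filter. This is a rough sketch: it
assumes the Power Management capability prints a line of the form
"Status: D0 NoSoftRst+ ...", and pci_power_state is a hypothetical
helper name, not an existing tool:

```shell
#!/bin/sh
# Read `lspci -vvv` output on stdin and print the first D-state found
# in a Power Management "Status:" line (D0, D1, D2 or D3).
# The top-level "Status: Cap+ ..." line is ignored because it does not
# start with a D-state.
pci_power_state() {
    sed -n 's/^[[:space:]]*Status: \(D[0-3]\) .*/\1/p' | head -n 1
}
```

It would be used as "lspci -s 04:00.0 -vvv | pci_power_state", run as
root so that lspci can read the capability list.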
Here's lspci after the first VM attempt failed:
root@debian:~# lspci -s 04:00.0 -vvv
04:00.0 VGA compatible controller: NVIDIA Corporation GM200 [GeForce GTX TITAN X] (rev ff) (prog-if ff)
!!! Unknown header type 7f
Kernel driver in use: vfio-pci
Kernel modules: nouveau
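The "rev ff", "prog-if ff" and "Unknown header type 7f" values are what
lspci prints when config-space reads come back as all ones, so the dead
state can be detected without parsing lspci. This is a minimal sketch
that assumes the device's config space is readable through its sysfs
"config" file; config_is_all_ones is a hypothetical helper name:

```shell
#!/bin/sh
# Succeed (exit status 0) if the first four bytes of the given PCI
# config-space file are all 0xff, i.e. the device is not answering
# config reads. The argument is a path such as
# /sys/bus/pci/devices/0000:04:00.0/config.
config_is_all_ones() {
    cfg="$1"
    # Dump the first 4 bytes as hex and strip spaces and newlines.
    head_bytes=$(od -An -tx1 -N4 "$cfg" | tr -d ' \n')
    [ "$head_bytes" = "ffffffff" ]
}
```

For example, "config_is_all_ones /sys/bus/pci/devices/0000:04:00.0/config
&& echo dead" could be run before launching QEMU to check whether the
card is already in the failed state.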
Here's my qemu command:
sudo qemu-system-x86_64 -enable-kvm -m 4096 -cpu host,kvm=off \
  -smp 4,sockets=1,cores=4,threads=1 \
  -drive if=pflash,format=raw,readonly,file=/usr/share/OVMF/OVMF_CODE.fd \
  -drive if=pflash,format=raw,file=/usr/share/OVMF/OVMF_VARS.fd \
  -drive file=/dev/sdf,format=raw -soundhw hda -usb \
  -device usb-host,hostbus=10,hostport=1.7.3 \
  -device usb-host,hostbus=10,hostport=1.7.4 \
  -device vfio-pci,host=04:00.0 -device vfio-pci,host=04:00.1 -vga none
Does anybody know why this might be happening?
Thanks!
_______________________________________________
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users