Hello everyone,

I have VGA passthrough working with a single GTX TITAN X card on a Debian (sid) host. However, the card stops working after the VM shuts down: the VM only starts successfully on the first attempt after the host boots. Once the card is in this state, the failure persists across host reboots as well. As far as I can tell, the most reliable way to bring the card back to life is to shut the machine down and cycle the power supply.

Moreover, the card tends to change the PCI bus it is assigned to (it alternates between 03:00 and 04:00), even though the hardware is not changing. I'm not sure whether this is related.

I noticed a posting from "S B" in January about a similar problem with an AMD card, but it didn't seem to have a resolution.

When the card is dead, launching the VM causes this error:
  qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
  qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
  qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
  qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
  qemu-system-x86_64: vfio-pci: Cannot read device rom at 0000:04:00.0
  Device option ROM contents are probably invalid (check dmesg).
  Skip option ROM probe with rombar=0, or load from file with romfile=

dmesg shows this (annotated):
  # After switching from pci-stub to vfio:
  [ 150.737259] vgaarb: device changed decodes: PCI:0000:04:00.0,olddecodes=io+mem,decodes=io+mem:owns=none
  # When starting the VM:
  [  198.572550] vfio-pci 0000:04:00.0: enabling device (0000 -> 0003)
  [  198.572607] vfio_ecap_init: 0000:04:00.0 hiding ecap 0x1e@0x258
  [  198.572618] vfio_ecap_init: 0000:04:00.0 hiding ecap 0x19@0x900
  [  200.169203] vfio-pci 0000:04:00.0: Invalid ROM contents

Providing a romfile extracted from the card with GPU-Z on bare-metal Windows changes nothing, except that the romfile-related advice disappears from the error message.

Here's the lspci output immediately after booting. In this case, the card was already dead at boot (i.e., even the first VM launch failed):
  root@debian:~# lspci -s 04:00.0 -v
04:00.0 VGA compatible controller: NVIDIA Corporation GM200 [GeForce GTX TITAN X] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: eVga.com. Corp. GM200 [GeForce GTX TITAN X]
    Flags: fast devsel, IRQ 65
    Memory at f6000000 (32-bit, non-prefetchable) [disabled] [size=16M]
    Memory at 90000000 (64-bit, prefetchable) [disabled] [size=256M]
    Memory at a0000000 (64-bit, prefetchable) [disabled] [size=32M]
    I/O ports at c000 [disabled] [size=128]
    Expansion ROM at f7000000 [disabled] [size=512K]
    Capabilities: [60] Power Management version 3
    Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [78] Express Legacy Endpoint, MSI 00
    Capabilities: [100] Virtual Channel
    Capabilities: [250] Latency Tolerance Reporting
    Capabilities: [258] L1 PM Substates
    Capabilities: [128] Power Budgeting <?>
    Capabilities: [420] Advanced Error Reporting
    Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
    Capabilities: [900] #19
    Kernel driver in use: vfio-pci
    Kernel modules: nouveau

(If I run lspci with -vvv, it shows the card in the D0 power state)
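
For anyone who wants to check the same thing, here is a minimal sketch that pulls the D-state out of lspci -vvv output. It assumes the usual "Status: D0 ..." line that lspci prints under the Power Management capability; pm_state is just a helper name made up for illustration, not part of pciutils.

```shell
# Print the PCI power state (D0..D3) found in `lspci -vvv` output on stdin.
# pm_state is a hypothetical helper name, not a pciutils command.
pm_state() {
    # Match the "Status: Dn ..." line of the Power Management capability;
    # the header's "Status: Cap+ ..." line does not match D[0-3].
    sed -n 's/.*Status: \(D[0-3]\).*/\1/p' | head -n 1
}

# Usage (run as root so lspci can read the capability list):
#   lspci -s 04:00.0 -vvv | pm_state
```

If the function prints nothing, lspci could not read the Power Management capability at all.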

Here's lspci after the first VM attempt failed:
  root@debian:~# lspci -s 04:00.0 -vvv
04:00.0 VGA compatible controller: NVIDIA Corporation GM200 [GeForce GTX TITAN X] (rev ff) (prog-if ff)
    !!! Unknown header type 7f
    Kernel driver in use: vfio-pci
    Kernel modules: nouveau
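
The ff values in this output (rev ff, prog-if ff, header type 7f) are what lspci shows when reads of the device's config space come back as all ones. Below is a rough sketch for checking that directly, assuming the standard sysfs config file for the device; config_all_ff is a helper name invented for illustration.

```shell
# Succeed if the first bytes on stdin (the start of a PCI config space dump)
# are all 0xff, i.e. the device is not answering config reads.
# config_all_ff is a hypothetical helper name.
config_all_ff() {
    # Dump up to 16 bytes as hex and strip spaces and newlines.
    bytes=$(od -An -tx1 -N16 | tr -d ' \n')
    # All-ones means nothing is left once every 'f' is removed.
    [ -n "$bytes" ] && [ -z "$(printf '%s' "$bytes" | tr -d 'f')" ]
}

# Usage:
#   config_all_ff < /sys/bus/pci/devices/0000:04:00.0/config \
#       && echo "device is not responding to config reads"
```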

Here's my qemu command:
  sudo qemu-system-x86_64 -enable-kvm -m 4096 -cpu host,kvm=off \
    -smp 4,sockets=1,cores=4,threads=1 \
    -drive if=pflash,format=raw,readonly,file=/usr/share/OVMF/OVMF_CODE.fd \
    -drive if=pflash,format=raw,file=/usr/share/OVMF/OVMF_VARS.fd \
    -drive file=/dev/sdf,format=raw \
    -soundhw hda -usb \
    -device usb-host,hostbus=10,hostport=1.7.3 \
    -device usb-host,hostbus=10,hostport=1.7.4 \
    -device vfio-pci,host=04:00.0 \
    -device vfio-pci,host=04:00.1 \
    -vga none

Does anybody know why this might be happening?

Thanks!

_______________________________________________
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users
