Just retried this morning after waking my computer from sleep (having
previously loaded a VM earlier), and this time the ROM dump did
succeed.
Diffing the contents of lspci from then and now
<https://www.diffchecker.com/xPdXUeUg>, the device does appear to have
changed power states, even though enable in sysfs still contains 0 and
attempting to write to it still gives the same "Device or resource busy"
error. The kernel logs from the wake up process also don't contain
anything related to that device. iomem hasn't changed from before. Alex
and Auger both seem to have guessed correctly, though the steps it took
to get there aren't exactly satisfying. I remember vfio-pci having a
disable-D3 parameter, which I'll probably look into later today since
power states seem to be involved in the problem.
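In case it helps anyone following along, here's roughly what I mean; I believe the parameter is called `disable_idle_d3`, though I'd verify the exact name with modinfo first:

```shell
# Check whether vfio-pci exposes the option (name from memory: disable_idle_d3)
modinfo vfio-pci | grep -i d3

# Set it at module load time...
modprobe vfio-pci disable_idle_d3=1

# ...or flip it at runtime if the module is already loaded
echo 1 > /sys/module/vfio_pci/parameters/disable_idle_d3
```

If that keeps the card out of D3 while bound to vfio-pci, it would fit with the power-state theory above.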
Thanks to both of you for your help; I'll keep posting if I find
anything new.
- Nicolas
On 03/11/2019 01:46 AM, Nicolas Roy-Renaud wrote:
Hey, Alex, thanks for replying.
It seems like you're right on the Mem- part.
[user@OCCAM ~]$ lspci -s 07:00.0 -vvv
07:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce
GTX 970] (rev a1) (prog-if 00 [VGA controller])
Subsystem: ASUSTeK Computer Inc. GM204 [GeForce GTX 970]
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx-
For comparison, here's the result from last boot after firing up and
shutting down a VM using that device:
[user@OCCAM ~]$ lspci -s 07:00.0 -vvv
07:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce
GTX 970] (rev a1) (prog-if 00 [VGA controller])
Subsystem: ASUSTeK Computer Inc. GM204 [GeForce GTX 970]
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx-
Yet when I try enabling the device, I get this:
[root@OCCAM user]# echo 1 > /sys/bus/pci/devices/0000:07:00.0/enable
bash: echo: write error: Device or resource busy
Other people have apparently had similar issues caused by other
drivers claiming the resources, and /proc/iomem was showing efifb
bound to that device earlier; however, neither disabling it on the
kernel command line nor unbinding it once the system is live has
solved the issue (even though it no longer shows up in iomem). Here's
the PCI section from iomem with video=efifb:off. All the commands
whose outputs you'll see here have been run under these conditions:
00000000-00000000 : PCI Bus 0000:00
00000000-00000000 : PCI Bus 0000:07
00000000-00000000 : 0000:07:00.0
00000000-00000000 : 0000:07:00.0
00000000-00000000 : PCI Bus 0000:01
00000000-00000000 : 0000:01:00.0
00000000-00000000 : 0000:01:00.0
00000000-00000000 : PCI Bus 0000:07
00000000-00000000 : 0000:07:00.0
00000000-00000000 : 0000:07:00.0
00000000-00000000 : 0000:07:00.1
00000000-00000000 : PCI Bus 0000:01
00000000-00000000 : 0000:01:00.0
00000000-00000000 : nvidia
00000000-00000000 : 0000:01:00.1
00000000-00000000 : ICH HD audio
00000000-00000000 : PCI Bus 0000:04
00000000-00000000 : 0000:04:00.0
00000000-00000000 : 0000:04:00.0
00000000-00000000 : ahci
00000000-00000000 : PCI Bus 0000:03
00000000-00000000 : 0000:03:00.0
00000000-00000000 : 0000:03:00.0
00000000-00000000 : r8169
00000000-00000000 : 0000:00:14.0
00000000-00000000 : xhci-hcd
00000000-00000000 : 0000:00:1b.0
00000000-00000000 : ICH HD audio
00000000-00000000 : 0000:00:1f.3
00000000-00000000 : 0000:00:1f.2
00000000-00000000 : ahci
00000000-00000000 : 0000:00:1d.0
00000000-00000000 : ehci_hcd
00000000-00000000 : 0000:00:1a.0
00000000-00000000 : ehci_hcd
00000000-00000000 : 0000:00:16.0
00000000-00000000 : mei_me
00000000-00000000 : PCI MMCONFIG 0000 [bus 00-3f]
00000000-00000000 : Reserved
00000000-00000000 : pnp 00:06
00000000-00000000 : Reserved
00000000-00000000 : IOAPIC 0
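For completeness, the two ways I tried getting rid of efifb looked roughly like this (the platform device name is from memory and may differ between systems, so check the driver directory first):

```shell
# Approach 1: disable efifb on the kernel command line at boot
#   video=efifb:off

# Approach 2: unbind the EFI framebuffer platform device at runtime.
# The device name may vary; list what's actually bound first:
ls /sys/bus/platform/drivers/efi-framebuffer/
echo efi-framebuffer.0 > /sys/bus/platform/drivers/efi-framebuffer/unbind
```

Neither made a difference for the "Device or resource busy" error, as noted above.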
Enable reverts back to 0 after the VM using the device shuts down, and
trying to write 1 in that file still yields "Device or resource busy".
Also, unlike what I said earlier based on the last time I looked into
this, doing so no longer allows dumping the ROM after the VM has shut
down; it fails with the same error message as before.
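One thing that's still easy to check is the power state the device itself reports; lspci's Power Management capability section should show it (this is just a grep over output like the listings above, so the exact line format may vary with lspci version):

```shell
# Read the current power state from the PM capability
# (expects a line like "Status: D3 NoSoftRst+ PME-Enable- ...")
lspci -s 07:00.0 -vvv | grep 'Status: D'
```

If that reports D3 while enable is stuck at 0, it would line up with the low-power-state explanation.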
- Nicolas
On 03/10/2019 10:09 PM, Alex Williamson wrote:
On Sun, 10 Mar 2019 18:06:37 -0400
Nicolas Roy-Renaud <nicolas.roy-renau...@ens.etsmtl.ca> wrote:
I've seen a lot of people recommend that VFIO newcomers flash their
GPU if they couldn't get their passthrough working, and since I know
how risky and avoidable that sort of procedure is (QEMU lets you pass
your own ROM dump to the VM to be used instead), I tried going through
the dumping steps myself
<http://vfio.blogspot.com/2014/08/does-my-graphics-card-rom-support-efi.html>
(even though I've had a working VFIO setup for years) and got
something unexpected.
Attempting to dump the ROM from my guest card _freshly after a
reboot_ results in the following error message:
cat: '/sys/bus/pci/devices/0000:07:00.0/rom': Input/output error
This is accompanied by the following line in dmesg:
[ 1734.316429] vfio-pci 0000:07:00.0: Invalid PCI ROM header
signature: expecting 0xaa55, got 0xffff
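As a side note on that dmesg message: a valid PCI option ROM starts with the two-byte signature 0x55 0xAA, which is what the kernel is checking for here. A quick way to verify the first bytes of a dump (using a stand-in file below, since the real gpu.rom couldn't be produced at this point):

```shell
# A valid PCI option ROM begins with the signature bytes 0x55 0xAA.
# Verify the first two bytes of a dump (stand-in file for illustration):
printf '\x55\xaa\x00\x00' > /tmp/fake.rom   # stand-in for a real gpu.rom
head -c 2 /tmp/fake.rom | od -An -tx1       # " 55 aa" means the signature is present
```

Getting 0xffff instead, as in the dmesg line above, usually means the ROM BAR read returned all-ones, i.e. the region wasn't actually decoding.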
If lspci for the device reports:
Control: I/O- Mem- BusMaster- ...
(specifically Mem-), this could be the reason it's failing. The PCI
ROM BAR is a memory region and memory decode needs to be enabled on the
device in order to get access to it. Also if you have the device
already bound to vfio-pci, the device might be in a D3 low power state,
which could make that memory region unavailable. You can use the
'enable' file in sysfs to fix both of these, so your sequence would
look like this:
echo 1 > /sys/bus/pci/devices/0000:07:00.0/enable
echo 1 > /sys/bus/pci/devices/0000:07:00.0/rom
cat /sys/bus/pci/devices/0000:07:00.0/rom > gpu.rom
echo 0 > /sys/bus/pci/devices/0000:07:00.0/rom
echo 0 > /sys/bus/pci/devices/0000:07:00.0/enable
Thanks,
Alex
_______________________________________________
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users