On Tue, 2014-12-02 at 13:53 +0100, Lucio Andrés Illanes Albornoz wrote:
> Hello,
>
> I'm doing secondary VGA passthrough with an AMD Radeon R7 260X using
> QEMU v2.1.2 w/ KVM and VFIO on Debian v7.7 (wheezy)
> (qemu v2.1+dfsg-5~bpo70+1 from wheezy-backports) and kernel version
> 3.16.5 (from wheezy-backports as well) and Windows 8.1 Update 1 (x64)
> as the guest OS.
>
> At present, rebooting the VM reproducibly has Windows fail to
> enable/start said video card upon bootup w/ an error code of 43, as
> seems to be the case w/ mostly everyone else running a comparable
> configuration; disabling/ejecting it before rebooting/powering down
> the VM from within the guest, as with everyone else, has proven to be
> a reliable mitigation. However, being that there are scenarios where
> this is either not feasible or impossible altogether, short of if done
> through a service or kernel-mode driver (and even then,) I had
> intended to investigate the causes behind this issue.
>
> Unfortunately, the flu got to me first (so to speak.) I did notice
> that simply removing the PCI device in question and then causing a PCI
> bus (re)scan (both) through sysfs on the host in between VM
> reboots/power cycles is effectively equivalent to disabling it within
> the guest. Thus, I find myself wondering precisely what it is that
> does take place when doing so vs. when QEMU performs a `hot reset'
> through the corresponding interface in drivers/vfio/pci/; evidently,
> the difference must be of sufficient importance since the latter
> mechanism ends up leaving my video card unavailable for subsequent VM
> operation until the next host reboot.
>
> I should very much appreciate any hints concerning whether it would be
> possible to have QEMU/VFIO perform whatever need be done itself or if
> it should be possible to have this be done by either itself.

All of the Bonaire-based AMD GPUs seem to have issues with reset
(HD 7790, R7 260/260X). I've tried to engage AMD on this, but haven't
gotten any response yet.

For devices like this that don't support any kind of function level
reset (FLR), VFIO will try to do a PCI bus reset on guest reboot. This
is as close as we can get to how the BIOS resets the device on a host
reboot. Unfortunately, on these cards there seems to be some sort of
disconnect between the PCI bus interface reset and resetting the rest
of the GPU. I believe I've even seen cases where a PCI bus reset
appears to have no effect on the GPU when running in VGA mode. My best
guess is that some firmware running on the card isn't clearing itself
on reset, and attempting to reload it causes errors. Note that a guest
can be reset multiple times and the device continues to work if the
guest is restricted to standard VGA drivers (in VGA passthrough mode,
of course).

In your experiment with removing and rescanning the device, are you
simply doing 'echo 1 > remove; echo 1 > /sys/bus/pci/rescan'?

Thanks,
Alex
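
For reference, the remove-and-rescan sequence asked about above could be wrapped in a small shell function along the following lines. This is only a sketch: the function name, the SYSFS_ROOT variable, and the example address 0000:01:00.0 are assumptions for illustration and do not come from the thread. It would need to be run as root, with the VM shut down, and a GPU's companion audio function (if any) would likely need the same treatment.

```shell
#!/bin/sh
# Sketch only: remove one PCI function via sysfs, then rescan the bus.
# SYSFS_ROOT is a hypothetical knob so the function can be pointed at a
# scratch directory; on a real host it defaults to /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

pci_remove_and_rescan() {
    bdf="$1"    # domain:bus:device.function, e.g. 0000:01:00.0 (assumed)
    dev="$SYSFS_ROOT/bus/pci/devices/$bdf"

    # Refuse to continue if the device node is not present.
    if [ ! -e "$dev/remove" ]; then
        echo "no such PCI device: $bdf" >&2
        return 1
    fi

    # Detach the device from the PCI core ...
    echo 1 > "$dev/remove"
    # ... then ask the kernel to re-enumerate all PCI buses.
    echo 1 > "$SYSFS_ROOT/bus/pci/rescan"
}
```

Usage would be `pci_remove_and_rescan 0000:01:00.0` between VM power cycles, substituting the address that `lspci -D` reports for the card.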