On Tue, 02 Dec 2014 08:26:20 -0700
Alex Williamson <alex.william...@redhat.com> wrote:

> All of the Bonaire-based AMD GPUs seem to have issues with reset
> (R7790, R7 260/X). I've tried to engage AMD on this, but haven't gotten
> any response on this topic yet. For devices like this that don't
> support any kind of function level reset (FLR), VFIO will try to do a
> PCI bus reset on guest reboot. This is as close as we can get to how
> the BIOS resets the device on a host reboot. Unfortunately on these
> cards there seems to be some sort of disconnect between the PCI bus
> interface reset and resetting the rest of the GPU. I believe I've even
> seen cases where a PCI bus reset appears to have no effect on the GPU
> when running in VGA mode. My best guess is that some firmware running
> in the card isn't clearing itself on reset and attempting to reload it
> causes errors. Note that a guest can be reset multiple times and the
> device continues to work if the guest is restricted to standard VGA
> drivers (in VGA passthrough mode of course).

My experience is consistent with that description. The bus reset
initiated through the hotplug reset interface appears to leave some
part(s) of my video card in a state the AMD driver is not prepared to
handle on the 2nd bootup (i.e. the first VM reboot). From then on the
card is completely gone: endless IOTLB_INV_TIMEOUT and `Completion-Wait
loop timed out' kernel messages, no VGA output at all when doing primary
passthrough (which I no longer require, since vgacon isn't too fond of
that), and possibly even hangs upon running lspci(8) afterwards (if I
remember correctly, that is).

I had originally intended to have QEMU trace MMIO in general and
PCI{,-E} bus/device traffic (as relevant) in order to establish what
arcane incantations Windows could possibly be performing when it
disables my video card, but that only ended up showing me PCI
configuration space reads and IRQ reassignments. WinDbg/Kd* is far too
slow to facilitate tracing PCI{,-E} traffic through breakpoints. And
were I to possess the Windows Research Kernel source code, speaking
completely hypothetically here, I would unfortunately then have to find
out that QEMU w/ KVM plus AMD's drivers doesn't get along too well w/
Windows Server 2003.

I then figured that having drivers/vfio/pci/* produce that information
should ultimately lead me towards the solution, but I haven't gotten
around to that just yet. The remove/rescan dance is the only thing that,
pragmatically speaking, actually works for me at present.

> In your experiment with removing and rescanning the device, are you
> simply doing 'echo 1 > remove; echo 1 > /sys/bus/pci/rescan'? Thanks,

Yes.
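
For anyone wanting to script the remove/rescan dance between guest runs, here is a minimal Python sketch of those same two sysfs writes. The function name, the `sysfs_root` parameter (there only so it can be tried against a scratch directory) and the PCI address in the note below are my own placeholders, not anything from the exchange above. It assumes the guest is shut down and that it runs as root on the host.

```python
from pathlib import Path


def remove_and_rescan(bdf: str, sysfs_root: str = "/sys") -> None:
    """Detach one PCI function, then ask the PCI core to rescan all buses.

    bdf        -- PCI address in domain:bus:device.function form
                  (hypothetical example: "0000:01:00.0")
    sysfs_root -- where sysfs is mounted; overridable for dry runs
    """
    root = Path(sysfs_root)
    remove_node = root / "bus" / "pci" / "devices" / bdf / "remove"
    rescan_node = root / "bus" / "pci" / "rescan"

    # Equivalent of: echo 1 > /sys/bus/pci/devices/<bdf>/remove
    remove_node.write_text("1\n")

    # Equivalent of: echo 1 > /sys/bus/pci/rescan
    rescan_node.write_text("1\n")
```

Calling `remove_and_rescan("0000:01:00.0")` would then stand in for the two echo commands, with the address replaced by whatever lspci(8) reports for the card. A multi-function card (GPU plus HDMI audio) may need the other function removed as well; that is an assumption on my part and not something established above.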