On Mon, 24 Feb 2020, at 5:23 PM, Alex Williamson wrote:
> On Mon, 24 Feb 2020 10:40:39 +0000
> "Bronek Kozicki" <b...@spamcop.net> wrote:
>
> > Heads up to anyone running the latest vanilla kernels - after upgrade
> > from 5.4.21 to 5.4.22 one of my VMs lost access to a vfio1
> > passed-through GPU. This was restored when I downgraded to 5.4.21 so
> > the problem seems related to some patch in version 5.4.22
> >
> > Also, when starting the VM, I noticed the hypervisor log flooded with
> > messages "BAR 3: can't reserve" like:
> >
> > Feb 24 09:49:38 gdansk.lan.incorrekt.net kernel: vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
> > Feb 24 09:49:38 gdansk.lan.incorrekt.net kernel: vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
> > Feb 24 09:49:38 gdansk.lan.incorrekt.net kernel: vfio-pci 0000:03:00.0: BAR 3: can't reserve [mem 0xc0000000-0xc1ffffff 64bit pref]
> > Feb 24 09:49:38 gdansk.lan.incorrekt.net kernel: vfio-pci 0000:03:00.0: No more image in the PCI ROM
> > Feb 24 09:51:43 gdansk.lan.incorrekt.net kernel: vfio-pci 0000:03:00.0: BAR 3: can't reserve [mem 0xc0000000-0xc1ffffff 64bit pref]
> > Feb 24 09:51:43 gdansk.lan.incorrekt.net kernel: vfio-pci 0000:03:00.0: BAR 3: can't reserve [mem 0xc0000000-0xc1ffffff 64bit pref]
> > Feb 24 09:51:43 gdansk.lan.incorrekt.net kernel: vfio-pci 0000:03:00.0: BAR 3: can't reserve [mem 0xc0000000-0xc1ffffff 64bit pref]
> > Feb 24 09:51:43 gdansk.lan.incorrekt.net kernel: vfio-pci 0000:03:00.0: BAR 3: can't reserve [mem 0xc0000000-0xc1ffffff 64bit pref]
> > Feb 24 09:51:43 gdansk.lan.incorrekt.net kernel: vfio-pci 0000:03:00.0: BAR 3: can't reserve [mem 0xc0000000-0xc1ffffff 64bit pref]
> >
> > journalctl -b-2 | grep "vfio-pci 0000:03:00.0: BAR 3: can't reserve" | wc -l
> > 2609
> >
> > Finally, when shutting down the VM I observed kernel panic on the
> > hypervisor:
> >
> > [ 873.831301] Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
> > [ 874.874008] Shutting down cpus with NMI
> > [ 874.888189] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > [ 875.074319] Rebooting in 30 seconds..
>
> Tried v5.4.22, not getting anything similar. Potentially there's a
> driver activated in this kernel that wasn't previously on your system
> and it's attached itself to part of your device. Look in /proc/iomem
> to see what it might be and disable it. Thanks,
>
> Alex

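A rough sketch of the /proc/iomem check suggested above, in Python. It assumes the usual /proc/iomem layout (hex ranges, two spaces of indentation per nesting level) and uses the BAR 3 range from the log above; the function name overlapping() is made up for illustration. It should be run as root, since unprivileged reads typically show zeroed addresses.

```python
import re

# Matches one /proc/iomem line, e.g. "  c0000000-c1ffffff : 0000:03:00.0"
LINE_RE = re.compile(r"^(\s*)([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")


def overlapping(iomem_text, start, end):
    """Return (depth, lo, hi, owner) for each entry overlapping [start, end].

    Hypothetical helper: depth is derived from the leading indentation,
    assuming two spaces per nesting level.
    """
    hits = []
    for line in iomem_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        indent, lo, hi, owner = m.groups()
        lo, hi = int(lo, 16), int(hi, 16)
        # Two inclusive ranges overlap when each starts before the other ends.
        if lo <= end and hi >= start:
            hits.append((len(indent) // 2, lo, hi, owner.strip()))
    return hits


if __name__ == "__main__":
    # Range taken from the "BAR 3: can't reserve" messages in the log.
    bar_start, bar_end = 0xC0000000, 0xC1FFFFFF
    try:
        with open("/proc/iomem") as f:
            text = f.read()
    except OSError as exc:
        print(f"cannot read /proc/iomem: {exc}")
    else:
        for depth, lo, hi, owner in overlapping(text, bar_start, bar_end):
            print(f"{'  ' * depth}{lo:08x}-{hi:08x} : {owner}")
```

Any entry printed underneath the 0000:03:00.0 line that is not vfio-pci would be a candidate for the driver that has claimed part of the device.
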
Thank you Alex. One more thing which might be relevant: my system has two identical GPUs (Quadro M5000), each in its own IOMMU group, and two VMs, each using one of these GPUs. One of the VMs is Windows 10 and I think it is configured for MSI-X; the other is Ubuntu Bionic with the stable nvidia drivers. I will try to find more debugging information when I get home, but perhaps the above will allow you to reproduce the problem.

B.

-- 
Bronek Kozicki
b...@spamcop.net