Thank you Alex. I took a look at the link you provided, followed the guide, and enabled MSI; the performance of the guest VM improved significantly.
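For anyone following along: the MSI state that Alex's guide toggles can be confirmed from the host with `lspci -vs 01:00.0`, looking for `MSI: Enable+`. A minimal sketch of that check, run here against a sample capability line modeled on the lspci output in this thread rather than live hardware:

```shell
# Sample MSI capability line, modeled on the lspci output later in this
# thread; on a real host, substitute: caps=$(lspci -vs 01:00.0)
caps='Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+'

# 'Enable+' means the device is signalling MSI; 'Enable-' means legacy INTx,
# which is what shares IRQ 16 and can trigger the spurious-interrupt shutdown.
if printf '%s\n' "$caps" | grep -q 'MSI: Enable+'; then
    echo 'MSI enabled'
else
    echo 'MSI disabled (still on INTx)'
fi
```

(The capability offset `[68]` varies per device; the `Enable+`/`Enable-` flag is what matters.)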
Regarding the questions and answers in your previous mail, here are my updates:

> You really want to avoid x-vga=on, especially with IGD host graphics.

Why do you say that? I did not see any other parameter that could replace x-vga=on.

> I'm also not sure why you're preventing i915 from loading if you
> intend to use IGD for the host graphics.

I disabled the i915 driver because I don't want to apply either the i915 VGA arbiter patch or the ACS override patch.

> My question would be whether the problem interrupt is the GPU or the
> audio. You could remove the audio assignment and see if it still
> occurs. If it is the audio device, then follow the guide above as
> GeForce audio interrupts are only marginally functional anyway.

The problem interrupt is the GPU. Since the GPU (01:00.0) and the audio function (01:00.1) are together in IOMMU group 5, I usually assign both of them at the same time to avoid "vfio: error, group 5 is not viable, please ensure all devices within the iommu_group are bound to their vfio bus driver." I also tried removing the audio assignment but got the same problem.

> So you don't even have real guest drivers loaded... look
> in /proc/interrupts with the new kernel, are there multiple devices on
> the interrupt line with that kernel?

Yes, you are right; there are up to six devices sharing interrupt 16:

lspci -v
...
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #5 (rev f1) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
        Capabilities: [40] Express Root Port (Slot+), MSI 00
        Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
        Capabilities: [90] Subsystem: Gigabyte Technology Co., Ltd Device 5001
        Capabilities: [a0] Power Management version 3
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [220] #19
        Kernel driver in use: pcieport
        Kernel modules: shpchp

00:1d.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #9 (rev f1) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
        Memory behind bridge: df100000-df1fffff
        Capabilities: [40] Express Root Port (Slot+), MSI 00
        Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
        Capabilities: [90] Subsystem: Gigabyte Technology Co., Ltd Device 5001
        Capabilities: [a0] Power Management version 3
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [220] #19
        Kernel driver in use: pcieport
        Kernel modules: shpchp

00:1f.3 Audio device: Intel Corporation Sunrise Point-H HD Audio (rev 31)
        Subsystem: Gigabyte Technology Co., Ltd Device a182
        Flags: fast devsel, IRQ 16
        Memory at df240000 (64-bit, non-prefetchable) [size=16K]
        Memory at df220000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [50] Power Management version 3
        Capabilities: [60] MSI: Enable- Count=1/1 Maskable- 64bit+
        Kernel modules: snd_hda_intel

00:1f.4 SMBus: Intel Corporation Sunrise Point-H SMBus (rev 31)
        Subsystem: Gigabyte Technology Co., Ltd Device 5001
        Flags: medium devsel, IRQ 16
        Memory at df24a000 (64-bit, non-prefetchable) [size=256]
        I/O ports at f040 [size=32]
        Kernel modules: i2c_i801

01:00.0 VGA compatible controller: NVIDIA Corporation Device 128b (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device 8c93
        Flags: fast devsel, IRQ 16
        Memory at de000000 (32-bit, non-prefetchable) [size=16M]
        Memory at d0000000 (64-bit, prefetchable) [size=128M]
        Memory at d8000000 (64-bit, prefetchable) [size=32M]
        I/O ports at e000 [size=128]
        Expansion ROM at df000000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Legacy Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Kernel driver in use: vfio-pci
        Kernel modules: nouveau

03:00.0 Non-Volatile memory controller: Intel Corporation Device f1a5 (rev 03) (prog-if 02 [NVM Express])
        Subsystem: Intel Corporation Device 390a
        Flags: bus master, fast devsel, latency 0, IRQ 16, NUMA node 0
        Memory at df100000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [70] Express Endpoint, MSI 00
        Capabilities: [b0] MSI-X: Enable+ Count=16 Masked-
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [158] #19
        Capabilities: [178] Latency Tolerance Reporting
        Capabilities: [180] L1 PM Substates
        Kernel driver in use: nvme
        Kernel modules: nvme

> Not a known issue, root cause covered above, certainly something that
> may be fixed in updated kernels, or maybe updated kernels just shutdown
> or have a driver for the device sharing the interrupt

That is what I want to figure out.

> You could try updating one or the other.

I tried upgrading the kernel and QEMU to the corresponding Fedora 24 versions, but the problem still exists.

> This is a valid workaround, but it means that vfio-pci will always
> require an exclusive INTx interrupt for any assigned device, which
> often makes it difficult to achieve a working configuration.
> As above,
> if the additional interrupts are not generated by the GPU/audio, then
> we're potentially injecting spurious interrupts into the guest.

As I tested, `nointxmask=1` can cause a new error, "vfio: Error: Failed to setup INTx fd: Device or resource busy", when assigning the GPU and the onboard audio together. This error was mentioned in
https://www.redhat.com/archives/vfio-users/2016-March/msg00035.html

modprobe vfio-pci ids=10de:128b,10de:0e0f,8086:a170 nointxmask=1

qemu-system-x86_64 -enable-kvm -m 4G -cpu host,kvm=off \
    -smp 4,sockets=1,cores=2,threads=2 -hda ~/win7.img \
    -usbdevice host:093a:2510 -usbdevice host:0c45:7603 \
    -device vfio-pci,host=01:00.0,x-vga=on -device vfio-pci,host=01:00.1 \
    -vga none -device vfio-pci,host=00:1f.3

qemu-system-x86_64: -device vfio-pci,host=00:1f.3: vfio: Error: Failed to setup INTx fd: Device or resource busy
qemu-system-x86_64: -device vfio-pci,host=00:1f.3: Device initialization failed

dmesg:

[ 77.750742] VFIO - User Level meta-driver version: 0.3
[ 77.754872] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=io+mem:owns=none
[ 77.765719] vfio_pci: add [10de:128b[ffff:ffff]] class 0x000000/00000000
[ 77.776720] vfio_pci: add [10de:0e0f[ffff:ffff]] class 0x000000/00000000
[ 77.787714] vfio_pci: add [8086:a170[ffff:ffff]] class 0x000000/00000000
[ 83.681186] vfio-pci 0000:01:00.0: enabling device (0000 -> 0003)
[ 83.705664] genirq: Flags mismatch irq 16. 00000000 (vfio-intx(0000:00:1f.3)) vs. 00000000 (vfio-intx(0000:01:00.0))
[ 83.705666] CPU: 2 PID: 1953 Comm: qemu-system-x86 Not tainted 4.5.5-300.fc24.x86_64 #1
[ 83.705667] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./B150-HD3-CF, BIOS F5 03/11/2016
[ 83.705668] 0000000000000086 00000000842ff643 ffff88043eb87ca8 ffffffff813d35af
[ 83.705670] ffff88045d97f000 00000000fffffff0 ffff88043eb87d00 ffffffff811011ae
[ 83.705671] 0000000000000246 ffff88045d97f09c ffff88044e5db248 00000000842ff643
[ 83.705673] Call Trace:
[ 83.705676] [<ffffffff813d35af>] dump_stack+0x63/0x84
[ 83.705678] [<ffffffff811011ae>] __setup_irq+0x5ee/0x640
[ 83.705682] [<ffffffffa050f2d0>] ? vfio_intx_disable+0x60/0x60 [vfio_pci]
[ 83.705683] [<ffffffff81101388>] request_threaded_irq+0xf8/0x1a0
[ 83.705685] [<ffffffffa050f0c5>] vfio_intx_set_signal+0x105/0x1d0 [vfio_pci]
[ 83.705686] [<ffffffffa050f437>] vfio_pci_set_intx_trigger+0xc7/0x160 [vfio_pci]
[ 83.705687] [<ffffffffa050f9bf>] vfio_pci_set_irqs_ioctl+0x3f/0xa0 [vfio_pci]
[ 83.705689] [<ffffffffa050dd8e>] vfio_pci_ioctl+0x2fe/0x9c0 [vfio_pci]
[ 83.705690] [<ffffffff8128ed94>] ? eventfd_write+0x94/0x210
[ 83.705692] [<ffffffff810d0220>] ? wake_up_q+0x70/0x70
[ 83.705694] [<ffffffffa0461183>] vfio_device_fops_unl_ioctl+0x23/0x30 [vfio]
[ 83.705696] [<ffffffff81256183>] do_vfs_ioctl+0xa3/0x5d0
[ 83.705697] [<ffffffff81256729>] SyS_ioctl+0x79/0x90
[ 83.705699] [<ffffffff817cecee>] entry_SYSCALL_64_fastpath+0x12/0x6d

BRs,
Zhifeng

________________________________________
From: Alex Williamson <alex.william...@redhat.com>
Sent: Thursday, 25 May 2017 23:05
To: Hu Zhifeng
Cc: vfio-users@redhat.com
Subject: Re: [vfio-users] Kernel panic at vfio_intx_handler leads to low performance in guest VM

On Thu, 25 May 2017 10:53:29 +0000
Hu Zhifeng <zhifeng...@hotmail.com> wrote:

> Dear all,
>
> I am running a fresh Fedora 23 and want to use kvm/qemu to run a Windows VM
> with GPU passthrough.
>
> My setup is as follows:
> Host OS: Fedora 23 (Workstation x86_64)
> Kernel: 4.2.3-300.fc23.x86_64
> QEMU version: qemu-2.4.0.1-1.fc23
> Guest VM: Windows 7
> CPU: Intel i7-6700K
> Motherboard: Gigabyte B150-HD3
> IGD: Intel® HD Graphics 530 (used by the host)
> Graphics Card: GT710 (used by the VM)
>
> First, enable IOMMU by appending the `intel_iommu=on` parameter to GRUB.
> Next, prevent the kernel modules i915, nouveau and snd_hda_intel from being
> loaded for both initramfs and system.
> Then, load vfio-pci with ids (modprobe vfio-pci ids=10de:128b,10de:0e0f)
> Last, run qemu like this:
> qemu-system-x86_64 -enable-kvm -m 4G -cpu host,kvm=off -smp
> 4,sockets=1,cores=2,threads=2 -hda ~/win7.img -usbdevice host:093a:2510
> -usbdevice host:0c45:7603 -device vfio-pci,host=01:00.0,x-vga=on -device
> vfio-pci,host=01:00.1 -vga none

You really want to avoid x-vga=on, especially with IGD host graphics.
I'm also not sure why you're preventing i915 from loading if you
intend to use IGD for the host graphics.

> Everything looks good and the dedicated GPU is detected by the guest VM (N.B.
> GPU driver `378.92-desktop-win8-win7-64bit-international-whql.exe` was ready),
> but the guest VM is running very slow, and I observed a kernel panic
> generated by vfio_pci.
>
> Here's the log from dmesg:
> [ 737.317946] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=io+mem:owns=none
> [ 737.356996] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=io+mem:owns=none
> [ 737.367606] vfio_pci: add [10de:128b[ffff:ffff]] class 0x000000/00000000
> [ 737.378437] vfio_pci: add [10de:0e0f[ffff:ffff]] class 0x000000/00000000
> [ 738.233680] vfio-pci 0000:01:00.0: enabling device (0000 -> 0003)
> [ 739.755715] kvm: zapping shadow pages for mmio generation wraparound
> [ 739.874265] irq 16: nobody cared (try booting with the "irqpoll" option)
> [ 739.874269] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.3-300.fc23.x86_64 #1
> [ 739.874270] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./B150-HD3-CF, BIOS F5 03/11/2016
> [ 739.874271] 0000000000000000 e5300c14e6af3df1 ffff880470c03e28 ffffffff81771fca
> [ 739.874272] 0000000000000000 ffff88045b2844a4 ffff880470c03e58 ffffffff810f88a5
> [ 739.874273] ffff880081f42e50 ffff88045b284400 0000000000000000 0000000000000010
> [ 739.874275] Call Trace:
> [ 739.874276] <IRQ> [<ffffffff81771fca>] dump_stack+0x45/0x57
> [ 739.874281] [<ffffffff810f88a5>] __report_bad_irq+0x35/0xd0
> [ 739.874282] [<ffffffff810f8c44>] note_interrupt+0x244/0x290
> [ 739.874284] [<ffffffff810f607c>] handle_irq_event_percpu+0x11c/0x180
> [ 739.874285] [<ffffffff810f6110>] handle_irq_event+0x30/0x60
> [ 739.874286] [<ffffffff810f91f4>] handle_fasteoi_irq+0x84/0x150
> [ 739.874287] [<ffffffff81016e42>] handle_irq+0x72/0x120
> [ 739.874289] [<ffffffff810bd66a>] ? atomic_notifier_call_chain+0x1a/0x20
> [ 739.874291] [<ffffffff8177b5df>] do_IRQ+0x4f/0xe0
> [ 739.874292] [<ffffffff817794eb>] common_interrupt+0x6b/0x6b
> [ 739.874292] <EOI> [<ffffffff81108a4f>] ? hrtimer_start_range_ns+0x1bf/0x3b0
> [ 739.874296] [<ffffffff816160c0>] ? cpuidle_enter_state+0x130/0x270
> [ 739.874297] [<ffffffff8161609b>] ? cpuidle_enter_state+0x10b/0x270
> [ 739.874298] [<ffffffff81616237>] cpuidle_enter+0x17/0x20
> [ 739.874300] [<ffffffff810dfcc2>] call_cpuidle+0x32/0x60
> [ 739.874301] [<ffffffff81616213>] ? cpuidle_select+0x13/0x20
> [ 739.874302] [<ffffffff810dff58>] cpu_startup_entry+0x268/0x320
> [ 739.874304] [<ffffffff8176870c>] rest_init+0x7c/0x80
> [ 739.874305] [<ffffffff81d5702d>] start_kernel+0x49d/0x4be
> [ 739.874307] [<ffffffff81d56120>] ? early_idt_handler_array+0x120/0x120
> [ 739.874308] [<ffffffff81d56339>] x86_64_start_reservations+0x2a/0x2c
> [ 739.874309] [<ffffffff81d56485>] x86_64_start_kernel+0x14a/0x16d
> [ 739.874309] handlers:
> [ 739.874313] [<ffffffffa05172d0>] vfio_intx_handler [vfio_pci]
> [ 739.874313] Disabling IRQ #16

What's happening here is that the spurious interrupt handling code is
noting that there are too many unhandled interrupts on this IRQ and
disabling it, which switches to a polling mode behavior and yes,
performance will be terrible. My write-up on making Windows use MSI
covers some of the background for this:

http://vfio.blogspot.com/2014/09/vfio-interrupts-and-how-to-coax-windows.html

In summary, we rely on the device to tell us when an interrupt is
pending in order to claim the interrupt; if it doesn't, then we assume
it's another device sharing the interrupt and let it go. If it's
actually our device interrupting without indicating so, or there's
another device shouting on the same interrupt line, you can hit this
problem.

> What I've tried so far:
> 1. Different graphics card (GTX750Ti), with same results

My question would be whether the problem interrupt is the GPU or the
audio. You could remove the audio assignment and see if it still
occurs. If it is the audio device, then follow the guide above as
GeForce audio interrupts are only marginally functional anyway.

> 2. Different host OS (Fedora 24: Kernel 4.5.5-300.fc24.x86_64 +
> qemu-2.6.2-8.fc24), without any issues

That's interesting, I don't know what would be different, but also why
are you running the original FC23 kernel when I know there are FC23
updates that bring it up to a 4.8 kernel? If you don't keep your
software up to date, bugs are to be expected.

> 3. Load vfio-pci with `nointxmask=1`, without any issues

With this option we get an exclusive interrupt for the device and then
we handle each interrupt under the assumption that it's for our device.
If there's really something else pulling this interrupt, that might
mean we're injecting additional (spurious) interrupts into the guest.
Generally this is ok so long as we don't hit a rate sufficient to
trigger similar spurious interrupt shutdown in the guest.

> 4. Remove `-hda ~/win7.img` from QEMU command (seabios only), still get the
> same crash

So you don't even have real guest drivers loaded... look in
/proc/interrupts with the new kernel, are there multiple devices on
the interrupt line with that kernel?

> So I have some questions now:
> 1. Is this a known issue? What is the root cause?

Not a known issue, root cause covered above, certainly something that
may be fixed in updated kernels, or maybe updated kernels just shutdown
or have a driver for the device sharing the interrupt.

> 2. Why does Fedora 24 not have this issue? Related to the kernel, QEMU, or
> other components?

You could try updating one or the other.

> 3. Is `nointxmask=1` the right way to avoid the crash?

This is a valid workaround, but it means that vfio-pci will always
require an exclusive INTx interrupt for any assigned device, which
often makes it difficult to achieve a working configuration. As above,
if the additional interrupts are not generated by the GPU/audio, then
we're potentially injecting spurious interrupts into the guest.
Thanks,
Alex

_______________________________________________
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users