Zir,

Thanks for the response. I really thought that would work, as the problem does seem to follow the chipset, but alas, no change. I added intremap=off per your suggestion and found the nox2apic parameter, which I presume will enable xAPIC instead.
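As a quick sanity check that the parameters actually reached the running kernel, something like the following sketch could be used. It assumes a Linux host where /proc/cmdline is readable; the names EXPECTED, missing_params and running_cmdline are made up for illustration, and the parameter list is simply copied from the command line quoted in this message.

```python
from pathlib import Path

# Parameters taken from the kernel command line quoted in this message.
EXPECTED = (
    "intel_iommu=on",
    "intremap=off",
    "nox2apic",
    "vfio_iommu_type1.allow_unsafe_interrupts=1",
)


def missing_params(cmdline, expected=EXPECTED):
    """Return the expected parameters that are absent from cmdline, in order."""
    present = set(cmdline.split())
    return [param for param in expected if param not in present]


def running_cmdline():
    """Read the command line of the running kernel (assumes Linux /proc)."""
    return Path("/proc/cmdline").read_text()


# Example use on the host itself:
#   print(missing_params(running_cmdline()) or "all expected parameters present")
```

This only confirms that the parameters were passed at boot; it says nothing about whether the kernel or firmware honored them.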
I searched for information about setting a bit in the DMAR table to do the same, but I wasn't able to find anything about that. I'm not sure if nox2apic achieves that.

root@ads-120elmst-proxmox-1:~# dmesg |grep -i apic
[ 0.000000] Command line: BOOT_IMAGE=/ROOT/pve-1@/boot/vmlinuz-4.10.17-3-pve root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet vfio_iommu_type1.allow_unsafe_interrupts=1 intel_iommu=on intremap=off nox2apic
[ 0.000000] ACPI: APIC 0x000000009F780390 0000D8 (v01 051111 APIC2126 20110511 MSFT 00000097)
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] IOAPIC[0]: apic_id 6, version 32, address 0xfec00000, GSI 0-23
[ 0.000000] IOAPIC[1]: apic_id 7, version 32, address 0xfec8a000, GSI 24-47
[ 0.000000] Kernel command line: BOOT_IMAGE=/ROOT/pve-1@/boot/vmlinuz-4.10.17-3-pve root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet vfio_iommu_type1.allow_unsafe_interrupts=1 intel_iommu=on intremap=off nox2apic
[ 0.090605] Switched APIC routing to physical flat.
[ 0.091091] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.415386] ACPI: Using IOAPIC for interrupt routing
[ 0.946833] intel_idle: lapic_timer_reliable_states 0xffffffff

Here's what the system log looks like when running both VMs (it seems to be a bit more resilient with the 2017 OS updates, but still crashes sooner rather than later):

Nov 19 22:12:47 ads-120elmst-proxmox-1 kernel: [ 2322.345972] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 3.803 msecs
Nov 19 22:31:32 ads-120elmst-proxmox-1 kernel: [ 3447.384827] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 3.812 msecs
Nov 19 22:34:25 ads-120elmst-proxmox-1 kernel: [ 3620.283114] kvm_get_msr_common: 3 callbacks suppressed
Nov 19 22:34:57 ads-120elmst-proxmox-1 kernel: [ 3652.379106] kvm_get_msr_common: 134 callbacks suppressed
Nov 19 22:39:23 ads-120elmst-proxmox-1 kernel: [ 3918.236976] kvm_get_msr_common: 134 callbacks suppressed
Nov 19 22:52:32 ads-120elmst-proxmox-1 kernel: [ 4707.406853] usb 9-1.2: reset full-speed USB device number 5 using xhci_hcd
Nov 19 22:52:33 ads-120elmst-proxmox-1 kernel: [ 4708.055537] usb 9-1.4: reset full-speed USB device number 7 using xhci_hcd
Nov 19 22:53:04 ads-120elmst-proxmox-1 kernel: [ 4739.445246] vmbr0: port 3(tap110i0) entered disabled state
Nov 19 22:53:04 ads-120elmst-proxmox-1 kernel: [ 4739.554581] input: Logitech G510s Gaming Keyboard as /devices/pci0000:00/0000:00:02.0/0000:02:00.0/usb9/9-1/9-1.4/9-1.4:1.0/0003:046D:C22D.000F/input/input20
Nov 19 22:53:04 ads-120elmst-proxmox-1 kernel: [ 4739.612961] hid-generic 0003:046D:C22D.000F: input,hidraw0: USB HID v1.11 Keyboard [Logitech G510s Gaming Keyboard] on usb-0000:02:00.0-1.4/input0
Nov 19 22:53:04 ads-120elmst-proxmox-1 kernel: [ 4739.648397] input: Logitech G510s Gaming Keyboard as /devices/pci0000:00/0000:00:02.0/0000:02:00.0/usb9/9-1/9-1.4/9-1.4:1.1/0003:046D:C22D.0010/input/input21
Nov 19 22:53:04 ads-120elmst-proxmox-1 kernel: [ 4739.705008] hid-generic 0003:046D:C22D.0010: input,hiddev0,hidraw1: USB HID v1.11 Device [Logitech G510s Gaming Keyboard] on usb-0000:02:00.0-1.4/input1
Nov 19 22:53:04 ads-120elmst-proxmox-1 kernel: [ 4739.708743] input: Logitech USB Receiver as /devices/pci0000:00/0000:00:02.0/0000:02:00.0/usb9/9-1/9-1.2/9-1.2:1.0/0003:046D:C531.0011/input/input22
Nov 19 22:53:04 ads-120elmst-proxmox-1 kernel: [ 4739.709067] hid-generic 0003:046D:C531.0011: input,hidraw2: USB HID v1.11 Mouse [Logitech USB Receiver] on usb-0000:02:00.0-1.2/input0
Nov 19 22:53:04 ads-120elmst-proxmox-1 kernel: [ 4739.722617] input: Logitech USB Receiver as /devices/pci0000:00/0000:00:02.0/0000:02:00.0/usb9/9-1/9-1.2/9-1.2:1.1/0003:046D:C531.0012/input/input23
Nov 19 22:53:04 ads-120elmst-proxmox-1 kernel: [ 4739.781100] hid-generic 0003:046D:C531.0012: input,hiddev0,hidraw3: USB HID v1.11 Keyboard [Logitech USB Receiver] on usb-0000:02:00.0-1.2/input1
Nov 19 22:53:05 ads-120elmst-proxmox-1 kernel: [ 4739.816936] hid-generic 0003:2101:8501.0013: hiddev0,hidraw4: USB HID v1.11 Device [Action Star USB HID] on usb-0000:02:00.0-1.1/input0
Nov 19 22:53:06 ads-120elmst-proxmox-1 kernel: [ 4741.055914] usb 9-2.2: reset low-speed USB device number 8 using xhci_hcd
Nov 19 22:53:38 ads-120elmst-proxmox-1 kernel: [ 4773.399598] vmbr0: port 2(tap111i0) entered disabled state
Nov 19 22:53:38 ads-120elmst-proxmox-1 kernel: [ 4773.464740] input: Logitech USB Optical Mouse as /devices/pci0000:00/0000:00:02.0/0000:02:00.0/usb9/9-2/9-2.2/9-2.2:1.0/0003:046D:C077.0014/input/input24
Nov 19 22:53:38 ads-120elmst-proxmox-1 kernel: [ 4773.465021] hid-generic 0003:046D:C077.0014: input,hidraw5: USB HID v1.11 Mouse [Logitech USB Optical Mouse] on usb-0000:02:00.0-2.2/input0
Nov 19 22:53:38 ads-120elmst-proxmox-1 kernel: [ 4773.482748] hid-generic 0003:2101:8501.0015: hiddev0,hidraw6: USB HID v1.11 Device [Action Star USB HID] on usb-0000:02:00.0-2.1/input0
Nov 19 22:54:32 ads-120elmst-proxmox-1 kernel: [ 4827.661898] device tap110i0 entered promiscuous mode
Nov 19 22:54:32 ads-120elmst-proxmox-1 kernel: [ 4827.668725] vmbr0: port 2(tap110i0) entered blocking state
Nov 19 22:54:32 ads-120elmst-proxmox-1 kernel: [ 4827.668726] vmbr0: port 2(tap110i0) entered disabled state
Nov 19 22:54:32 ads-120elmst-proxmox-1 kernel: [ 4827.668800] vmbr0: port 2(tap110i0) entered blocking state
Nov 19 22:54:32 ads-120elmst-proxmox-1 kernel: [ 4827.668801] vmbr0: port 2(tap110i0) entered forwarding state
Nov 19 22:54:33 ads-120elmst-proxmox-1 kernel: [ 4827.964488] vmbr0: port 2(tap110i0) entered disabled state
Nov 19 22:54:38 ads-120elmst-proxmox-1 kernel: [ 4833.565853] usb 9-2.2: USB disconnect, device number 8
Nov 19 22:54:40 ads-120elmst-proxmox-1 kernel: [ 4835.038318] usb 9-2.2: new low-speed USB device number 9 using xhci_hcd
Nov 19 22:54:40 ads-120elmst-proxmox-1 kernel: [ 4835.148988] usb 9-2.2: New USB device found, idVendor=046d, idProduct=c077
Nov 19 22:54:40 ads-120elmst-proxmox-1 kernel: [ 4835.148991] usb 9-2.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Nov 19 22:54:40 ads-120elmst-proxmox-1 kernel: [ 4835.148993] usb 9-2.2: Product: USB Optical Mouse
Nov 19 22:54:40 ads-120elmst-proxmox-1 kernel: [ 4835.148994] usb 9-2.2: Manufacturer: Logitech
Nov 19 22:54:40 ads-120elmst-proxmox-1 kernel: [ 4835.153141] input: Logitech USB Optical Mouse as /devices/pci0000:00/0000:00:02.0/0000:02:00.0/usb9/9-2/9-2.2/9-2.2:1.0/0003:046D:C077.0016/input/input25
Nov 19 22:54:40 ads-120elmst-proxmox-1 kernel: [ 4835.153521] hid-generic 0003:046D:C077.0016: input,hidraw5: USB HID v1.11 Mouse [Logitech USB Optical Mouse] on usb-0000:02:00.0-2.2/input0
Nov 19 22:55:42 ads-120elmst-proxmox-1 kernel: [ 4897.081151] usb 9-2.2: USB disconnect, device number 9
Nov 19 22:55:43 ads-120elmst-proxmox-1 kernel: [ 4898.582198] usb 9-2.2: new low-speed USB device number 10 using xhci_hcd
Nov 19 22:55:43 ads-120elmst-proxmox-1 kernel: [ 4898.694029] usb 9-2.2: New USB device found, idVendor=046d, idProduct=c077
Nov 19 22:55:43 ads-120elmst-proxmox-1 kernel: [ 4898.694031] usb 9-2.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Nov 19 22:55:43 ads-120elmst-proxmox-1 kernel: [ 4898.694033] usb 9-2.2: Product: USB Optical Mouse
Nov 19 22:55:43 ads-120elmst-proxmox-1 kernel: [ 4898.694035] usb 9-2.2: Manufacturer: Logitech
Nov 19 22:55:43 ads-120elmst-proxmox-1 kernel: [ 4898.701760] input: Logitech USB Optical Mouse as /devices/pci0000:00/0000:00:02.0/0000:02:00.0/usb9/9-2/9-2.2/9-2.2:1.0/0003:046D:C077.0017/input/input26
Nov 19 22:55:43 ads-120elmst-proxmox-1 kernel: [ 4898.702086] hid-generic 0003:046D:C077.0017: input,hidraw5: USB HID v1.11 Mouse [Logitech USB Optical Mouse] on usb-0000:02:00.0-2.2/input0
Nov 19 22:55:48 ads-120elmst-proxmox-1 kernel: [ 4903.677507] device tap110i0 entered promiscuous mode
Nov 19 22:55:48 ads-120elmst-proxmox-1 kernel: [ 4903.688302] vmbr0: port 2(tap110i0) entered blocking state
Nov 19 22:55:48 ads-120elmst-proxmox-1 kernel: [ 4903.688304] vmbr0: port 2(tap110i0) entered disabled state
Nov 19 22:55:48 ads-120elmst-proxmox-1 kernel: [ 4903.688387] vmbr0: port 2(tap110i0) entered blocking state
Nov 19 22:55:48 ads-120elmst-proxmox-1 kernel: [ 4903.688388] vmbr0: port 2(tap110i0) entered forwarding state
Nov 19 22:55:50 ads-120elmst-proxmox-1 kernel: [ 4904.869968] vfio_ecap_init: 0000:04:00.0 hiding ecap 0x19@0x900
Nov 19 22:55:52 ads-120elmst-proxmox-1 kernel: [ 4906.874808] device tap111i0 entered promiscuous mode
Nov 19 22:55:52 ads-120elmst-proxmox-1 kernel: [ 4906.880827] vmbr0: port 3(tap111i0) entered blocking state
Nov 19 22:55:52 ads-120elmst-proxmox-1 kernel: [ 4906.880829] vmbr0: port 3(tap111i0) entered disabled state
Nov 19 22:55:52 ads-120elmst-proxmox-1 kernel: [ 4906.880895] vmbr0: port 3(tap111i0) entered blocking state
Nov 19 22:55:52 ads-120elmst-proxmox-1 kernel: [ 4906.880896] vmbr0: port 3(tap111i0) entered forwarding state
Nov 19 22:55:53 ads-120elmst-proxmox-1 kernel: [ 4908.513876] usb 9-1.1: reset high-speed USB device number 4 using xhci_hcd
Nov 19 22:55:54 ads-120elmst-proxmox-1 kernel: [ 4908.938243] usb 9-1.2: reset full-speed USB device number 5 using xhci_hcd
Nov 19 22:55:54 ads-120elmst-proxmox-1 kernel: [ 4909.297968] usb 9-1.4: reset full-speed USB device number 7 using xhci_hcd
Nov 19 22:55:55 ads-120elmst-proxmox-1 kernel: [ 4909.781433] usb 9-1.2: reset full-speed USB device number 5 using xhci_hcd
Nov 19 22:55:55 ads-120elmst-proxmox-1 kernel: [ 4910.176352] usb 9-1.4: reset full-speed USB device number 7 using xhci_hcd
Nov 19 22:56:06 ads-120elmst-proxmox-1 kernel: [ 4921.624815] vfio_ecap_init: 0000:05:00.0 hiding ecap 0x1e@0x258
Nov 19 22:56:06 ads-120elmst-proxmox-1 kernel: [ 4921.624824] vfio_ecap_init: 0000:05:00.0 hiding ecap 0x19@0x900
Nov 19 22:56:09 ads-120elmst-proxmox-1 kernel: [ 4924.086319] usb 9-1.4: reset full-speed USB device number 7 using xhci_hcd
Nov 19 22:56:09 ads-120elmst-proxmox-1 kernel: [ 4924.282305] usb 9-1.2: reset full-speed USB device number 5 using xhci_hcd
Nov 19 22:56:09 ads-120elmst-proxmox-1 kernel: [ 4924.480909] usb 9-1.1: reset high-speed USB device number 4 using xhci_hcd
Nov 19 22:56:09 ads-120elmst-proxmox-1 kernel: [ 4924.666275] usb 9-1.2: reset full-speed USB device number 5 using xhci_hcd
Nov 19 22:56:10 ads-120elmst-proxmox-1 kernel: [ 4924.855264] usb 9-1.4: reset full-speed USB device number 7 using xhci_hcd
Nov 19 22:56:10 ads-120elmst-proxmox-1 kernel: [ 4925.076862] usb 9-1.1: reset high-speed USB device number 4 using xhci_hcd
Nov 19 22:56:10 ads-120elmst-proxmox-1 kernel: [ 4925.262218] usb 9-1.2: reset full-speed USB device number 5 using xhci_hcd

There are more USB resets than I remember from the last time I tried this.

Thanks again for the response, and if you have any other thoughts on the matter I'd love to hear them.
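To put a number on the resets rather than going from memory, a rough sketch like the one below could tally them per USB port from the kernel log. The regular expression is modeled only on the "reset ...-speed USB device number" lines in the log above, and count_usb_resets is a hypothetical helper name, so treat it as an assumption rather than a general parser.

```python
import re
from collections import Counter

# Matches lines such as:
#   usb 9-1.2: reset full-speed USB device number 5 using xhci_hcd
# and captures the port path ("9-1.2").
RESET_RE = re.compile(
    r"usb (\d+-[\d.]+): reset \S+-speed USB device number \d+ using \S+"
)


def count_usb_resets(lines):
    """Count USB reset messages per port path in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Example use, reading a kernel log piped on standard input:
#   import sys
#   for port, n in count_usb_resets(sys.stdin).most_common():
#       print(port, n)
```

Running it over the logs from two separate attempts would make the comparison concrete instead of a guess.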
-Brian

----- Original Message -----
From: "vfio-users-request" <vfio-users-requ...@redhat.com>
To: "vfio-users" <vfio-users@redhat.com>
Sent: Sunday, November 19, 2017 12:00:11 PM
Subject: vfio-users Digest, Vol 28, Issue 16

Message: 4
Date: Sun, 19 Nov 2017 08:53:32 +0000
From: Zir Blazer <zir_bla...@hotmail.com>
To: "vfio-users@redhat.com" <vfio-users@redhat.com>
Subject: Re: [vfio-users] GPU driver crashes when running a second VM if either VM has a virtual disk stored on physical media other than the root disk. Tested on three X58 chipset MBs
Message-ID: <cy4pr15mb146404edebb7caa99b74972ef3...@cy4pr15mb1464.namprd15.prod.outlook.com>
Content-Type: text/plain; charset="iso-8859-1"

The Nehalem era X58, 5500 and 5520 Chipsets had a notoriously broken Interrupt Remapping implementation:

https://support.citrix.com/article/CTX136517 (Love that symptoms list)
https://www.netiq.com/support/kb/doc.php?id=7014344
https://serverfault.com/questions/745593/does-disabling-vt-d-and-interrupt-remapping-break-msi-x

Interrupt Remapping was directly related to some x2APIC, MSI-X (Not sure if MSI) and IOMMU features which obviously on X58 platforms don't work as intended. You can try to force disabling them with Kernel Parameters (intremap=off and something else to force old xAPIC) and see if it improves. Google around also the X2APIC Disable Bit in the ACPI DMAR Table (I recall that I wrote something related to it). VFIO had also an allow_unsafe_interrupts=1 option that was also related to Nehalem broken Chipsets.

Basically, you are trying to use early era Hardware that was quite buggy, so it is expected that Passthrough would be problematic. Have fun!