Zir,

Thanks for the response.  I really thought that would work, as the problem does 
seem to follow the chipset, but alas, no change.  I added intremap=off per your 
suggestion and found the nox2apic parameter, which I presume forces xAPIC instead.

I searched for information about setting a bit in the DMAR table to do the 
same, but I wasn't able to find anything about that.  I'm not sure whether 
nox2apic achieves that.
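
For anyone wanting to double-check that the parameters actually made it onto 
the running kernel's command line, here is a minimal sketch.  has_param is a 
made-up helper name, and the grep pattern is only my guess at what is worth 
looking for, not a list of exact kernel messages:

```shell
#!/bin/sh
# Sketch: report whether given parameters appear on a kernel command line.
# has_param is a hypothetical helper, not part of any existing tool.
has_param() {
    # $1 = full command line string, $2 = parameter to look for
    case " $1 " in
        *" $2 "*) return 0 ;;
    esac
    return 1
}

# Read the running kernel's command line (Linux only; empty elsewhere).
cmdline=$(cat /proc/cmdline 2>/dev/null || true)
for p in intel_iommu=on intremap=off nox2apic; do
    if has_param "$cmdline" "$p"; then
        echo "$p: present"
    else
        echo "$p: missing"
    fi
done

# Show any boot messages mentioning x2APIC or the DMAR table.
# The pattern is an assumption; dmesg may need root on some systems.
dmesg 2>/dev/null | grep -i -E 'x2apic|DMAR' || true
```

This only confirms that the parameters were passed; it does not prove the 
kernel honored them.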




root@ads-120elmst-proxmox-1:~# dmesg |grep -i apic
[    0.000000] Command line: BOOT_IMAGE=/ROOT/pve-1@/boot/vmlinuz-4.10.17-3-pve 
root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet 
vfio_iommu_type1.allow_unsafe_interrupts=1 intel_iommu=on intremap=off nox2apic
[    0.000000] ACPI: APIC 0x000000009F780390 0000D8 (v01 051111 APIC2126 
20110511 MSFT 00000097)
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] IOAPIC[0]: apic_id 6, version 32, address 0xfec00000, GSI 0-23
[    0.000000] IOAPIC[1]: apic_id 7, version 32, address 0xfec8a000, GSI 24-47
[    0.000000] Kernel command line: 
BOOT_IMAGE=/ROOT/pve-1@/boot/vmlinuz-4.10.17-3-pve root=ZFS=rpool/ROOT/pve-1 ro 
root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet 
vfio_iommu_type1.allow_unsafe_interrupts=1 intel_iommu=on intremap=off nox2apic
[    0.090605] Switched APIC routing to physical flat.
[    0.091091] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.415386] ACPI: Using IOAPIC for interrupt routing
[    0.946833] intel_idle: lapic_timer_reliable_states 0xffffffff



Here's what the system log looks like when running both VMs (it seems to be a 
bit more resilient with the 2017 OS updates, but still crashes sooner rather 
than later):

Nov 19 22:12:47 ads-120elmst-proxmox-1 kernel: [ 2322.345972] INFO: NMI handler 
(perf_event_nmi_handler) took too long to run: 3.803 msecs
Nov 19 22:31:32 ads-120elmst-proxmox-1 kernel: [ 3447.384827] INFO: NMI handler 
(perf_event_nmi_handler) took too long to run: 3.812 msecs
Nov 19 22:34:25 ads-120elmst-proxmox-1 kernel: [ 3620.283114] 
kvm_get_msr_common: 3 callbacks suppressed
Nov 19 22:34:57 ads-120elmst-proxmox-1 kernel: [ 3652.379106] 
kvm_get_msr_common: 134 callbacks suppressed
Nov 19 22:39:23 ads-120elmst-proxmox-1 kernel: [ 3918.236976] 
kvm_get_msr_common: 134 callbacks suppressed
Nov 19 22:52:32 ads-120elmst-proxmox-1 kernel: [ 4707.406853] usb 9-1.2: reset 
full-speed USB device number 5 using xhci_hcd
Nov 19 22:52:33 ads-120elmst-proxmox-1 kernel: [ 4708.055537] usb 9-1.4: reset 
full-speed USB device number 7 using xhci_hcd





Nov 19 22:53:04 ads-120elmst-proxmox-1 kernel: [ 4739.445246] vmbr0: port 
3(tap110i0) entered disabled state
Nov 19 22:53:04 ads-120elmst-proxmox-1 kernel: [ 4739.554581] input: Logitech 
G510s Gaming Keyboard as 
/devices/pci0000:00/0000:00:02.0/0000:02:00.0/usb9/9-1/9-1.4/9-1.4:1.0/0003:046D:C22D.000F/input/input20
Nov 19 22:53:04 ads-120elmst-proxmox-1 kernel: [ 4739.612961] hid-generic 
0003:046D:C22D.000F: input,hidraw0: USB HID v1.11 Keyboard [Logitech G510s 
Gaming Keyboard] on usb-0000:02:00.0-1.4/input0
Nov 19 22:53:04 ads-120elmst-proxmox-1 kernel: [ 4739.648397] input: Logitech 
G510s Gaming Keyboard as 
/devices/pci0000:00/0000:00:02.0/0000:02:00.0/usb9/9-1/9-1.4/9-1.4:1.1/0003:046D:C22D.0010/input/input21
Nov 19 22:53:04 ads-120elmst-proxmox-1 kernel: [ 4739.705008] hid-generic 
0003:046D:C22D.0010: input,hiddev0,hidraw1: USB HID v1.11 Device [Logitech 
G510s Gaming Keyboard] on usb-0000:02:00.0-1.4/input1
Nov 19 22:53:04 ads-120elmst-proxmox-1 kernel: [ 4739.708743] input: Logitech 
USB Receiver as 
/devices/pci0000:00/0000:00:02.0/0000:02:00.0/usb9/9-1/9-1.2/9-1.2:1.0/0003:046D:C531.0011/input/input22
Nov 19 22:53:04 ads-120elmst-proxmox-1 kernel: [ 4739.709067] hid-generic 
0003:046D:C531.0011: input,hidraw2: USB HID v1.11 Mouse [Logitech USB Receiver] 
on usb-0000:02:00.0-1.2/input0
Nov 19 22:53:04 ads-120elmst-proxmox-1 kernel: [ 4739.722617] input: Logitech 
USB Receiver as 
/devices/pci0000:00/0000:00:02.0/0000:02:00.0/usb9/9-1/9-1.2/9-1.2:1.1/0003:046D:C531.0012/input/input23
Nov 19 22:53:04 ads-120elmst-proxmox-1 kernel: [ 4739.781100] hid-generic 
0003:046D:C531.0012: input,hiddev0,hidraw3: USB HID v1.11 Keyboard [Logitech 
USB Receiver] on usb-0000:02:00.0-1.2/input1
Nov 19 22:53:05 ads-120elmst-proxmox-1 kernel: [ 4739.816936] hid-generic 
0003:2101:8501.0013: hiddev0,hidraw4: USB HID v1.11 Device [Action Star USB 
HID] on usb-0000:02:00.0-1.1/input0
Nov 19 22:53:06 ads-120elmst-proxmox-1 kernel: [ 4741.055914] usb 9-2.2: reset 
low-speed USB device number 8 using xhci_hcd
Nov 19 22:53:38 ads-120elmst-proxmox-1 kernel: [ 4773.399598] vmbr0: port 
2(tap111i0) entered disabled state
Nov 19 22:53:38 ads-120elmst-proxmox-1 kernel: [ 4773.464740] input: Logitech 
USB Optical Mouse as 
/devices/pci0000:00/0000:00:02.0/0000:02:00.0/usb9/9-2/9-2.2/9-2.2:1.0/0003:046D:C077.0014/input/input24
Nov 19 22:53:38 ads-120elmst-proxmox-1 kernel: [ 4773.465021] hid-generic 
0003:046D:C077.0014: input,hidraw5: USB HID v1.11 Mouse [Logitech USB Optical 
Mouse] on usb-0000:02:00.0-2.2/input0
Nov 19 22:53:38 ads-120elmst-proxmox-1 kernel: [ 4773.482748] hid-generic 
0003:2101:8501.0015: hiddev0,hidraw6: USB HID v1.11 Device [Action Star USB 
HID] on usb-0000:02:00.0-2.1/input0



Nov 19 22:54:32 ads-120elmst-proxmox-1 kernel: [ 4827.661898] device tap110i0 
entered promiscuous mode
Nov 19 22:54:32 ads-120elmst-proxmox-1 kernel: [ 4827.668725] vmbr0: port 
2(tap110i0) entered blocking state
Nov 19 22:54:32 ads-120elmst-proxmox-1 kernel: [ 4827.668726] vmbr0: port 
2(tap110i0) entered disabled state
Nov 19 22:54:32 ads-120elmst-proxmox-1 kernel: [ 4827.668800] vmbr0: port 
2(tap110i0) entered blocking state
Nov 19 22:54:32 ads-120elmst-proxmox-1 kernel: [ 4827.668801] vmbr0: port 
2(tap110i0) entered forwarding state
Nov 19 22:54:33 ads-120elmst-proxmox-1 kernel: [ 4827.964488] vmbr0: port 
2(tap110i0) entered disabled state
Nov 19 22:54:38 ads-120elmst-proxmox-1 kernel: [ 4833.565853] usb 9-2.2: USB 
disconnect, device number 8
Nov 19 22:54:40 ads-120elmst-proxmox-1 kernel: [ 4835.038318] usb 9-2.2: new 
low-speed USB device number 9 using xhci_hcd
Nov 19 22:54:40 ads-120elmst-proxmox-1 kernel: [ 4835.148988] usb 9-2.2: New 
USB device found, idVendor=046d, idProduct=c077
Nov 19 22:54:40 ads-120elmst-proxmox-1 kernel: [ 4835.148991] usb 9-2.2: New 
USB device strings: Mfr=1, Product=2, SerialNumber=0
Nov 19 22:54:40 ads-120elmst-proxmox-1 kernel: [ 4835.148993] usb 9-2.2: 
Product: USB Optical Mouse
Nov 19 22:54:40 ads-120elmst-proxmox-1 kernel: [ 4835.148994] usb 9-2.2: 
Manufacturer: Logitech
Nov 19 22:54:40 ads-120elmst-proxmox-1 kernel: [ 4835.153141] input: Logitech 
USB Optical Mouse as 
/devices/pci0000:00/0000:00:02.0/0000:02:00.0/usb9/9-2/9-2.2/9-2.2:1.0/0003:046D:C077.0016/input/input25
Nov 19 22:54:40 ads-120elmst-proxmox-1 kernel: [ 4835.153521] hid-generic 
0003:046D:C077.0016: input,hidraw5: USB HID v1.11 Mouse [Logitech USB Optical 
Mouse] on usb-0000:02:00.0-2.2/input0
Nov 19 22:55:42 ads-120elmst-proxmox-1 kernel: [ 4897.081151] usb 9-2.2: USB 
disconnect, device number 9
Nov 19 22:55:43 ads-120elmst-proxmox-1 kernel: [ 4898.582198] usb 9-2.2: new 
low-speed USB device number 10 using xhci_hcd
Nov 19 22:55:43 ads-120elmst-proxmox-1 kernel: [ 4898.694029] usb 9-2.2: New 
USB device found, idVendor=046d, idProduct=c077
Nov 19 22:55:43 ads-120elmst-proxmox-1 kernel: [ 4898.694031] usb 9-2.2: New 
USB device strings: Mfr=1, Product=2, SerialNumber=0
Nov 19 22:55:43 ads-120elmst-proxmox-1 kernel: [ 4898.694033] usb 9-2.2: 
Product: USB Optical Mouse
Nov 19 22:55:43 ads-120elmst-proxmox-1 kernel: [ 4898.694035] usb 9-2.2: 
Manufacturer: Logitech
Nov 19 22:55:43 ads-120elmst-proxmox-1 kernel: [ 4898.701760] input: Logitech 
USB Optical Mouse as 
/devices/pci0000:00/0000:00:02.0/0000:02:00.0/usb9/9-2/9-2.2/9-2.2:1.0/0003:046D:C077.0017/input/input26
Nov 19 22:55:43 ads-120elmst-proxmox-1 kernel: [ 4898.702086] hid-generic 
0003:046D:C077.0017: input,hidraw5: USB HID v1.11 Mouse [Logitech USB Optical 
Mouse] on usb-0000:02:00.0-2.2/input0
Nov 19 22:55:48 ads-120elmst-proxmox-1 kernel: [ 4903.677507] device tap110i0 
entered promiscuous mode
Nov 19 22:55:48 ads-120elmst-proxmox-1 kernel: [ 4903.688302] vmbr0: port 
2(tap110i0) entered blocking state
Nov 19 22:55:48 ads-120elmst-proxmox-1 kernel: [ 4903.688304] vmbr0: port 
2(tap110i0) entered disabled state
Nov 19 22:55:48 ads-120elmst-proxmox-1 kernel: [ 4903.688387] vmbr0: port 
2(tap110i0) entered blocking state
Nov 19 22:55:48 ads-120elmst-proxmox-1 kernel: [ 4903.688388] vmbr0: port 
2(tap110i0) entered forwarding state
Nov 19 22:55:50 ads-120elmst-proxmox-1 kernel: [ 4904.869968] vfio_ecap_init: 
0000:04:00.0 hiding ecap 0x19@0x900
Nov 19 22:55:52 ads-120elmst-proxmox-1 kernel: [ 4906.874808] device tap111i0 
entered promiscuous mode
Nov 19 22:55:52 ads-120elmst-proxmox-1 kernel: [ 4906.880827] vmbr0: port 
3(tap111i0) entered blocking state
Nov 19 22:55:52 ads-120elmst-proxmox-1 kernel: [ 4906.880829] vmbr0: port 
3(tap111i0) entered disabled state
Nov 19 22:55:52 ads-120elmst-proxmox-1 kernel: [ 4906.880895] vmbr0: port 
3(tap111i0) entered blocking state
Nov 19 22:55:52 ads-120elmst-proxmox-1 kernel: [ 4906.880896] vmbr0: port 
3(tap111i0) entered forwarding state
Nov 19 22:55:53 ads-120elmst-proxmox-1 kernel: [ 4908.513876] usb 9-1.1: reset 
high-speed USB device number 4 using xhci_hcd
Nov 19 22:55:54 ads-120elmst-proxmox-1 kernel: [ 4908.938243] usb 9-1.2: reset 
full-speed USB device number 5 using xhci_hcd
Nov 19 22:55:54 ads-120elmst-proxmox-1 kernel: [ 4909.297968] usb 9-1.4: reset 
full-speed USB device number 7 using xhci_hcd
Nov 19 22:55:55 ads-120elmst-proxmox-1 kernel: [ 4909.781433] usb 9-1.2: reset 
full-speed USB device number 5 using xhci_hcd
Nov 19 22:55:55 ads-120elmst-proxmox-1 kernel: [ 4910.176352] usb 9-1.4: reset 
full-speed USB device number 7 using xhci_hcd
Nov 19 22:56:06 ads-120elmst-proxmox-1 kernel: [ 4921.624815] vfio_ecap_init: 
0000:05:00.0 hiding ecap 0x1e@0x258
Nov 19 22:56:06 ads-120elmst-proxmox-1 kernel: [ 4921.624824] vfio_ecap_init: 
0000:05:00.0 hiding ecap 0x19@0x900
Nov 19 22:56:09 ads-120elmst-proxmox-1 kernel: [ 4924.086319] usb 9-1.4: reset 
full-speed USB device number 7 using xhci_hcd
Nov 19 22:56:09 ads-120elmst-proxmox-1 kernel: [ 4924.282305] usb 9-1.2: reset 
full-speed USB device number 5 using xhci_hcd
Nov 19 22:56:09 ads-120elmst-proxmox-1 kernel: [ 4924.480909] usb 9-1.1: reset 
high-speed USB device number 4 using xhci_hcd
Nov 19 22:56:09 ads-120elmst-proxmox-1 kernel: [ 4924.666275] usb 9-1.2: reset 
full-speed USB device number 5 using xhci_hcd
Nov 19 22:56:10 ads-120elmst-proxmox-1 kernel: [ 4924.855264] usb 9-1.4: reset 
full-speed USB device number 7 using xhci_hcd
Nov 19 22:56:10 ads-120elmst-proxmox-1 kernel: [ 4925.076862] usb 9-1.1: reset 
high-speed USB device number 4 using xhci_hcd
Nov 19 22:56:10 ads-120elmst-proxmox-1 kernel: [ 4925.262218] usb 9-1.2: reset 
full-speed USB device number 5 using xhci_hcd



There are more USB resets than I remember from the last time I tried this.
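
To put a number on that rather than rely on memory, something like the 
following rough sketch could tally resets per USB device.  count_usb_resets is 
a made-up name, and it assumes unwrapped log lines in the same 
"usb 9-1.2: reset ..." form as the ones above:

```shell
#!/bin/sh
# Sketch: count "reset" events per USB device from kernel log text on stdin.
# count_usb_resets is a hypothetical helper name.
count_usb_resets() {
    # Keep only the "usb <bus-port>: reset" fragment of each matching line,
    # then tally by device path and print "<device> <count>", sorted.
    grep -oE 'usb [0-9]+-[0-9.]+: reset' |
        awk '{ sub(":", "", $2); n[$2]++ } END { for (d in n) print d, n[d] }' |
        sort
}

# Example: tally resets in the current kernel ring buffer.
# dmesg may need root on some systems; no output means no matches.
dmesg 2>/dev/null | count_usb_resets
```

Saving the output from two runs would make it possible to compare reset 
counts between attempts.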

Thanks again for the response, and if you have any other thoughts on the matter 
I'd love to hear them.

-Brian
  

----- Original Message -----
From: "vfio-users-request" <vfio-users-requ...@redhat.com>
To: "vfio-users" <vfio-users@redhat.com>
Sent: Sunday, November 19, 2017 12:00:11 PM
Subject: vfio-users Digest, Vol 28, Issue 16

Message: 4
Date: Sun, 19 Nov 2017 08:53:32 +0000
From: Zir Blazer <zir_bla...@hotmail.com>
To: "vfio-users@redhat.com" <vfio-users@redhat.com>
Subject: Re: [vfio-users] GPU driver crashes when running a second VM
        if either VM has a virtual disk stored on physical media other than
        the root disk. Tested on three X58 chipset MBs
Message-ID: <cy4pr15mb146404edebb7caa99b74972ef3...@cy4pr15mb1464.namprd15.prod.outlook.com>
Content-Type: text/plain; charset="iso-8859-1"

The Nehalem era X58, 5500 and 5520 Chipsets had a notoriously broken Interrupt 
Remapping implementation:
https://support.citrix.com/article/CTX136517 (Love that symptoms list)
https://www.netiq.com/support/kb/doc.php?id=7014344
https://serverfault.com/questions/745593/does-disabling-vt-d-and-interrupt-remapping-break-msi-x



Interrupt Remapping was directly related to some x2APIC, MSI-X (Not sure if 
MSI) and IOMMU features which obviously on X58 platforms don't work as 
intended. You can try to force disabling them with Kernel Parameters 
(intremap=off and something else to force old xAPIC) and see if it improves. 
Google around also the X2APIC Disable Bit in the ACPI DMAR Table (I recall that 
I wrote something related to it). VFIO had also an allow_unsafe_interrupts=1 
option that was also related to Nehalem broken Chipsets.
Basically, you are trying to use early era Hardware that was quite buggy, so 
it is expected that Passthrough would be problematic. Have fun!

_______________________________________________
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users
