When gfx_passthru is enabled for the Intel IGD, hvmloader maps the IGD opregion to the guest but libxl does not grant the guest permission to access the mapped memory region. This results in a crash of the i915.ko kernel module in a Linux HVM guest when it needs to access the IGD opregion:
Oct 23 11:36:33 domU kernel: Call Trace: Oct 23 11:36:33 domU kernel: ? idr_alloc+0x39/0x70 Oct 23 11:36:33 domU kernel: drm_get_last_vbltimestamp+0xaa/0xc0 [drm] Oct 23 11:36:33 domU kernel: drm_reset_vblank_timestamp+0x5b/0xd0 [drm] Oct 23 11:36:33 domU kernel: drm_crtc_vblank_on+0x7b/0x130 [drm] Oct 23 11:36:33 domU kernel: intel_modeset_setup_hw_state+0xbd4/0x1900 [i915] Oct 23 11:36:33 domU kernel: ? _cond_resched+0x16/0x40 Oct 23 11:36:33 domU kernel: ? ww_mutex_lock+0x15/0x80 Oct 23 11:36:33 domU kernel: intel_modeset_init_nogem+0x867/0x1d30 [i915] Oct 23 11:36:33 domU kernel: ? gen6_write32+0x4b/0x1c0 [i915] Oct 23 11:36:33 domU kernel: ? intel_irq_postinstall+0xb9/0x670 [i915] Oct 23 11:36:33 domU kernel: i915_driver_probe+0x5c2/0xc90 [i915] Oct 23 11:36:33 domU kernel: ? vga_switcheroo_client_probe_defer+0x1f/0x40 Oct 23 11:36:33 domU kernel: ? i915_pci_probe+0x3f/0x150 [i915] Oct 23 11:36:33 domU kernel: local_pci_probe+0x42/0x80 Oct 23 11:36:33 domU kernel: ? _cond_resched+0x16/0x40 Oct 23 11:36:33 domU kernel: pci_device_probe+0xfd/0x1b0 Oct 23 11:36:33 domU kernel: really_probe+0x222/0x480 Oct 23 11:36:33 domU kernel: driver_probe_device+0xe1/0x150 Oct 23 11:36:33 domU kernel: device_driver_attach+0xa1/0xb0 Oct 23 11:36:33 domU kernel: __driver_attach+0x8a/0x150 Oct 23 11:36:33 domU kernel: ? device_driver_attach+0xb0/0xb0 Oct 23 11:36:33 domU kernel: ? device_driver_attach+0xb0/0xb0 Oct 23 11:36:33 domU kernel: bus_for_each_dev+0x78/0xc0 Oct 23 11:36:33 domU kernel: bus_add_driver+0x12b/0x1e0 Oct 23 11:36:33 domU kernel: driver_register+0x8b/0xe0 Oct 23 11:36:33 domU kernel: ? 0xffffffffc06b8000 Oct 23 11:36:33 domU kernel: i915_init+0x5d/0x70 [i915] Oct 23 11:36:33 domU kernel: do_one_initcall+0x44/0x1d0 Oct 23 11:36:33 domU kernel: ? do_init_module+0x23/0x260 Oct 23 11:36:33 domU kernel: ? kmem_cache_alloc_trace+0xf5/0x200 Oct 23 11:36:33 domU kernel: do_init_module+0x5c/0x260 Oct 23 11:36:33 domU kernel: __do_sys_finit_module+0xb1/0x110 Oct 23 11:36:33 domU kernel: do_syscall_64+0x33/0x80 Oct 23 11:36:33 domU kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Oct 23 11:36:33 domU kernel: RIP: 0033:0x7f188e1aa9b9 This bug first appeared with commits abfb006f1ff4 and 0561e1f01e87 during the development of Xen 4.5 in 2014 and is present in all Xen versions 4.5 and higher. Currently, because of another bug in Qemu upstream, this crash can only be reproduced using the traditional Qemu device model, and of course it can only be reproduced on a system with an Intel IGD and compatible hardware and system BIOS that supports Intel VT-d. It also only occurs with Linux Intel graphics drivers (i915) in a Linux guest. It does not occur in a Windows guest with proprietary Windows Intel graphics drivers. This testing was done with Qemu traditional running as a process in dom0. The commit abfb006f1ff4 was intended to explicitly grant access to all needed I/O memory ranges so access requests by guests would not fail after commit 0561e1f01e87 was applied, but it failed to handle the case when the Intel IGD is being passed to an HVM guest for VGA passthrough. This patch grants access to the Intel IGD opregion if an Intel IGD is passed to an HVM guest and gfx_passthru is enabled in the xl.cfg guest configuration file, in addition to the other memory that was configured for access in commit abfb006f1ff4. The fix is implemented as follows: The first hunk of the patch adds two macros. These are the macros that determine the starting address and size of the Intel IGD opregion. PCI_INTEL_OPREGION matches the value in tools/firmware/hvmloader/pci_regs.h. IGD_OPREGION_PAGES matches the value in tools/firmware/hvmloader/config.h. These macros are used to correctly define the start address and size of the Intel IGD opregion. The second hunk is the new sysfs_dev_get_igd_opregion function, using the same coding style as the other sysfs_dev_get_xxx functions in the patched file. It returns the start address of the Intel IGD opregion from dom0's point of view if there are no errors, and it is called by code in the third and final hunk of the patch. The third hunk extends the libxl__grant_vga_iomem_permission function, which was a newly added function in one of the commits being fixed now (abfb006f1ff4). That function, in addition to what it did before, now checks if we have an Intel IGD and if so, it calls the new sysfs_dev_get_igd_opregion function to get the starting address of the memory to grant access to. This problem was discovered by building and testing versions of Xen 4.5-unstable until the aforementioned patches that broke passthrough of the Intel IGD to a Linux HVM guest were found. That alone, though, was not enough information to arrive at this fix. After identifying abfb006f1ff4 and 0561e1f01e87 as the commits that were causing the trouble, it was necessary to find out what memory was being denied that previously was allowed. By examining verbose logs from both Qemu and Xen, and the logs from a custom build of Xen that added a custom log entry to indicate the address of the memory that was being denied, it was possible to determine without doubt that the memory that was being denied was the Intel IGD opregion. Windows HVM guests are not affected by this issue, presumably because the proprietary Windows Intel graphics drivers do not need to access the Intel IGD opregion if for some reason it is inaccessible. Guidelines for maintaining this code: This code needs to be kept consistent with how hvmloader maps the Intel IGD opregion to the guest, how hvmloader and libxenlight detect an Intel IGD on the PCI bus, and how Qemu sets up the mapped IGD opregion in the guest and detects an Intel IGD. For example, all these components should agree on the size of the IGD opregion. The most important point for the future is accurate detection of the Intel IGD, which libxl__is_igd_vga_passthru currently provides. This patch does not modify that function, but it does use it. It will be important to ensure that function accurately detects an Intel IGD, because that is the only way we validate the contents of the Intel IGD opregion that we are permitting the guest to access. Currently, if we have a VGA device, the vendor is Intel, and the gfx_passthru option is enabled, we assume the contents of the memory we are mapping to and sharing with the guest is valid. The libxl__is_igd_vga_passthru function also reads the device id, but currently it assumes every Intel GPU device has an opregion. So, for example, if it is discovered that the newer discrete Intel GPUs do not have an opregion, the libxl__is_igd_vga_passthru function should be modified to return false for those devices. Fixes: abfb006f1ff4 (tools/libxl: explicitly grant access to needed I/O-memory ranges) Fixes: 0561e1f01e87 (xen/common: do not implicitly permit access to mapped I/O memory) Backport: 4.12+ Signed-off-by: Chuck Zmudzinski <brchu...@netscape.net> --- tools/libs/light/libxl_pci.c | 72 ++++++++++++++++++++++++++++++++++++ 1 file changed, 72 insertions(+) diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c index 4bbbfe9f16..a4fc473de9 100644 --- a/tools/libs/light/libxl_pci.c +++ b/tools/libs/light/libxl_pci.c @@ -24,6 +24,8 @@ #define PCI_OPTIONS "msitranslate=%d,power_mgmt=%d" #define PCI_BDF_XSPATH "%04x-%02x-%02x-%01x" #define PCI_PT_QDEV_ID "pci-pt-%02x_%02x.%01x" +#define PCI_INTEL_OPREGION 0xfc +#define IGD_OPREGION_PAGES 3 static unsigned int pci_encode_bdf(libxl_device_pci *pci) { @@ -610,6 +612,45 @@ out: return ret; } +static uint32_t sysfs_dev_get_igd_opregion(libxl__gc *gc, + libxl_device_pci *pci) +{ + char *pci_device_config_path = + GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/config", + pci->domain, pci->bus, pci->dev, pci->func); + size_t read_items; + uint32_t igd_opregion; + uint32_t error = 0xffffffff; + + FILE *f = fopen(pci_device_config_path, "r"); + if (!f) { + LOGE(ERROR, + "pci device "PCI_BDF" does not have config attribute", + pci->domain, pci->bus, pci->dev, pci->func); + goto out; + } + if (fseek(f, PCI_INTEL_OPREGION, SEEK_SET)) { + LOGE(ERROR, + "cannot find igd opregion address of pci device "PCI_BDF, + pci->domain, pci->bus, pci->dev, pci->func); + goto out; + } + read_items = fread(&igd_opregion, 4, 1, f); + if (read_items != 1) { + LOGE(ERROR, + "cannot read igd opregion address of pci device "PCI_BDF, + pci->domain, pci->bus, pci->dev, pci->func); + goto out; + } + fclose(f); + return igd_opregion; + +out: + if (f) + fclose(f); + return error; +} + /* * Some devices may need some ways to work well. Here like IGD, * we have to pass a specific option to qemu. @@ -2531,6 +2572,37 @@ int libxl__grant_vga_iomem_permission(libxl__gc *gc, const uint32_t domid, domid, vga_iomem_start, (vga_iomem_start + 0x20 - 1)); return ret; } + + /* If this is an Intel IGD, allow access to the IGD opregion */ + if (!libxl__is_igd_vga_passthru(gc, d_config)) return 0; + + uint32_t igd_opregion = sysfs_dev_get_igd_opregion(gc, pci); + uint32_t error = 0xffffffff; + if (igd_opregion == error) break; + + vga_iomem_start = ( (uint64_t) igd_opregion ) >> XC_PAGE_SHIFT; + ret = xc_domain_iomem_permission(CTX->xch, stubdom_domid, + vga_iomem_start, + IGD_OPREGION_PAGES, 1); + if (ret < 0) { + LOGED(ERROR, domid, + "failed to give stubdom%d access to iomem range " + "%"PRIx64"-%"PRIx64" for IGD passthru", + stubdom_domid, vga_iomem_start, (vga_iomem_start + + IGD_OPREGION_PAGES - 1)); + return ret; + } + ret = xc_domain_iomem_permission(CTX->xch, domid, + vga_iomem_start, + IGD_OPREGION_PAGES, 1); + if (ret < 0) { + LOGED(ERROR, domid, + "failed to give dom%d access to iomem range " + "%"PRIx64"-%"PRIx64" for IGD passthru", + domid, vga_iomem_start, (vga_iomem_start + + IGD_OPREGION_PAGES - 1)); + return ret; + } break; } -- 2.35.1