I would like to report on a hack I created that lets me use vfio-pci to pass through a boot GPU. TL;DR: the BOOTFB framebuffer memory region seems to cause a "BAR <n>: can't reserve [mem <...>]" error, and this can be hackily worked around by calling __release_region on the BOOTFB framebuffer region. Someone on IRC suggested that I send this hack to this list.
My system setup is as follows: I have a Xeon E5-2630 v4 on an ASRock X99 Extreme6 motherboard. The GPU I am attempting to pass through is an NVIDIA GTX 1080 plugged into the slot closest to the CPU. A second GPU, an AMD R5 240 OEM (Oland), is used as the "initial" GPU for Linux. ("Initial" here means that the text consoles and the X graphical login appear on the monitor connected to this GPU. After logging in, I run additional commands to either start a VM or start a new X server on the NVIDIA GPU.) Each GPU has its own monitor cable; there is no attempt to forward the output of one GPU to the other.

Linux is booted using UEFI, not BIOS boot, and the CSM is disabled. The UEFI splash screen and the GRUB bootloader display on the NVIDIA GPU, and there does not appear to be an option to change the boot GPU. Linux is nevertheless configured to display its output on the AMD GPU by a) describing only the AMD GPU in xorg.conf, and b) passing "video=simplefb:off" on the kernel command line and putting radeon in the initrd so that it loads before the nvidia driver does.

I am running Debian sid with kernel 4.6. I activate vfio-pci manually by writing the device IDs to /sys/bus/pci/drivers/vfio-pci/new_id, then unbinding the existing driver and binding vfio-pci. This works most of the time (more on this later).

Without my hack, when I try to launch a qemu-kvm guest (using virt-manager; the guest OS is Windows 10, booting via OVMF on the i440fx machine type), the host kernel log gets flooded with the error "vfio-pci 0000:04:00.0: BAR 1: can't reserve [mem 0xc0000000-0xcfffffff 64bit pref]". Examining /proc/iomem shows that the memory region vfio-pci is trying to claim overlaps with a region named BOOTFB, which is apparently the UEFI framebuffer. (Even though simplefb is disabled, this memory region still appears to be created.)
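To make the conflict concrete: both the BAR in the error message and the BOOTFB entry in /proc/iomem are inclusive address ranges, and as far as I can tell the reservation fails because BOOTFB already occupies part of the BAR's range. The following is only a userspace arithmetic sketch, not kernel code; the names mem_range, ranges_overlap, range_contains and range_size are hypothetical, and the addresses are the ones from my logs and the /proc/iomem dump further down.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * An inclusive physical address range, written the way /proc/iomem and the
 * "BAR n: can't reserve [mem a-b]" message print it: `end` is the last byte.
 */
struct mem_range {
    uint64_t start;
    uint64_t end;
};

/* Two inclusive ranges overlap iff each one starts no later than the other ends. */
bool ranges_overlap(struct mem_range a, struct mem_range b)
{
    return a.start <= b.end && b.start <= a.end;
}

/* True if `inner` lies entirely within `outer` (a nested /proc/iomem entry). */
bool range_contains(struct mem_range outer, struct mem_range inner)
{
    return outer.start <= inner.start && inner.end <= outer.end;
}

/* Byte count of an inclusive range: the size argument __release_region() takes. */
uint64_t range_size(struct mem_range r)
{
    return r.end - r.start + 1;
}

/* Values taken from this report's kernel log and /proc/iomem dump. */
const struct mem_range bar1   = { 0xc0000000u, 0xcfffffffu };
const struct mem_range bootfb = { 0xc0000000u, 0xc086ffffu };
```

With these values, ranges_overlap(bar1, bootfb) and range_contains(bar1, bootfb) are both true, and range_size(bootfb) is 0x870000 bytes, the same size the module below computes as bootfb_end - bootfb_start + 1.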
As a really terrible hack, I wrote a kernel module that calls "__release_region(&iomem_resource, <start of bootfb>, <size of bootfb>)". This fixed the issue for me, and I was able to pass the boot GPU through to the guest. The source code of this hacky kernel module is below. It is used by running "insmod forcefully-remove-bootfb.ko bootfb_start=<addr> bootfb_end=<addr>" with addresses taken from /proc/iomem; the module is then immediately unloaded with rmmod. (The module can't find BOOTFB by itself because I didn't figure out how to traverse iomem_resource from a kernel module; resource_lock doesn't seem to be accessible from modules.)

Regarding activating the vfio-pci drivers: I do not have the nvidia/snd_hda_intel drivers blacklisted. I let them load normally on boot and unbind them when I run a VM, and I try to rebind the normal drivers after shutting down the VM. The idea is that I can either run a Windows VM using the NVIDIA GPU, or start a second X server using the NVIDIA GPU and a separate xorg.nv.conf, and switch between these two modes without rebooting the host (restarting the second X server is still required). Most of the time this works correctly. Occasionally, however, the kernel hits a general protection fault, but that is unrelated to the hack described here.

A dump of various pieces of information follows (this probably isn't directly useful and is for reference only):

$ lspci -nn
<snip>
00:1b.0 Audio device [0403]: Intel Corporation C610/X99 series chipset HD Audio Controller [8086:8d20] (rev 05)
<snip>
04:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1080] [10de:1b80] (rev a1)
04:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10f0] (rev a1)
<snip>
08:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Oland [Radeon HD 8570 / R7 240/340 OEM] [1002:6611]
08:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series] [1002:aab0]
<snip>

$ uname -a
Linux <hostname> 4.6.0-1-amd64 #1 SMP Debian 4.6.2-2 (2016-06-25) x86_64 GNU/Linux

$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.6.0-1-amd64 root=UUID=<snip> ro rootflags=subvol=@ cgroup_enable=memory intremap=no_x2apic_optout intel_iommu=on video=simplefb:off quiet

# cat /proc/iomem # before hack
<snip>
60000000-6fffffff : PCI MMCONFIG 0000 [bus 00-ff]
  60000000-6fffffff : reserved
70000000-fbffbfff : PCI Bus 0000:00
  c0000000-d1ffffff : PCI Bus 0000:04
    c0000000-cfffffff : 0000:04:00.0
      c0000000-c086ffff : BOOTFB
    d0000000-d1ffffff : 0000:04:00.0
<snip>

# cat /proc/iomem # after hack
<snip>
60000000-6fffffff : PCI MMCONFIG 0000 [bus 00-ff]
  60000000-6fffffff : reserved
70000000-fbffbfff : PCI Bus 0000:00
  c0000000-d1ffffff : PCI Bus 0000:04
    c0000000-cfffffff : 0000:04:00.0
    d0000000-d1ffffff : 0000:04:00.0
<snip>

---------- full commands to prep for running VM ----------
sudo insmod forcefully-remove-bootfb.ko bootfb_start=0xc0000000 bootfb_end=0xc086ffff
sudo rmmod forcefully_remove_bootfb
echo "8086 8d20" | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id # Intel HD Audio, unrelated to this hack
echo "10de 1b80" | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id
echo "10de 10f0" | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id
echo "0000:00:1b.0" | sudo tee /sys/bus/pci/devices/0000\:00\:1b.0/driver/unbind # Intel HD Audio, unrelated to this hack
echo "0000:04:00.0" | sudo tee /sys/bus/pci/devices/0000\:04\:00.0/driver/unbind
echo "0000:04:00.1" | sudo tee /sys/bus/pci/devices/0000\:04\:00.1/driver/unbind
echo "0000:00:1b.0" | sudo tee /sys/bus/pci/drivers/vfio-pci/bind
echo "0000:04:00.0" | sudo tee /sys/bus/pci/drivers/vfio-pci/bind
echo "0000:04:00.1" | sudo tee /sys/bus/pci/drivers/vfio-pci/bind
# Can run virt-manager and launch VM now

---------- full commands to switch back to Linux ----------
echo "0000:00:1b.0" | sudo tee /sys/bus/pci/devices/0000\:00\:1b.0/driver/unbind
echo "0000:04:00.0" | sudo tee /sys/bus/pci/devices/0000\:04\:00.0/driver/unbind
echo "0000:04:00.1" | sudo tee /sys/bus/pci/devices/0000\:04\:00.1/driver/unbind
echo "0000:00:1b.0" | sudo tee /sys/bus/pci/drivers/snd_hda_intel/bind
echo "0000:04:00.0" | sudo tee /sys/bus/pci/drivers/nvidia/bind
echo "0000:04:00.1" | sudo tee /sys/bus/pci/drivers/snd_hda_intel/bind

---------- forcefully-remove-bootfb.c ----------
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/ioport.h>	/* iomem_resource, __release_region() */

/* Inclusive BOOTFB address range, copied by hand from /proc/iomem. */
static unsigned long long bootfb_start;
static unsigned long long bootfb_end;
module_param(bootfb_start, ullong, 0000);
module_param(bootfb_end, ullong, 0000);

static int __init remover_module_init(void)
{
	pr_info("forcefully-remove-bootfb loaded\n");

	if (bootfb_start == 0 && bootfb_end == 0) {
		pr_err("forcefully-remove-bootfb needs addresses!\n");
		return 0;	/* still load, so the usual rmmod works */
	}

	/* Reject reversed ranges and addresses that don't fit in resource_size_t. */
	if (bootfb_end < bootfb_start ||
	    bootfb_end > (unsigned long long)(resource_size_t)~0) {
		pr_err("forcefully-remove-bootfb: invalid range 0x%llx-0x%llx\n",
		       bootfb_start, bootfb_end);
		return 0;
	}

	pr_info("forcefully-remove-bootfb 0x%llx-0x%llx\n", bootfb_start, bootfb_end);

	/* Do the actual removal here; +1 because the end address is inclusive. */
	__release_region(&iomem_resource, bootfb_start,
			 bootfb_end - bootfb_start + 1);
	return 0;
}

static void __exit remover_module_exit(void)
{
	pr_info("forcefully-remove-bootfb unloaded\n");
}

module_init(remover_module_init);
module_exit(remover_module_exit);

MODULE_LICENSE("Dual BSD/GPL");
MODULE_AUTHOR("Robert Ou <r...@robertou.com>");
MODULE_DESCRIPTION("Forcefully removes BOOTFB I/O resource");
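Since the module has to be given the BOOTFB addresses by hand, a small userspace helper could read them out of /proc/iomem instead. This is a minimal C sketch under my own assumptions: it matches the entry name exactly ("BOOTFB"), treats the leading indentation purely as nesting, and must be run as root, because newer kernels show all-zero addresses in /proc/iomem to unprivileged readers. The function names iomem_line_matches, find_iomem_entry and print_insmod_args are hypothetical and not part of any existing tool.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/iomem line such as "      c0000000-c086ffff : BOOTFB".
 * Leading indentation is skipped. Returns true and fills *start / *end
 * (inclusive) only if the entry's name is exactly `name`.
 */
bool iomem_line_matches(const char *line, const char *name,
                        unsigned long long *start, unsigned long long *end)
{
    unsigned long long s, e;
    int name_off = -1; /* stays -1 unless the " : " separator was matched */

    if (sscanf(line, " %llx-%llx : %n", &s, &e, &name_off) != 2 || name_off < 0)
        return false;

    const char *rest = line + name_off;
    size_t len = strcspn(rest, "\r\n"); /* ignore the trailing newline */
    if (len != strlen(name) || strncmp(rest, name, len) != 0)
        return false;

    *start = s;
    *end = e;
    return true;
}

/* Scan an open iomem listing for the first entry called `name`. */
bool find_iomem_entry(FILE *f, const char *name,
                      unsigned long long *start, unsigned long long *end)
{
    char line[256];

    while (fgets(line, sizeof(line), f) != NULL) {
        if (iomem_line_matches(line, name, start, end))
            return true;
    }
    return false;
}

/* Print the insmod command used in this report, filled in from `iomem_path`. */
int print_insmod_args(const char *iomem_path)
{
    unsigned long long start, end;
    FILE *f = fopen(iomem_path, "r");

    if (f == NULL) {
        perror(iomem_path);
        return -1;
    }
    bool found = find_iomem_entry(f, "BOOTFB", &start, &end);
    fclose(f);
    if (!found) {
        fprintf(stderr, "no BOOTFB entry in %s\n", iomem_path);
        return -1;
    }
    printf("insmod forcefully-remove-bootfb.ko bootfb_start=0x%llx bootfb_end=0x%llx\n",
           start, end);
    return 0;
}
```

For the "before hack" /proc/iomem shown above, print_insmod_args("/proc/iomem") would print bootfb_start=0xc0000000 bootfb_end=0xc086ffff, the same values used in the prep commands.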
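The unbind/bind sequences above are just writes to sysfs attributes, so the same steps can be done from C. This is a minimal sketch, assuming it runs as root and that the target driver (e.g. vfio-pci) is already loaded; sysfs_write and rebind_pci_device are hypothetical helper names, not an existing API.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Write `value` to a sysfs attribute with a single write(2), the C
 * equivalent of `echo value | sudo tee path`. Returns 0 or a negative errno.
 */
int sysfs_write(const char *path, const char *value)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    size_t len = strlen(value);
    ssize_t n = write(fd, value, len);
    /* Capture errno before close() can overwrite it. */
    int err = (n < 0) ? -errno : ((size_t)n == len ? 0 : -EIO);

    close(fd);
    return err;
}

/*
 * Hypothetical helper: detach a PCI device from its current driver (if any)
 * and bind it to `driver`, e.g. bdf = "0000:04:00.0", driver = "vfio-pci".
 */
int rebind_pci_device(const char *bdf, const char *driver)
{
    char path[256];
    int err;

    /* Step 1: unbind. The "driver" symlink is absent if nothing is bound. */
    snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/driver/unbind", bdf);
    err = sysfs_write(path, bdf);
    if (err < 0 && err != -ENOENT)
        return err;

    /* Step 2: bind to the requested driver. */
    snprintf(path, sizeof(path), "/sys/bus/pci/drivers/%s/bind", driver);
    return sysfs_write(path, bdf);
}
```

For example, rebind_pci_device("0000:04:00.0", "vfio-pci") mirrors the unbind and bind lines for the GPU, and sysfs_write("/sys/bus/pci/drivers/vfio-pci/new_id", "10de 1b80") mirrors the new_id step. Unlike in the shell, the colons in the path need no backslash escaping.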