Hi,

On 02/25/16 12:57, Michael S. Tsirkin wrote:
> ----- Forwarded message from Igor Mammedov <imamm...@redhat.com> -----
>
> Date: Thu, 11 Feb 2016 16:16:05 +0100
> From: Igor Mammedov <imamm...@redhat.com>
> To: "Michael S. Tsirkin" <m...@redhat.com>
> To: ler...@redhat.com
> Subject: on pci rebalancing
> Message-ID: <20160211161605.0022e...@nial.brq.redhat.com>
> In-Reply-To: <20160209131656-mutt-send-email-...@redhat.com>
>
>>>>> For PCI rebalance to work on Windows, one has to provide working PCI
>>>>> driver
>>>>> otherwise OS will ignore it when rebalancing happens and
>>>>> might map something else over ignored BAR.
>>>>
>>>> Does it disable the BAR then? Or just move it elsewhere?
>>> it doesn't, it just blindly ignores BARs existence and maps BAR of
>>> another device with driver over it.
>>
>> Interesting. On classical PCI this is a forbidden configuration.
>> Maybe we do something that confuses windows?
>> Could you tell me how to reproduce this behaviour?
> #cat > t << EOF
> pci_update_mappings_del
> pci_update_mappings_add
> EOF
>
> #./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
> -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
> -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
> -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01
>
> wait till OS boots, note BARs programmed for ivshmem
> in my case it was
> 01:01.0 0,0xfe800000+0x100
> then execute script and watch pci_update_mappings* trace events
>
> # for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i | nc -U /tmp/m; sleep 5; done;
>
> hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing where
> Windows unmaps all BARs of nics on bridge but doesn't touch ivshmem
> and then programs new BARs, where:
> pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
> creates overlapping BAR with ivshmem
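
To make the overlap in the quoted trace concrete: reading each trace
entry as "BAR index, base+size" (an assumption about the trace format),
the ivshmem BAR covers [0xfe800000, 0xfe800100) and the newly programmed
e1000 BAR covers [0xfe800000, 0xfe820000). The sketch below only
illustrates that arithmetic; bars_overlap is a hypothetical helper, not
something QEMU or Windows provides.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if two BARs overlap.
# Each BAR is treated as the half-open range [base, base+size).
# $1/$2: base and size of the first BAR; $3/$4: base and size of the second.
bars_overlap() {
    a_start=$(( $1 )); a_end=$(( $1 + $2 ))
    b_start=$(( $3 )); b_end=$(( $3 + $4 ))
    # Two ranges overlap when each one starts before the other one ends.
    [ "$a_start" -lt "$b_end" ] && [ "$b_start" -lt "$a_end" ]
}

# Values taken from the trace: ivshmem at 01:01.0 vs. e1000 at 01:11.0.
if bars_overlap 0xfe800000 0x100 0xfe800000 0x20000; then
    echo "overlap"
else
    echo "no overlap"
fi
```

With the values from the trace the check reports an overlap, which
matches what Igor describes.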

Michael informed me of this on IRC (and forwarded this email to me). I
hope to start a new thread with my response. (I also re-edited the
subject fully.)

So, to summarize what I said on IRC first.

The situation where firmware recognizes and enables a PCI device, hands
control to the OS, and then the OS lacks a driver for the PCI device, is
completely normal and expected.

For UEFI specifically, I can name a general argument and a specific
argument.

The general argument is that actions that need to be taken in
ExitBootServices() callbacks do not include clearing IO or MMIO decode
bits in PCI device command registers. Command register manipulation
happens when a PCI device driver (that conforms to the UEFI driver
model) *binds* or *unbinds* a device. And unbinding a device is not
possible in the ExitBootServices() callback, minimally because such
callbacks are forbidden from modifying the memory map -- but unbinding
would release allocated memory.

So what we use such callbacks for is aborting in-flight, outstanding
DMA-like transfers. Resetting virtio devices is also an example (think
outstanding receive requests for virtio-net).

Now let's move on to the specific argument I mentioned above. The
Graphics Output Protocol (GOP) is a UEFI abstraction that was
specifically designed for the case where the operating system doesn't
have a display driver installed yet, but the user obviously has to use
the display somehow.

The GOP is most frequently provided on top of an EFI_PCI_IO_PROTOCOL
instance; meaning simply that the "GOP driver" is a UEFI driver that
drives a PCI device. In short, the driver provides the GOP on top of a
PCI device.

Now, the GOP is supposed to communicate the pixel format and the frame
buffer base address for the currently active graphics mode to the
software that consumes the GOP.
This includes UEFI applications of course (think a boot loader putting
up a splash screen or an animation), but importantly, the runtime OS is
*also* supposed to inherit these characteristics from boot services
time. The OS can then use simple unaccelerated MMIO writes to display
things on the screen, until the user installs an accelerated driver.

(Concrete example: this is why you can see *anything at all* on the
screen, when you run e.g. Windows Server 2012 R2 on top of OVMF and a
QXL display, before installing the QXL WDDM driver in the guest.)

Clearly, the frame buffer base address communicated through the GOP
points into one of the MMIO BARs of the PCI device. If, at
ExitBootServices(), MMIO decoding were disabled for the PCI device that
underlies the GOP, that would *completely* defeat the GOP design. The
OS's attempt to poke at those MMIO addresses would be futile -- and in
fact the OS has no idea what PCI device (if any) the framebuffer is
supposed to be related to. This is the jurisdiction of the OS-level
display driver -- if one exists and is installed.

So, this is a Windows bug in my opinion. Even when there is no OS-level
driver, a PCI device is fully expected to be decoding its resources if
the firmware brought it up.

--*--

Okay, so Michael asked me to try to reproduce the above with OVMF, and
see what happens. Unfortunately I'm not really knowledgeable about
ivshmem, hotplug, et cetera. Let me instead tell Igor about using OVMF.

(1) Please follow the instructions on Gerd's page
<https://www.kraxel.org/repos/>, and install the "edk2.git-ovmf-x64"
package.

(2) Create a separate directory for testing. In this directory, run the
following command:

  cp /usr/share/edk2.git/ovmf-x64/OVMF_VARS-pure-efi.fd myvars.fd

Also create a disk image for your new guest, etc.

(3) Use the following command line snippet to work with OVMF:

  qemu-system-x86_64 \
    -machine accel=kvm \
    -smp cpus=2 \
    -m 2048 \
    \
    -debugcon file:ovmf.debug.log \
    -global isa-debugcon.iobase=0x402 \
    \
    -device qxl-vga \
    \
    -drive if=pflash,format=raw,unit=0,readonly,file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd \
    -drive if=pflash,format=raw,unit=1,file=myvars.fd \
    \
    [your options here]

You can of course customize the number of VCPUs, memory size, disks,
CD-ROMs, network, and so on.

Recommended: when you use the -device option to add the disk and the
CD-ROM(s) to install the OS (and driver(s)) from, be sure to use the
"bootindex" property. OVMF will adhere to the boot order. It is
recommended to set bootindex=0 for your main disk, bootindex=1 for your
OS installer CD-ROM, and *no* bootindex for your virtio-win driver disk.
This way at first boot (with no OS installed) OVMF will boot the
installer CD-ROM. Further boots (with the same command line) will boot
the installed OS.

Caveat: I never used the -snapshot option with OVMF virtual machines; it
might or might not work.

Caveat #2: I tested simple PCI hotplug and hot-unplug with Windows
running on OVMF many months ago, but I can't tell off-hand if it will
work right now.

Thanks
Laszlo