On Wed, Jun 10, 2015 at 01:07:17PM +0200, Laszlo Ersek wrote: > On 06/10/15 12:06, Marcel Apfelbaum wrote: > > On 06/09/2015 11:34 PM, Laszlo Ersek wrote: > >> On 06/07/15 11:23, Michael S. Tsirkin wrote: > >>> On Sat, Jun 06, 2015 at 01:46:29AM +0200, Laszlo Ersek wrote: > >>>> OVMF downloads the ACPI linker/loader script from QEMU when the edk2 > >>>> PCI > >>>> Bus driver globally signals the firmware that PCI enumeration and > >>>> resource > >>>> allocation have completed. At this point QEMU regenerates the ACPI > >>>> payload > >>>> in an fw_cfg read callback, and this is when the PXB's _CRS gets > >>>> populated. > >>>> > >>>> Unfortunately, when this happens, the PCI_COMMAND_MEMORY bit is > >>>> clear in > >>>> the root bus's command register, *unlike* under SeaBIOS. The > >>>> consequences > >>>> unfold as follows: > >>>> > >>>> - When build_crs() fetches dev->io_regions[i].addr, it is all-bits-one, > >>>> because pci_update_mappings() --> pci_bar_address() calculated it as > >>>> PCI_BAR_UNMAPPED, due to the PCI_COMMAND_MEMORY bit being clear. > >>>> > >>>> - Consequently, the SHPC MMIO BAR (bar 0) of the bridge is not added to > >>>> the _CRS, *despite* having been programmed in PCI config space. > >>>> > >>>> - Similarly, the SHPC MMIO BAR of the PXB is not removed from the main > >>>> root bus's DWordMemory descriptor. > >>>> > >>>> - Guest OSes (Linux and Windows alike) notice the pre-programmed > >>>> SHPC BAR > >>>> within the PXB's config space, and notice that it conflicts with the > >>>> main root bus's memory resource descriptors. Linux reports > >>>> > >>>> pci 0000:04:00.0: BAR 0: can't assign mem (size 0x100) > >>>> pci 0000:04:00.0: BAR 0: trying firmware assignment [mem > >>>> 0x88200000-0x882000ff 64bit] > >>>> pci 0000:04:00.0: BAR 0: [mem 0x88200000-0x882000ff 64bit] conflicts > >>>> with PCI Bus 0000:00 [mem > >>>> 0x88200000-0xfebfffff] > >>>> > >>>> While Windows Server 2012 R2 reports > >>>> > >>>> > >>>> https://technet.microsoft.com/en-us/library/cc732199%28v=ws.10%29.aspx > >>>> > >>>> This device cannot find enough free resources that it can use. > >>>> If you > >>>> want to use this device, you will need to disable one of the other > >>>> devices on this system. (Code 12) > >>>> > >>>> (This issue was apparently encountered earlier, see: > >>>> > >>>> > >>>> https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg02983.html > >>>> > >>>> and the current hole-punching logic in build_crs() and build_ssdt() is > >>>> probably supposed to remedy exactly that problem -- however, for > >>>> OVMF they > >>>> don't work, because at the end of the PCI enumeration and resource > >>>> allocation, which cues the ACPI linker/loader client, the command > >>>> register > >>>> is clear.) > >>>> > >>>> The solution is to fetch the BAR addresses from PCI config space > >>>> directly, > >>>> for the purposes of build_crs(), regardless of the PCI command register > >>>> and any MemoryRegion state. > >>>> > >>>> Example MMIO maps for a 2GB guest: > >>>> > >>>> SeaBIOS: > >>>> > >>>> main: 0x80000000..0xFC000000 / 0x7C000000 > >>>> pxb: 0xFC000000..0xFC200000 / 0x00200000 > >>>> main: 0xFC200000..0xFC213000 / 0x00013000 > >>>> pxb: 0xFC213000..0xFC213100 / 0x00000100 <- SHPC BAR > >>>> main: 0xFC213100..0xFEA00000 / 0x027ECF00 > >>>> pxb: 0xFEA00000..0xFEC00000 / 0x00200000 > >>>> > >>>> OVMF, without the fix: > >>>> > >>>> main: 0x80000000..0x88100000 / 0x08100000 > >>>> pxb: 0x88100000..0x88200000 / 0x00100000 > >>>> main: 0x88200000..0xFEC00000 / 0x76A00000 > >>>> > >>>> OVMF, with the fix: > >>>> > >>>> main: 0x80000000..0x88100000 / 0x08100000 > >>>> pxb: 0x88100000..0x88200000 / 0x00100000 > >>>> pxb: 0x88200000..0x88200100 / 0x00000100 <- SHPC BAR > >>>> main: 0x88200100..0xFEC00000 / 0x769FFF00 > >>>> > >>>> (Note that under OVMF the PCI enumerator includes requests for > >>>> prefetchable memory in the nonprefetchable memory pool -- separate > >>>> windows > >>>> for nonprefetchable and prefetchable memory are not supported (yet) -- > >>>> that's why there's one fewer pxb range in the corrected OVMF case > >>>> than in > >>>> the SeaBIOS case.) > >>>> > >>>> Cc: Marcel Apfelbaum <mar...@redhat.com> > >>>> Cc: Michael S. Tsirkin <m...@redhat.com> > >>>> Signed-off-by: Laszlo Ersek <ler...@redhat.com> > >>> > >>> This is problematic - disabled BAR values have no meaning according to > >>> the PCI spec. > >>> > >>> It might be best to add a property to just disable shpc in the bridge so > >>> no devices reside directly behind the pxb? > >> > >> I started looking into this. > >> > >> The property itself doesn't seem terribly hard (there's already an "msi" > >> property which I can take as an example). Making stuff conditional on > >> this new "shpc" property looked feasible in the beginning, however I > >> qucikly ran into two issues: > >> > >> - Migration. > >> > >> One idea would be to keep the SHPC setup around at all times, and > >> just not expose the PCI bar to the guest (same as in Marcel's hack > >> [1]). I didn't like this, although it would keep the migration stream > >> intact. > >> > >> The other idea is to omit even the shpc_init() call when SHPC is > >> disabled. I like this, but it would require breaking migration > >> compatibility. Both "minimum_version_id" and "version_id" would have > >> to be set to 1 (from the current zero), and the fixed SHPC_VMSTATE() > >> field should be replaced with a subsection (dependent on the new > >> "shpc" flag). > >> > >> Seems sweaty but doable. > >> > >> - Hotplug handling. > >> > >> This is the deal breaker. The new "shpc" flag can affect *instances* > >> of the bridge, but SHPC is a class-level trait. > >> pci_bridge_dev_class_init() sets hc->plug and hc->unplug_request to > >> SHPC-related callbacks, plus "pci_bridge_dev_info" advertises itself > >> as TYPE_HOTPLUG_HANDLER.
Right, but you can simply add code to fail plug requests (and ignore unplug requests) after checking this property. > >> > >> This implies that I'd have to split the TYPE_PCI_BRIDGE_DEV class > >> into a base class and a derived class. Only the derived class would > >> support SHPC / TYPE_HOTPLUG_HANDLER. And, PXB would have to be > >> diverted to the new base class (without SHPC), in pxb_dev_initfn(), > >> from "pci-bridge". (The derived class would preserve the name > >> "pci-bridge".) > >> > >> Consequences for migration are unclear to me. Maybe the new derived > >> class (functionally equivalent to the current TYPE_PCI_BRIDGE_DEV) > >> would be migration-compatible with the current class. > >> > >> If not, I could create the "basic" bridge class as a standalone one, > >> without interrupt / MSI / SHPC / hotplug support, and PXB would use > >> that. The file "hw/pci-bridge/pci_bridge_dev.c" is really small, so > >> this would be quite easy; it woduln't duplicate a lot of code, and > >> would not affect preexistent migration at all. On the other hand, > >> people might not like that the base class functionality were > >> duplicated, instead of inherited. > > Hi Laszlo, > > > > Can you please check that the above hack [1] solves the problem? > > If it works and there is no much code duplication, the latest idea > > seems to me the right way to go: A specific PCI-2-PCI bridge. > > Also we can always reduce duplication by moving common code in utility > > methods :) > > I did (almost) the same with the PCIBus, creating a PXB version of it. > > > > Now I am back from my PTO and I can help. Let me know if you want me to > > handle > > this issue or any other way I can assist. > > Thanks. I'll prototype a separate bridge class for this then. Hopefully > I can post something before the end of this week. :) > > Cheers > Laszlo > > > > > Thanks, > > Marcel > > > >> > >> I've managed such a base/descendant class split once before > >> (splitting fw_cfg into the IO and MMIO mapped variants) -- with help > >> of course -- so perhaps I could try it again, if that's the > >> preference. > >> > >> [1] > >> https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg02983.html > >> > >> Thoughts? > > > >> > >> Thanks, > >> Laszlo > >> > >> > >>>> --- > >>>> hw/i386/acpi-build.c | 4 ++-- > >>>> 1 file changed, 2 insertions(+), 2 deletions(-) > >>>> > >>>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c > >>>> index b71e942..60d4f75 100644 > >>>> --- a/hw/i386/acpi-build.c > >>>> +++ b/hw/i386/acpi-build.c > >>>> @@ -784,8 +784,8 @@ static Aml *build_crs(PCIHostState *host, > >>>> for (i = 0; i < PCI_NUM_REGIONS; i++) { > >>>> PCIIORegion *r = &dev->io_regions[i]; > >>>> > >>>> - range_base = r->addr; > >>>> - range_limit = r->addr + r->size - 1; > >>>> + range_base = pci_bar_address(dev, i, r->type, r->size, > >>>> false); > >>>> + range_limit = range_base + r->size - 1; > >>>> > >>>> /* > >>>> * Work-around for old bioses > >>>> -- > >>>> 1.8.3.1 > >> > >