On 01/05/21 01:35, Igor Mammedov wrote:
> On Wed, 30 Dec 2020 16:22:08 -0500
> "Michael S. Tsirkin" <m...@redhat.com> wrote:
>
>> On Tue, Dec 29, 2020 at 02:41:42PM +0100, Igor Mammedov wrote:
>>> On Wed, 23 Dec 2020 17:08:31 +0800
>>> Jiahui Cen <cenjia...@huawei.com> wrote:
>>>
>>>> There may be some differences in PCI resource assignment between the guest OS
>>>> and the firmware.
>>>>
>>>> E.g. a bridge with bus [d2]:
>>>> -+-[0000:d2]---01.0-[d3]----01.0
>>>>
>>>> where [d2:01.00] is a pcie-pci-bridge with BAR0 (mem, 64-bit, non-pref) [size=256]
>>>>       [d3:01.00] is a PCI device with BAR0 (mem, 64-bit, pref) [size=128K]
>>>>                                       BAR4 (mem, 64-bit, pref) [size=64M]
>>>>
>>>> In EDK2, the resource map would be:
>>>> PciBus: Resource Map for Bridge [D2|01|00]
>>>> Type = PMem64; Base = 0x8004000000; Length = 0x4100000; Alignment = 0x3FFFFFF
>>>>    Base = 0x8004000000; Length = 0x4000000; Alignment = 0x3FFFFFF; Owner = PCI [D3|01|00:20]
>>>>    Base = 0x8008000000; Length = 0x20000; Alignment = 0x1FFFF; Owner = PCI [D3|01|00:10]
>>>> Type = Mem64; Base = 0x8008100000; Length = 0x100; Alignment = 0xFFF
>>>> It would use 0x4100000 to calculate the root bus's PMem64 resource window.
>>>>
>>>> Linux, however, uses 0x1FFFFFF as the alignment when calculating the
>>>> PMem64 size, which comes out as 0x6000000. So the kernel tries to
>>>> allocate 0x6000000 from the PMem64 resource window, but since the window
>>>> size assigned by EDK2 is only 0x4100000, the allocation fails.
>>>>
>>>> These differences can result in resource assignment failures.
>>>>
>>>> Using _DSM function 5 to tell the guest OS not to ignore the PCI configuration
>>>> that firmware has done at boot time could handle the differences.
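The size mismatch in the commit message can be reproduced with a few lines of C. This is a simplified sketch, not EDK2 or kernel code: the Linux side is modeled on our reading of the bridge-window alignment heuristic in drivers/pci/setup-bus.c (calculate_mem_align()), which may differ between kernel versions, and the EDK2 side simply assumes the window is the sum of the BARs rounded up to 1 MiB, which is what the resource map above shows. All helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ORDERS 12 /* orders 0..11 cover BAR alignments from 1 MiB to 2 GiB */

/* Round x up to a multiple of a (a must be a power of two), like ALIGN(). */
uint64_t align_up(uint64_t x, uint64_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/* Order of a power-of-two BAR size: log2(size) - 20, clamped to 0 below 1 MiB. */
int bar_order(uint64_t size)
{
    int order = 0;

    while ((UINT64_C(1) << (order + 20)) < size) {
        order++;
    }
    return order;
}

/* Assumption: EDK2 window = sum of BAR sizes rounded up to 1 MiB. */
uint64_t edk2_window_size(const uint64_t *bars, int n)
{
    uint64_t total = 0;

    for (int i = 0; i < n; i++) {
        total += bars[i];
    }
    return align_up(total, UINT64_C(1) << 20);
}

/*
 * Modeled on calculate_mem_align(): walk the orders upward from 1 MiB;
 * whenever the smaller BARs seen so far (rounded to the current min_align)
 * fit below the next order's alignment, raise min_align to half of it.
 */
uint64_t linux_min_align(const uint64_t *aligns, int max_order)
{
    uint64_t align = 0, min_align = 0;

    for (int order = 0; order <= max_order; order++) {
        uint64_t align1 = UINT64_C(1) << (order + 20);

        if (!align) {
            min_align = align1;
        } else if (align_up(align + min_align, min_align) < align1) {
            min_align = align1 >> 1;
        }
        align += aligns[order];
    }
    return min_align;
}

/* Linux-style window: sum of BAR sizes rounded up to linux_min_align(). */
uint64_t linux_window_size(const uint64_t *bars, int n)
{
    uint64_t aligns[MAX_ORDERS] = {0};
    uint64_t total = 0;
    int max_order = 0;

    for (int i = 0; i < n; i++) {
        int order = bar_order(bars[i]);

        assert(order < MAX_ORDERS);
        aligns[order] += bars[i]; /* BAR alignment == BAR size */
        if (order > max_order) {
            max_order = order;
        }
        total += bars[i];
    }
    return align_up(total, linux_min_align(aligns, max_order));
}
```

For the two prefetchable BARs of [d3:01.0] (64M and 128K), edk2_window_size() gives 0x4100000, while linux_window_size() settles on a 32 MiB alignment (mask 0x1FFFFFF) and gives 0x6000000, which no longer fits in the window EDK2 programmed.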
>>>
>>> I'm not sure about this one; the OS should be able to reconfigure PCI
>>> resources according to what is plugged in and where (and that is even
>>> more true if hotplug is taken into account).
>>
>> The spec says this:
>>
>> 0: No (The operating system must not ignore the PCI configuration that firmware has done
>> at boot time. However, the operating system is free to configure the devices in this hierarchy
>> that have not been configured by the firmware. There may be a reduced level of hot plug
>> capability support in this hierarchy due to resource constraints. This situation is the same as
>> the legacy situation where this _DSM is not provided.)
>> 1: Yes (The operating system may ignore the PCI configuration that the firmware has done
>> at boot time, and reconfigure/rebalance the resources in the hierarchy.)
> I've sort of convinced myself that it's just hotplug that might need
> reconfiguration implemented in the guest kernel, and maybe in QEMU.
>
> Though I have a question:
>
> 1. Does it work for the PC machine with the current kernel? If so, why?
> 2. What would it take to make it work for arm/virt?

The Linux/arm64 guest deals with PCI resources differently for historical
reasons. I was extremely confused by that as well, but Ard explained here:

<https://www.redhat.com/archives/edk2-devel-archive/2020-December/msg01027.html>

(Do not be alarmed by Ard's initial statement "That is not going to work";
he later revised that here:
<https://lists.gnu.org/archive/html/qemu-devel/2020-12/msg05033.html>.)

Thanks,
Laszlo

>
>> and
>>
>> IMPLEMENTATION NOTE
>> This _DSM function provides backwards compatibility on platforms that can run legacy operating
>> systems.
>> Operating systems for two different architectures (e.g., x86 and x64) can be installed on a platform.
>> The firmware cannot distinguish the operating system in time to change the boot configuration of
>> devices. Say for instance, an x86 operating system in non-PAE mode is installed on a system. The
>> x86 operating system cannot access device resource space above 4 GiB. So the firmware is required
>> to configure devices at boot time using addresses below 4 GiB. On the other hand, if an x64
>> operating system is installed on this system, it can access device resources above the 4 GiB so it does
>> not want the firmware to constrain the resource assignment below 4 GiB that the firmware
>> configures at boot time. It is not possible for the firmware to change this by the time it boots the
>> operating system. Ignoring the configurations done by firmware at boot time will allow the
>> operating system to push resource assignment using addresses above 4 GiB for an x64 operating
>> system while constrain it to addresses below 4 GiB for an x86 operating system.
>>
>> so fundamentally, saying "1" here just means "you can ignore what
>> firmware configured if you like".
>>
>>
>> I have a different question though: our CRS etc is based on what
>> firmware configured. Is that ok? Or is ACPI expected to somehow
>> reconfigure itself when OS reconfigures devices?
>> I think it's ok, but I could not find documentation either way.
>
> The guest consumes the DSDT only at boot time;
> reconfiguration can be done later by the PCI subsystem without
> ACPI (at least it used to be so).
>
> However, _DSM is dynamic
> and may be evaluated at runtime,
> though I don't know whether the kernel would re-evaluate this feature bit after boot.
>
>
>>
>>
>>>>
>>>> Signed-off-by: Jiahui Cen <cenjia...@huawei.com>
>>>> ---
>>>>  hw/pci-host/gpex-acpi.c | 18 ++++++++++++++++--
>>>>  1 file changed, 16 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
>>>> index 11b3db8f71..c189306599 100644
>>>> --- a/hw/pci-host/gpex-acpi.c
>>>> +++ b/hw/pci-host/gpex-acpi.c
>>>> @@ -112,10 +112,24 @@ static void acpi_dsdt_add_pci_osc(Aml *dev)
>>>>      UUID = aml_touuid("E5C937D0-3553-4D7A-9117-EA4D19C3434D");
>>>>      ifctx = aml_if(aml_equal(aml_arg(0), UUID));
>>>>      ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(0)));
>>>> -    uint8_t byte_list[1] = {1};
>>>> -    buf = aml_buffer(1, byte_list);
>>>> +    uint8_t byte_list[] = {
>>>> +        0x1 << 0 /* support for functions other than function 0 */ |
>>>> +        0x1 << 5 /* support for function 5 */
>>>> +    };
>>>> +    buf = aml_buffer(ARRAY_SIZE(byte_list), byte_list);
>>>>      aml_append(ifctx1, aml_return(buf));
>>>>      aml_append(ifctx, ifctx1);
>>>> +
>>>> +    /* PCI Firmware Specification 3.1
>>>> +     * 4.6.5. _DSM for Ignoring PCI Boot Configurations
>>>> +     */
>>>> +    /* Arg2: Function Index: 5 */
>>>> +    ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(5)));
>>>> +    /* 0 - The operating system must not ignore the PCI configuration that
>>>> +     *     firmware has done at boot time.
>>>> +     */
>>>> +    aml_append(ifctx1, aml_return(aml_int(0)));
>>>> +    aml_append(ifctx, ifctx1);
>>>>      aml_append(method, ifctx);
>>>>
>>>>      byte_list[0] = 0;
>>
>>
>
>
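With the patch, _DSM function 0 returns a one-byte buffer equal to (1 << 0) | (1 << 5) = 0x21, so an OS that follows the usual convention of querying function 0 before evaluating another function will see bit 5 set. The sketch below shows how such a bitmap is built and checked; the helper names are hypothetical (QEMU builds the byte inline in acpi_dsdt_add_pci_osc()), and only functions 1..7 are handled because a single byte is returned here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: build the byte returned by _DSM function 0 for the
 * PCI firmware UUID. Bit n advertises function n (1..7 fit in one byte);
 * bit 0 is set when any function other than function 0 is supported.
 */
uint8_t dsm_func0_mask(const unsigned *funcs, int n)
{
    uint8_t mask = 0;

    for (int i = 0; i < n; i++) {
        assert(funcs[i] >= 1 && funcs[i] <= 7);
        mask |= (uint8_t)(1u << funcs[i]);
    }
    if (mask) {
        mask |= 1u; /* "functions other than function 0 are supported" */
    }
    return mask;
}

/* Hypothetical caller-side check: is function `func` advertised in `mask`? */
int dsm_func_supported(uint8_t mask, unsigned func)
{
    return func >= 1 && func <= 7 && (mask & 1u) && ((mask >> func) & 1u);
}
```

Calling dsm_func0_mask() with just function 5 yields 0x21, the same value the patched byte_list[] initializer produces.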