On 2021/1/6 21:29, Igor Mammedov wrote:
> On Tue, 5 Jan 2021 09:53:49 +0800
> Jiahui Cen <cenjia...@huawei.com> wrote:
>
>> On 2021/1/5 8:35, Igor Mammedov wrote:
>>> On Wed, 30 Dec 2020 16:22:08 -0500
>>> "Michael S. Tsirkin" <m...@redhat.com> wrote:
>>>
>>>> On Tue, Dec 29, 2020 at 02:41:42PM +0100, Igor Mammedov wrote:
>>>>> On Wed, 23 Dec 2020 17:08:31 +0800
>>>>> Jiahui Cen <cenjia...@huawei.com> wrote:
>>>>>
>>>>>> There may be some differences in pci resource assignment between guest os
>>>>>> and firmware.
>>>>>>
>>>>>> Eg. A Bridge with Bus [d2]
>>>>>> -+-[0000:d2]---01.0-[d3]----01.0
>>>>>>
>>>>>> where [d2:01.00] is a pcie-pci-bridge with BAR0 (mem, 64-bit,
>>>>>> non-pref) [size=256]
>>>>>> [d3:01.00] is a PCI Device with BAR0 (mem, 64-bit, pref)
>>>>>> [size=128K]
>>>>>> BAR4 (mem, 64-bit, pref)
>>>>>> [size=64M]
>>>>>>
>>>>>> In EDK2, the Resource Map would be:
>>>>>> PciBus: Resource Map for Bridge [D2|01|00]
>>>>>> Type = PMem64; Base = 0x8004000000; Length = 0x4100000;
>>>>>> Alignment = 0x3FFFFFF
>>>>>> Base = 0x8004000000; Length = 0x4000000; Alignment =
>>>>>> 0x3FFFFFF; Owner = PCI [D3|01|00:20]
>>>>>> Base = 0x8008000000; Length = 0x20000; Alignment =
>>>>>> 0x1FFFF; Owner = PCI [D3|01|00:10]
>>>>>> Type = Mem64; Base = 0x8008100000; Length = 0x100;
>>>>>> Alignment = 0xFFF
>>>>>> It would use 0x4100000 to calculate the root bus's PMem64 resource
>>>>>> window.
>>>>>>
>>>>>> While in Linux, kernel will use 0x1FFFFFF as the alignment to
>>>>>> calculate
>>>>>> the PMem64 size, which would be 0x6000000. So kernel would try to
>>>>>> allocate 0x6000000 from the PMem64 resource window, but since the
>>>>>> window
>>>>>> size is 0x4100000 as assigned by EDK2, the allocation would fail.
>>>>>>
>>>>>> The diffences could result in resource assignment failure.
>>>>>>
>>>>>> Using _DSM #5 method to inform guest os not to ignore the PCI
>>>>>> configuration
>>>>>> that firmware has done at boot time could handle the differences.
>>>>>
>>>>> I'm not sure about this one,
>>>>> OS should able to reconfigure PCI resources according to what and where
>>>>> is plugged
>>>>> (and it even more true is hotplug is taken into account)
>>>>
>>>> spec says this:
>>>>
>>>> 0: No (The operating system must not ignore the PCI configuration that
>>>> firmware has done
>>>> at boot time. However, the operating system is free to configure the
>>>> devices in this hierarchy
>>>> that have not been configured by the firmware. There may be a reduced
>>>> level of hot plug
>>>> capability support in this hierarchy due to resource constraints. This
>>>> situation is the same as
>>>> the legacy situation where this _DSM is not provided.)
>>>> 1: Yes (The operating system may ignore the PCI configuration that the
>>>> firmware has done
>>>> at boot time, and reconfigure/rebalance the resources in the hierarchy.)
>>> I sort of convinced my self that's is just hotplug work might need to
>>> implement reconfiguration
>>> in guest kernel and maybe QEMU
>>>
>>> Though I have a question,
>>>
>>> 1. does it work for PC machine with current kernel, if so why?
>>> 2. what it would take to make it work for arm/virt?
>>>
>>
>> 1. For x86, it generally keeps the configuration by firmware,
>> so there is nothing wrong for PC machine.
>>
>> 2. We add DSM method in DSDT to inform guest to keep
>> firmware's configuration, just like x86.
>>
>>>> and
>>>>
>>>> IMPLEMENTATION NOTE
>>>> This _DSM function provides backwards compatibility on platforms that can
>>>> run legacy operating
>>>> systems.
>>>> Operating systems for two different architectures (e.g., x86 and x64) can
>>>> be installed on a platform.
>>>> The firmware cannot distinguish the operating system in time to change the
>>>> boot configuration of
>>>> devices. Say for instance, an x86 operating system in non-PAE mode is
>>>> installed on a system. The
>>>> x86 operating system cannot access device resource space above 4 GiB. So
>>>> the firmware is required
>>>> to configure devices at boot time using addresses below 4 GiB. On the
>>>> other hand, if an x64
>>>> operating system is installed on this system, it can access device
>>>> resources above the 4 GiB so it does
>>>> not want the firmware to constrain the resource assignment below 4 GiB
>>>> that the firmware
>>>> configures at boot time. It is not possible for the firmware to change
>>>> this by the time it boots the
>>>> operating system. Ignoring the configurations done by firmware at boot
>>>> time will allow the
>>>> operating system to push resource assignment using addresses above 4 GiB
>>>> for an x64 operating
>>>> system while constrain it to addresses below 4 GiB for an x86 operating
>>>> system.
>>>>
>>>> so fundamentally, saying "1" here just means "you can ignore what
>>>> firmware configured if you like".
>>>>
>>>>
>>>> I have a different question though: our CRS etc is based on what
>>>> firmware configured. Is that ok? Or is ACPI expected to somehow
>>>> reconfigure itself when OS reconfigures devices?
>>>> Think it's ok but could not find documentation either way.
>>>
>>> guest consume DSDT only at boot time,
>>> reconfiguration can done later by PCI subsystem without
>>> ACPI (at least it used to be so).
>>>
>>> However DSM is dynamic,
>>> and maybe evaluated at runtime,
>>> though I don't know if kernel would re-evaluate this feature bit after boot
>>>
>>
>> Seems kernel evaluates DSM only at boot time.
>
> Ok, lets respin this series without 5/8
> to avoid mixing unrelated changes in one series.
>
> We can think about 5/8 some more and return to it later if it proves hard to
> merge.
OK, I'll split patch [5/8] from this series and
send them separately.
Thanks for the discussion.
Thanks,
Jiahui
>
>>
>> Thanks,
>> Jiahui
>>
>>>
>>>>
>>>>
>>>>>>
>>>>>> Signed-off-by: Jiahui Cen <cenjia...@huawei.com>
>>>>>> ---
>>>>>> hw/pci-host/gpex-acpi.c | 18 ++++++++++++++++--
>>>>>> 1 file changed, 16 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
>>>>>> index 11b3db8f71..c189306599 100644
>>>>>> --- a/hw/pci-host/gpex-acpi.c
>>>>>> +++ b/hw/pci-host/gpex-acpi.c
>>>>>> @@ -112,10 +112,24 @@ static void acpi_dsdt_add_pci_osc(Aml *dev)
>>>>>> UUID = aml_touuid("E5C937D0-3553-4D7A-9117-EA4D19C3434D");
>>>>>> ifctx = aml_if(aml_equal(aml_arg(0), UUID));
>>>>>> ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(0)));
>>>>>> - uint8_t byte_list[1] = {1};
>>>>>> - buf = aml_buffer(1, byte_list);
>>>>>> + uint8_t byte_list[] = {
>>>>>> + 0x1 << 0 /* support for functions other than function 0
>>>>>> */ |
>>>>>> + 0x1 << 5 /* support for function 5 */
>>>>>> + };
>>>>>> + buf = aml_buffer(ARRAY_SIZE(byte_list), byte_list);
>>>>>> aml_append(ifctx1, aml_return(buf));
>>>>>> aml_append(ifctx, ifctx1);
>>>>>> +
>>>>>> + /* PCI Firmware Specification 3.1
>>>>>> + * 4.6.5. _DSM for Ignoring PCI Boot Configurations
>>>>>> + */
>>>>>> + /* Arg2: Function Index: 5 */
>>>>>> + ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(5)));
>>>>>> + /* 0 - The operating system must not ignore the PCI configuration
>>>>>> that
>>>>>> + * firmware has done at boot time.
>>>>>> + */
>>>>>> + aml_append(ifctx1, aml_return(aml_int(0)));
>>>>>> + aml_append(ifctx, ifctx1);
>>>>>> aml_append(method, ifctx);
>>>>>>
>>>>>> byte_list[0] = 0;
>>>>
>>>>
>>>
>>> .
>>>
>>
>
> .
>