On 04/03/19 11:49, Igor Mammedov wrote: > On Tue, 2 Apr 2019 17:38:26 +0200 > Laszlo Ersek <ler...@redhat.com> wrote: > >> On 04/02/19 17:29, Auger Eric wrote: >>> Hi Laszlo, >>> >>> On 4/1/19 3:07 PM, Laszlo Ersek wrote: >>>> On 03/29/19 14:56, Auger Eric wrote: >>>>> Hi Ard, >>>>> >>>>> On 3/29/19 2:14 PM, Ard Biesheuvel wrote: >>>>>> On Fri, 29 Mar 2019 at 14:12, Auger Eric <eric.au...@redhat.com> wrote: >>>>>>> >>>>>>> Hi Shameer, >>>>>>> >>>>>>> On 3/29/19 10:59 AM, Shameerali Kolothum Thodi wrote: >>>>>>>> >>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: Auger Eric [mailto:eric.au...@redhat.com] >>>>>>>>> Sent: 29 March 2019 09:32 >>>>>>>>> To: Shameerali Kolothum Thodi <shameerali.kolothum.th...@huawei.com>; >>>>>>>>> qemu-devel@nongnu.org; qemu-...@nongnu.org; imamm...@redhat.com; >>>>>>>>> peter.mayd...@linaro.org; shannon.zha...@gmail.com; >>>>>>>>> sa...@linux.intel.com; sebastien.bo...@intel.com >>>>>>>>> Cc: Linuxarm <linux...@huawei.com>; xuwei (O) <xuw...@huawei.com>; >>>>>>>>> Laszlo Ersek <ler...@redhat.com>; Ard Biesheuvel >>>>>>>>> <ard.biesheu...@linaro.org>; Leif Lindholm <leif.lindh...@arm.com> >>>>>>>>> Subject: Re: [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature >>>>>>>>> "fdt" >>>>>>>>> >>>>>>>>> Hi Shameer, >>>>>>>>> >>>>>>>>> [ + Laszlo, Ard, Leif ] >>>>>>>>> >>>>>>>>> On 3/21/19 11:47 AM, Shameer Kolothum wrote: >>>>>>>>>> This is to disable/enable populating DT nodes in case >>>>>>>>>> any conflict with acpi tables. The default is "off". >>>>>>>>> The name of the option sounds misleading to me. Also we don't really >>>>>>>>> know the scope of the disablement. At the moment this just aims to >>>>>>>>> prevent the hotpluggable dt nodes from being added if we boot in ACPI >>>>>>>>> mode. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> This will be used in subsequent patch where cold plug >>>>>>>>>> device-memory support is added for DT boot. >>>>>>>>> I am concerned about the fact that in dt mode, by default, you won't >>>>>>>>> see >>>>>>>>> any PCDIMM nodes. >>>>>>>>>> >>>>>>>>>> If DT memory node support is added for cold-plugged device >>>>>>>>>> memory, those memory will be visible to Guest kernel via >>>>>>>>>> UEFI GetMemoryMap() and gets treated as early boot memory. >>>>>>>>> Don't we have an issue in UEFI then. Normally the SRAT indicates >>>>>>>>> whether >>>>>>>>> the slots are hotpluggable or not. Shouldn't the UEFI code look at >>>>>>>>> this >>>>>>>>> info. >>>>>>>> >>>>>>>> Sorry I missed this part. Yes, that will be a more cleaner solution. >>>>>>>> >>>>>>>> Also, to be more clear on what happens, >>>>>>>> >>>>>>>> Guest ACPI boot with "fdt=on" , >>>>>>>> >>>>>>>> From kernel log, >>>>>>>> >>>>>>>> [ 0.000000] Early memory node ranges >>>>>>>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000c0000000-0x00000000ffffffff] >>>>>>>> --> This is the hotpluggable memory node from DT. >>>>>>>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages >>>>>>>> [ 0.000000] Initmem setup node 0 [mem >>>>>>>> 0x0000000040000000-0x00000000ffffffff] >>>>>>>> >>>>>>>> >>>>>>>> Guest ACPI boot with "fdt=off" , >>>>>>>> >>>>>>>> [ 0.000000] Movable zone start for each node >>>>>>>> [ 0.000000] Early memory node ranges >>>>>>>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] >>>>>>>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages >>>>>>>> [ 0.000000] Initmem setup node 0 [mem >>>>>>>> 0x0000000040000000-0x00000000bfffffff >>>>>>>> >>>>>>>> The hotpluggable memory node is absent from early memory nodes here. >>>>>>> >>>>>>> OK thank you for the example illustrating the concern. >>>>>>>> >>>>>>>> As you said, it could be possible to detect this node using SRAT in >>>>>>>> UEFI. >>>>>>> >>>>>>> Let's wait for EDK2 experts on this. >>>>>>> >>>>>> >>>>>> Happy to chime in, but I need a bit more context here. >>>>>> >>>>>> What is the problem, how does this path try to solve it, and why is >>>>>> that a bad idea? >>>>>> >>>>> Sure, sorry. >>>>> >>>>> This series: >>>>> - [PATCH v3 00/10] ARM virt: ACPI memory hotplug support, >>>>> https://patchwork.kernel.org/cover/10863301/ >>>>> >>>>> aims to introduce PCDIMM support in qemu. In ACPI mode, it builds the >>>>> SRAT and DSDT parts and relies on GED to trigger the hotplug. >>>>> >>>>> We noticed that if we build the hotpluggable memory dt nodes on top of >>>>> the above ACPI tables, the DIMM slots are interpreted as not >>>>> hotpluggable memory slots (at least we think so). >>>>> >>>>> We think the EDK2 GetMemoryMap() uses the dt node info and ignores the >>>>> fact that those slots are exposed as hotpluggable in the SRAT for example. >>>>> >>>>> So in this series, we are forced to not generate the hotpluggable memory >>>>> dt nodes if we want the DIMM slots to be effectively recognized as >>>>> hotpluggable. >>>>> >>>>> Could you confirm we have a correct understanding of the EDK2 behaviour >>>>> and if so, would there be any solution for EDK2 to absorb both the DT >>>>> nodes and the relevant SRAT/DSDT tables and make the slots hotpluggable. >>>>> >>>>> At qemu level, detecting we are booting in ACPI mode and purposely >>>>> removing the above mentioned DT nodes does not look straightforward. >>>> >>>> The firmware is not enlightened about the ACPI content that comes from >>>> QEMU / fw_cfg. That ACPI content is *blindly* processed by the firmware, >>>> as instructed through the ACPI linker/loader script, in order to install >>>> the ACPI content for the OS. No actual information is consumed by the >>>> firmware from the ACPI payload -- and that's a feature. >>>> >>>> The firmware does consume DT: >>>> >>>> - If you start QEMU *with* "-no-acpi", then the DT is both consumed by >>>> the firmware (for its own information needs), and passed on to the OS. >>>> >>>> - If you start QEMU *without* "-no-acpi" (the default), then the DT is >>>> consumed only by the firmware (for its own information needs), and the >>>> DT is hidden from the OS. The OS gets only the ACPI content >>>> (processed/prepared as described above). >>>> >>>> >>>> In the firmware, the "ArmVirtPkg/HighMemDxe" driver iterates over the >>>> base/size pairs in all the memory nodes in the DT. For each such base >>>> address that is currently tracked as "nonexistent" in the GCD memory >>>> space map, the driver currently adds the base/size range as "system >>>> memory". This in turn is reflected by the UEFI memmap that the OS gets >>>> to see as "conventional memory". >>>> >>>> If you need some memory ranges to show up as "special" in the UEFI >>>> memmap, then you need to distinguish them somehow from the "regular" >>>> memory areas, in the DT. And then extend "ArmVirtPkg/HighMemDxe" in the >>>> firmware, so that it act upon the discriminator that you set in the DT. >>>> >>>> >>>> Now... from a brief look at the Platform Init and UEFI specs, my >>>> impression is that the hotpluggable (but presently not plugged) DIMM >>>> ranges should simply be *absent* from the UEFI memmap; is that correct? >>>> (I didn't check the ACPI spec, maybe it specifies the expected behavior >>>> in full.) If my impression is correct, then two options (alternatives) >>>> exist: >>>> >>>> (1) Hide the affected memory nodes -- or at least the affected base/size >>>> pairs -- from the DT, in case you boot without "-no-acpi" but with an >>>> external firmware loaded. Then the firmware will not expose those ranges >>>> as "conventional memory" in the UEFI memmap. This approach requires no >>>> changes to edk2. >>>> >>>> This option is precisely what Eric described up-thread, at >>>> <3f0a5793-dd35-a497-2248-8eb0cd3c3a16@redhat.com">http://mid.mail-archive.com/3f0a5793-dd35-a497-2248-8eb0cd3c3a16@redhat.com>: >>>> >>>>> in machvirt_init, there is firmware_loaded that tells you whether you >>>>> have a FW image. If this one is not set, you can induce dt. But if >>>>> there is a FW it can be either DT or ACPI booted. You also have the >>>>> acpi_enabled knob. >>>> >>>> (The "-no-acpi" cmdline option clears the "acpi_enabled" variable in >>>> "vl.c"). >>>> >>>> So, the condition for hiding the hotpluggable memory nodes in question >>>> from the DT is: >>> >>>> >>>> (aarch64 && firmware_loaded && acpi_enabled) >>> >>> Thanks a lot for all those inputs! >>> >>> I don't get why we test aarch64 in above condition (this was useful for >>> high ECAM range as the aarch32 FW was not supporting it but here, is it >>> still meaningful?) >> >> Sorry, I should have clarified that. Yes, it is meaningful: >> >> While UEFI has bindings for both 32-bit and 64-bit ARM, ACPI has a >> 64-bit-only binding for ARM. (And you can have UEFI without ACPI, but >> not the reverse, on ARM.) So if you run the 32-bit build of the >> ArmVirtQemu firmware, you get no ACPI at all; all you can rely on with >> the OS is the DT. >> >> This "bitness distinction" is implemented in the firmware already. If >> you hid the memory nodes from the DT under the condition >> >> (!aarch64 && firmware_loaded && acpi_enabled) >> >> then the nodes would not be seen by the OS at all (because >> "acpi_enabled" is irrelevant for the 32-bit build of ArmVirtQemu, and >> all the OS can ever get is DT). > > It's getting tricky and I don't like a bit that we are trying to carter > 64 bit only UEFI build (or any other build) on QEMU side. Also Peter has > a valid about guessing on QEMU side (that's usually a source of problem > in the future). > > Perhaps we should reconsider and think about marking hotplugbbale RAM > in DT and let firmware to exclude it from memory map.
I'm fine either way. (I'm glad to continue discussing either option; that shouldn't be taken as a preference on my end.) With option (2), please consider the new version dependency between QEMU and the firmware -- this may or may not affect migration. (Thinking about migration is difficult, so I'll leave that to you all :) ) Thanks Laszlo >>>> (2) Invent and set an "ignore me, firmware" property for the >>>> hotpluggable memory nodes in the DT, and update the firmware to honor >>>> that property. >>>> >>>> Thanks >>>> Laszlo >>>> >> >