Hi all,

I recently found this mailing list thread while searching for information on a 
related issue with a conflicting E820 map on a Threadripper platform. For those 
interested in an additional data point, I am using the ASUS WRX80E-SAGE SE Wifi 
II motherboard, which presents the following E820 map to Xen:

(XEN) EFI RAM map:
(XEN)  [0000000000000000, 0000000000000fff] (reserved)
(XEN)  [0000000000001000, 000000000008ffff] (usable)
(XEN)  [0000000000090000, 0000000000090fff] (reserved)
(XEN)  [0000000000091000, 000000000009ffff] (usable)
(XEN)  [00000000000a0000, 00000000000fffff] (reserved)
(XEN)  [0000000000100000, 0000000003ffffff] (usable)
(XEN)  [0000000004000000, 0000000004020fff] (ACPI NVS)
(XEN)  [0000000004021000, 0000000009df1fff] (usable)
(XEN)  [0000000009df2000, 0000000009ffffff] (reserved)
(XEN)  [000000000a000000, 00000000b5b04fff] (usable)
(XEN)  [00000000b5b05000, 00000000b8cd3fff] (reserved)
(XEN)  [00000000b8cd4000, 00000000b9064fff] (ACPI data)
(XEN)  [00000000b9065000, 00000000b942afff] (ACPI NVS)
(XEN)  [00000000b942b000, 00000000bb1fefff] (reserved)
(XEN)  [00000000bb1ff000, 00000000bbffffff] (usable)
(XEN)  [00000000bc000000, 00000000bfffffff] (reserved)
(XEN)  [00000000c1100000, 00000000c1100fff] (reserved)
(XEN)  [00000000e0000000, 00000000efffffff] (reserved)
(XEN)  [00000000f1280000, 00000000f1280fff] (reserved)
(XEN)  [00000000f2200000, 00000000f22fffff] (reserved)
(XEN)  [00000000f2380000, 00000000f2380fff] (reserved)
(XEN)  [00000000f2400000, 00000000f24fffff] (reserved)
(XEN)  [00000000f3680000, 00000000f3680fff] (reserved)
(XEN)  [00000000fea00000, 00000000feafffff] (reserved)
(XEN)  [00000000fec00000, 00000000fec00fff] (reserved)
(XEN)  [00000000fec10000, 00000000fec10fff] (reserved)
(XEN)  [00000000fed00000, 00000000fed00fff] (reserved)
(XEN)  [00000000fed40000, 00000000fed44fff] (reserved)
(XEN)  [00000000fed80000, 00000000fed8ffff] (reserved)
(XEN)  [00000000fedc2000, 00000000fedcffff] (reserved)
(XEN)  [00000000fedd4000, 00000000fedd5fff] (reserved)
(XEN)  [00000000ff000000, 00000000ffffffff] (reserved)
(XEN)  [0000000100000000, 000000703f0fffff] (usable)
(XEN)  [000000703f100000, 000000703fffffff] (reserved)

And of course the default physical load address of the x86_64 kernel is 16MiB 
(0x1000000), which leaves only 48MiB of room before the EfiACPIMemoryNVS 
region starting at 0x4000000. On the latest Debian (12.5.0, bookworm) the 
decompressed kernel is more than 60MiB, so it clearly overflows into that 
region. I can confirm that loading the Debian kernel at 2MiB instead works as 
expected. The Debian kernel is also built with CONFIG_RELOCATABLE=y, so it 
should be capable of being loaded with this new feature in Xen.
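
For anyone who wants to sanity-check their own map, the collision is just
interval arithmetic. A minimal sketch (hypothetical helper, not Xen code;
E820 entry ranges are inclusive, as printed above):

    #include <stdint.h>
    #include <string.h>

    /* One E820 entry as printed above: inclusive [start, end] range. */
    struct e820_entry {
        uint64_t start, end;
        const char *type;     /* "usable", "ACPI NVS", "reserved", ... */
    };

    /* Return 1 if loading [load, load + size) touches a non-usable entry. */
    static int load_conflicts(const struct e820_entry *map, size_t n,
                              uint64_t load, uint64_t size)
    {
        for (size_t i = 0; i < n; i++) {
            if (strcmp(map[i].type, "usable") == 0)
                continue;
            if (load <= map[i].end && load + size > map[i].start)
                return 1;
        }
        return 0;
    }

With the map above, load_conflicts(map, n, 16 << 20, 61 << 20) flags the
ACPI NVS entry at 0x4000000, while the same image loaded at 2MiB ends
below 64MiB and stays clear.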

I see the fix referenced in this ticket was implemented and committed 
(dfc9fab0) on April 8, 2024, but it appears not to have made its way into the 
latest Xen release (4.18), even though more recent commits do seem to have 
been cherry-picked into that branch. When is this fix expected to make it 
into a release?

Branden.

> On Jan 17, 2024, at 7:54 AM, Roger Pau Monné <roger....@citrix.com> wrote:
> 
> On Wed, Jan 17, 2024 at 11:40:20AM +0100, Jan Beulich wrote:
>> On 17.01.2024 11:13, Roger Pau Monné wrote:
>>> On Wed, Jan 17, 2024 at 09:46:27AM +0100, Jan Beulich wrote:
>>>> Whereas I assume the native kernel can deal with that as long as
>>>> it's built with CONFIG_RELOCATABLE=y. I don't think we want to
>>>> get into the business of interpreting the kernel's internal
>>>> representation of the relocations needed, so it's not really
>>>> clear to me what we might do in such a case. Perhaps the only way
>>>> is to signal to the kernel that it needs to apply relocations
>>>> itself (which in turn would require the kernel to signal to us
>>>> that it's capable of doing so). Cc-ing Roger in case he has any
>>>> neat idea.
>>> 
>>> Hm, no, not really.
>>> 
>>> We could do like multiboot2: the kernel provides us with some
>>> placement data (min/max addresses, alignment), and Xen lets the
>>> kernel deal with relocations itself.
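
For reference, the multiboot2 relocatable header tag carries exactly this
kind of placement data; roughly (field layout per the multiboot2 spec,
comments mine):

    /* The image tells the loader where it can be placed, then fixes
     * itself up after entry - the model described above. */
    struct mb2_header_tag_relocatable {
        uint16_t type;        /* MULTIBOOT_HEADER_TAG_RELOCATABLE (10) */
        uint16_t flags;
        uint32_t size;        /* total tag size */
        uint32_t min_addr;    /* lowest acceptable load address */
        uint32_t max_addr;    /* highest acceptable end address */
        uint32_t align;       /* required load alignment */
        uint32_t preference;  /* 0 = none, 1 = lowest, 2 = highest */
    };
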
>> 
>> That would require the kernel's entry point to take a sufficiently
>> different flow compared to how it works today, I expect.
> 
> Indeed, I would expect that.
> 
>>> Additionally, we could support the kernel providing a section with the
>>> relocations and apply them from Xen, but that's likely complicated at
>>> best, as I don't even know which kinds of relocations we would have to
>>> support.
>> 
>> If the kernel were properly linked as a PIE, there'd generally be only
>> one kind of relocation (per arch) that ought to need dealing with -
>> for x86-64 that's R_X86_64_RELATIVE iirc. That's presumably why they
>> don't use ELF relocation structures (which would be wastefully large),
>> but rather a more compact custom representation. Even without building PIE
>> (presumably in part not possible because of how per-CPU data needs
>> dealing with), they get away with handling just very few relocs (and
>> from looking at the reloc processing code I'm getting the impression
>> they mistreat R_X86_64_32 as being the same as R_X86_64_32S, when it
>> isn't; needing to get such quirks right is one more aspect of why I
>> think we should leave relocation handling to the kernel).
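
As an aside, for a properly PIE-linked image the loader-side work really
is tiny. A minimal sketch of applying R_X86_64_RELATIVE from ordinary ELF
Rela entries (assuming the image is flat-mapped at image and r_offset is
an offset from the link base; illustration only, not Xen or Linux code):

    #include <elf.h>      /* Elf64_Rela, ELF64_R_TYPE, R_X86_64_RELATIVE */
    #include <stddef.h>
    #include <stdint.h>

    static void apply_relative(uint8_t *image, uint64_t load_base,
                               const Elf64_Rela *rela, size_t count)
    {
        for (size_t i = 0; i < count; i++) {
            /* Anything other than RELATIVE means the link wasn't
             * PIE-clean and would need per-type handling. */
            if (ELF64_R_TYPE(rela[i].r_info) != R_X86_64_RELATIVE)
                continue;
            *(uint64_t *)(image + rela[i].r_offset) =
                load_base + rela[i].r_addend;
        }
    }

Linux's compact format instead records raw fixup locations and adds the
load delta in place, which is where the R_X86_64_32 vs R_X86_64_32S
distinction mentioned above can bite.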
> 
> Would have to look into more detail, but I think leaving any relocs
> for the OS to perform would be my initial approach.
> 
>>> I'm not sure how Linux deals with this in the bare metal case; are
>>> relocations done after decompressing and before jumping into the entry
>>> point?
>> 
>> That's how it was last time I looked, yes.
> 
> I've created a gitlab ticket for it:
> 
> https://gitlab.com/xen-project/xen/-/issues/180
> 
> So that we don't forget: I don't have time to work on this right now,
> but I think it's important enough that it shouldn't be lost.
> 
> For PV it's a bit less clear how we want to deal with it, as it's
> IMO a specific Linux behavior that makes it fail to boot.
> 
> Roger.