On 12.12.2024 15:30, Jan Beulich wrote:
> On 12.12.2024 02:17, Andrew Cooper wrote:
>> (With the knowledge that this is a disputed Kconfig pattern, and will
>> need rebasing), the way I want this to work is simply:
>>
>> diff --git a/xen/Makefile b/xen/Makefile
>> index 0de0101fd0bf..5d0a88fb3c3f 100644
>> --- a/xen/Makefile
>> +++ b/xen/Makefile
>> @@ -434,6 +434,9 @@ endif
>>  
>>  ifeq ($(CONFIG_STACK_PROTECTOR),y)
>>  CFLAGS += -fstack-protector
>> +ifeq ($(CONFIG_X86),y)
>> +CFLAGS += -mstack-protector-guard=global
>> +endif
>>  else
>>  CFLAGS += -fno-stack-protector
>>  endif
>> diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
>> index 9cdd04721afa..7951ca908b36 100644
>> --- a/xen/arch/x86/Kconfig
>> +++ b/xen/arch/x86/Kconfig
>> @@ -28,6 +28,7 @@ config X86
>>         select HAS_PCI_MSI
>>         select HAS_PIRQ
>>         select HAS_SCHED_GRANULARITY
>> +       select HAS_STACK_PROTECTOR if
>> $(cc-option,-mstack-protector-guard=global)
>>         select HAS_UBSAN
>>         select HAS_VMAP
>>         select HAS_VPCI if HVM
>>
>>
>>
>> Sadly, it doesn't build.  I get a handful of:
>>
>> prelink.o: in function `cmdline_parse':
>> /home/andrew/xen.git/xen/common/kernel.c:216:(.init.text+0x20f2): failed
>> to convert GOTPCREL relocation against '__stack_chk_guard'; relink with
>> --no-relax
>> /home/andrew/xen.git/xen/common/kernel.c:230:(.init.text+0x246f): failed
>> to convert GOTPCREL relocation against '__stack_chk_guard'; relink with
>> --no-relax
>>
>> which is more toolchain-whispering than I feel like doing tonight.
> 
> Imo the root of the problem is that the compiler doesn't itself mark
> __stack_chk_guard hidden (it does so for __stack_chk_fail, albeit only for
> 32-bit code), and hence finds it necessary to use @gotpcrel to access the
> variable. Even if the linker managed to relax all of these, it would then
> still be less efficient compared to direct RIP-relative accesses.
> 
> I also can't see how we might be able to override the compiler's internal
> declaration to mark it hidden (the same appears to be true for other items
> the declares internally, like the retpoline thunks or even strcmp() et al).
> Passing -fvisibility=hidden doesn't make a difference (just as another
> data point).
> 
> Playing with -fstack-protector* flavors, I observe:
> - -fstack-protector causing several failures, like you observed, oddly
>   enough exclusively from __init functions,
> - -fstack-protector-all and -fstack-protector-strong each causing a single
>   (but respectively different) failure, for apparently random non-__init
>   functions.
> Taking this together it very much smells like a linker issue. I'll see
> about checking there further.

The oddity with how many diags show up is down to internals of the linker.
It processes a single input section in full (continuing over this specific
type of error), but will stop processing afterwards if any such error was
encountered.

The issue itself is a wrong assumption in the linker: It believes that it
would only ever build small-model code when encountering this kind of
relocation, and when not linking a shared library or PIE. With this
assumption it converts the relocation resulting from @gotpcrel to
R_X86_64_32S (converting the MOV from GOT to MOV $imm), which of course
overflows when later trying to actually resolve it. What I'm yet to
understand is why it doesn't use R_X86_64_PC32 (also) in such a situation
(it does e.g. when building a shared library).

While so far I didn't try it, using --no-relax is presumably not an option,
as I expect that it'll leave us with a non-empty .got. Plus I didn't even
start looking into how the xen.efi linking would deal with the ELF-specific
gotpcrel relocs; the concept of GOT doesn't exist in PE/COFF, after all.

While the linker certainly wants fixing, I continue to think that getting
the compiler side right would yield the better overall result.

Jan

Reply via email to