On Thu, Apr 20, 2023 at 11:43 AM Sergey Bugaev <buga...@gmail.com> wrote:
>
> Normally, in static builds, the first code that runs is _start, in e.g.
> sysdeps/x86_64/start.S, which quickly calls __libc_start_main, passing
> it the argv etc. Among the first things __libc_start_main does is
> initializing the tunables (based on env), then CPU features, and then
> calls _dl_relocate_static_pie (). Specifically, this runs ifunc
> resolvers to pick, based on the CPU features discovered earlier, the
> most suitable implementation of "string" functions such as memcpy.
>
> Before that point, calling memcpy (or other ifunc-resolved functions)
> will not work.
>
> In the Hurd port, things are more complex. In order to get argv/env for
> our process, glibc normally needs to do an RPC to the exec server,
> unless our args/env are already located on the stack (which is what
> happens to bootstrap processes spawned by GNU Mach). Fetching our
> argv/env from the exec server has to be done before the call to
> __libc_start_main, since we need to know what our argv/env are to pass
> them to __libc_start_main.
>
> On the other hand, the implementation of the RPC (and other initial
> setup needed on the Hurd before __libc_start_main can be run) is not
> very trivial. In particular, it may (and on x86_64, will) use memcpy.
> But as described above, calling memcpy before __libc_start_main can not
> work, since the GOT entry for it is not yet initialized at that point.
>
> Work around this by pre-filling the GOT entry with the baseline version
> of memcpy, __memcpy_sse2_unaligned. This makes it possible for early
> calls to memcpy to just work. Once _dl_relocate_static_pie () is called,
> the baseline version will get replaced with the most suitable one, and
> that's what subsequent calls of memcpy are going to call.
>
> Also, apply the same treatment to __stpncpy, which can also be used by
> the RPCs (see mig_strncpy.c), and is an ifunc-resolved function on both
> x86_64 and i386.
>
> Tested on x86_64-gnu (!).
>
> Signed-off-by: Sergey Bugaev <buga...@gmail.com>
> ---
>
> Please tell me:
>
> * if the approach is at all sane
> * if there's a better way to do this without hardcoding
>   "__memcpy_sse2_unaligned"
> * are the GOT entries for indirect functions supposed to be statically
>   initialized to anything (in the binary)? if yes, why? if not, why is
>   PROGBITS and not NOBITS?
> * am I doing all this _GLOBAL_OFFSET_TABLE_, @GOT, @GOTOFF, @GOTPCREL
>   correctly?
> * should there be a !PIC version as well? does the GOT exist under
>   !PIC (to access indirect functions), and if it does then how do I
>   access it? it would seem gcc just generates a direct $function even
>   for indirect functions in this case.
>
>  sysdeps/mach/hurd/i386/static-start.S   | 7 +++++++
>  sysdeps/mach/hurd/x86_64/static-start.S | 8 ++++++++
>  2 files changed, 15 insertions(+)
>
> diff --git a/sysdeps/mach/hurd/i386/static-start.S
> b/sysdeps/mach/hurd/i386/static-start.S
> index c5d12645..1b1ae559 100644
> --- a/sysdeps/mach/hurd/i386/static-start.S
> +++ b/sysdeps/mach/hurd/i386/static-start.S
> @@ -19,6 +19,13 @@
>         .text
>         .globl _start
>  _start:
> +#ifdef PIC
> +       call __x86.get_pc_thunk.bx
> +       addl $_GLOBAL_OFFSET_TABLE_, %ebx
> +       leal __stpncpy_ia32@GOTOFF(%ebx), %eax
> +       movl %eax, __stpncpy@GOT(%ebx)
> +#endif
> +
>         call _hurd_stack_setup
>         xorl %edx, %edx
>         jmp _start1
> diff --git a/sysdeps/mach/hurd/x86_64/static-start.S
> b/sysdeps/mach/hurd/x86_64/static-start.S
> index 982d3d52..81b3c0ac 100644
> --- a/sysdeps/mach/hurd/x86_64/static-start.S
> +++ b/sysdeps/mach/hurd/x86_64/static-start.S
> @@ -19,6 +19,14 @@
>         .text
>         .globl _start
>  _start:
> +
> +#ifdef PIC
> +       leaq __memcpy_sse2_unaligned(%rip), %rax
> +       movq %rax, memcpy@GOTPCREL(%rip)
> +       leaq __stpncpy_sse2_unaligned(%rip), %rax
> +       movq %rax, __stpncpy@GOTPCREL(%rip)
> +#endif
> +
>         call _hurd_stack_setup
>         xorq %rdx, %rdx
>         jmp _start1
> --
> 2.40.0
>

Doesn't it disable IFUNC for memcpy and stpncpy?

--
H.J.
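
For reference, the ordering the quoted commit message describes (pre-fill the slot with a baseline, let the resolver overwrite it later) can be modelled in portable C. This is only a sketch of the idea, not the patch and not how glibc implements it: an ordinary function pointer stands in for the GOT entry, and every name below (copy_slot, copy_baseline, copy_optimized, resolve_copy, relocate, early_copy, slot_is_baseline) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names here are hypothetical; a plain function pointer models
   the GOT entry, it is not a real ifunc.  */
typedef void *(*copy_fn) (void *, const void *, size_t);

/* Baseline implementation: always safe to call, even very early.  */
void *
copy_baseline (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  while (n--)
    *d++ = *s++;
  return dst;
}

/* Stand-in for a CPU-specific variant picked by the resolver.  */
void *
copy_optimized (void *dst, const void *src, size_t n)
{
  return memcpy (dst, src, n);
}

/* Models the GOT entry, pre-filled with the baseline instead of
   being left uninitialized until relocation.  */
copy_fn copy_slot = copy_baseline;

/* Models the ifunc resolver: chooses based on CPU features.  */
copy_fn
resolve_copy (int cpu_has_fast_copy)
{
  return cpu_has_fast_copy ? copy_optimized : copy_baseline;
}

/* Models the relocation step: the resolver result overwrites
   whatever the slot held before.  */
void
relocate (int cpu_has_fast_copy)
{
  copy_slot = resolve_copy (cpu_has_fast_copy);
  assert (copy_slot != NULL);
}

/* Every call goes through the slot, before or after relocation.  */
void *
early_copy (void *dst, const void *src, size_t n)
{
  return copy_slot (dst, src, n);
}

int
slot_is_baseline (void)
{
  return copy_slot == copy_baseline;
}
```

In this model a call made before relocate () reaches copy_baseline, and a call made afterwards reaches whatever the resolver returned. Whether the real GOT entry is overwritten the same way in a static PIE is exactly what the question above is asking.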