On Sun, Dec 7, 2025 at 3:31 AM Florian Weimer <[email protected]> wrote:
>
> * H. J. Lu:
>
> > For TLS calls:
> >
> > 1. UNSPEC_TLS_GD:
> >
> >   (parallel [
> >     (set (reg:DI 0 ax)
> >        (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
> >                 (const_int 0 [0])))
> >     (unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
> >                 (reg/f:DI 7 sp)] UNSPEC_TLS_GD)
> >     (clobber (reg:DI 5 di))])
> >
> > 2. UNSPEC_TLS_LD_BASE:
> >
> >   (parallel [
> >     (set (reg:DI 0 ax)
> >        (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
> >                 (const_int 0 [0])))
> >     (unspec:DI [(reg/f:DI 7 sp)] UNSPEC_TLS_LD_BASE)])
> >
> > 3. UNSPEC_TLSDESC:
> >
> >   (parallel [
> >      (set (reg/f:DI 104)
> >          (plus:DI (unspec:DI [
> >                      (symbol_ref:DI ("_TLS_MODULE_BASE_") [flags 0x10])
> >                        (reg:DI 114)
> >                        (reg/f:DI 7 sp)] UNSPEC_TLSDESC)
> >                     (const:DI (unspec:DI [
> >                                (symbol_ref:DI ("e") [flags 0x1a])
> >                             ] UNSPEC_DTPOFF))))
> >      (clobber (reg:CC 17 flags))])
> >
> >   (parallel [
> >     (set (reg:DI 101)
> >        (unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
> >                      (reg:DI 112)
> >                      (reg/f:DI 7 sp)] UNSPEC_TLSDESC))
> >     (clobber (reg:CC 17 flags))])
> >
> > they return the same value for the same input value.  But multiple calls
> > with the same input value may be generated for simple programs like:
> >
> > void a(long *);
> > int b(void);
> > void c(void);
> > static __thread long e;
> > long
> > d(void)
> > {
> >   a(&e);
> >   if (b())
> >     c();
> >   return e;
> > }
> >
> > When compiled with -O2 -fPIC -mtls-dialect=gnu2, the following code is
> > generated:
> >
> >       .type   d, @function
> > d:
> > .LFB0:
> >       .cfi_startproc
> >       pushq   %rbx
> >       .cfi_def_cfa_offset 16
> >       .cfi_offset 3, -16
> >       leaq    e@TLSDESC(%rip), %rbx
> >       movq    %rbx, %rax
> >       call    *e@TLSCALL(%rax)
> >       addq    %fs:0, %rax
> >       movq    %rax, %rdi
> >       call    a@PLT
> >       call    b@PLT
> >       testl   %eax, %eax
> >       jne     .L8
> >       movq    %rbx, %rax
> >       call    *e@TLSCALL(%rax)
> >       popq    %rbx
> >       .cfi_remember_state
> >       .cfi_def_cfa_offset 8
> >       movq    %fs:(%rax), %rax
> >       ret
> >       .p2align 4,,10
> >       .p2align 3
> > .L8:
> >       .cfi_restore_state
> >       call    c@PLT
> >       movq    %rbx, %rax
> >       call    *e@TLSCALL(%rax)
> >       popq    %rbx
> >       .cfi_def_cfa_offset 8
> >       movq    %fs:(%rax), %rax
> >       ret
> >       .cfi_endproc
> >
> > There are 3 "call *e@TLSCALL(%rax)".  They all return the same value.
> > Rename the remove_redundant_vector pass to the x86_cse pass and, for
> > 64-bit, extend it to also remove redundant TLS calls to generate:
> >
> > d:
> > .LFB0:
> >       .cfi_startproc
> >       pushq   %rbx
> >       .cfi_def_cfa_offset 16
> >       .cfi_offset 3, -16
> >       leaq    e@TLSDESC(%rip), %rax
> >       movq    %fs:0, %rdi
> >       call    *e@TLSCALL(%rax)
> >       addq    %rax, %rdi
> >       movq    %rax, %rbx
> >       call    a@PLT
> >       call    b@PLT
> >       testl   %eax, %eax
> >       jne     .L8
> >       movq    %fs:(%rbx), %rax
> >       popq    %rbx
> >       .cfi_remember_state
> >       .cfi_def_cfa_offset 8
> >       ret
> >       .p2align 4,,10
> >       .p2align 3
> > .L8:
> >       .cfi_restore_state
> >       call    c@PLT
> >       movq    %fs:(%rbx), %rax
> >       popq    %rbx
> >       .cfi_def_cfa_offset 8
> >       ret
> >       .cfi_endproc
> >
> > with only one "call *e@TLSCALL(%rax)".  This reduces the number of
> > __tls_get_addr calls in libgcc.a by 72%:
> >
> > __tls_get_addr calls     before         after
> > libgcc.a                 868            243
>
> While this is certainly nice, it does make it harder to resume
> coroutines/fibers on a different thread from the one they were
> suspended on.  I do not know to what extent that was previously
> supported for global-dynamic TLS.  I recall there being other caching
> going on (certainly for errno, because __errno_location is declared
> const).
>
> If this impacts software like QEMU, is there a way to get back the old
> behavior?

We need to let the compiler know that some const functions aren't really
const.

-- 
H.J.
