On Sun, Dec 7, 2025 at 10:41 AM H.J. Lu <[email protected]> wrote:
>
> On Sun, Dec 7, 2025 at 3:31 AM Florian Weimer <[email protected]> wrote:
> >
> > * H. J. Lu:
> >
> > > For TLS calls:
> > >
> > > 1. UNSPEC_TLS_GD:
> > >
> > >   (parallel [
> > >     (set (reg:DI 0 ax)
> > >        (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
> > >                 (const_int 0 [0])))
> > >     (unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
> > >                 (reg/f:DI 7 sp)] UNSPEC_TLS_GD)
> > >     (clobber (reg:DI 5 di))])
> > >
> > > 2. UNSPEC_TLS_LD_BASE:
> > >
> > >   (parallel [
> > >     (set (reg:DI 0 ax)
> > >        (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
> > >                 (const_int 0 [0])))
> > >     (unspec:DI [(reg/f:DI 7 sp)] UNSPEC_TLS_LD_BASE)])
> > >
> > > 3. UNSPEC_TLSDESC:
> > >
> > >   (parallel [
> > >      (set (reg/f:DI 104)
> > >          (plus:DI (unspec:DI [
> > >                      (symbol_ref:DI ("_TLS_MODULE_BASE_") [flags 0x10])
> > >                        (reg:DI 114)
> > >                        (reg/f:DI 7 sp)] UNSPEC_TLSDESC)
> > >                     (const:DI (unspec:DI [
> > >                                (symbol_ref:DI ("e") [flags 0x1a])
> > >                             ] UNSPEC_DTPOFF))))
> > >      (clobber (reg:CC 17 flags))])
> > >
> > >   (parallel [
> > >     (set (reg:DI 101)
> > >        (unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
> > >                      (reg:DI 112)
> > >                      (reg/f:DI 7 sp)] UNSPEC_TLSDESC))
> > >     (clobber (reg:CC 17 flags))])
> > >
> > > they return the same value for the same input value.  But multiple calls
> > > with the same input value may be generated for simple programs like:
> > >
> > > void a(long *);
> > > int b(void);
> > > void c(void);
> > > static __thread long e;
> > > long
> > > d(void)
> > > {
> > >   a(&e);
> > >   if (b())
> > >     c();
> > >   return e;
> > > }
> > >
> > > When compiled with -O2 -fPIC -mtls-dialect=gnu2, the following code is
> > > generated:
> > >
> > >       .type   d, @function
> > > d:
> > > .LFB0:
> > >       .cfi_startproc
> > >       pushq   %rbx
> > >       .cfi_def_cfa_offset 16
> > >       .cfi_offset 3, -16
> > >       leaq    e@TLSDESC(%rip), %rbx
> > >       movq    %rbx, %rax
> > >       call    *e@TLSCALL(%rax)
> > >       addq    %fs:0, %rax
> > >       movq    %rax, %rdi
> > >       call    a@PLT
> > >       call    b@PLT
> > >       testl   %eax, %eax
> > >       jne     .L8
> > >       movq    %rbx, %rax
> > >       call    *e@TLSCALL(%rax)
> > >       popq    %rbx
> > >       .cfi_remember_state
> > >       .cfi_def_cfa_offset 8
> > >       movq    %fs:(%rax), %rax
> > >       ret
> > >       .p2align 4,,10
> > >       .p2align 3
> > > .L8:
> > >       .cfi_restore_state
> > >       call    c@PLT
> > >       movq    %rbx, %rax
> > >       call    *e@TLSCALL(%rax)
> > >       popq    %rbx
> > >       .cfi_def_cfa_offset 8
> > >       movq    %fs:(%rax), %rax
> > >       ret
> > >       .cfi_endproc
> > >
> > > There are 3 "call *e@TLSCALL(%rax)".  They all return the same value.
> > > Rename the remove_redundant_vector pass to the x86_cse pass and, for
> > > 64-bit, extend it to also remove redundant TLS calls to generate:
> > >
> > > d:
> > > .LFB0:
> > >       .cfi_startproc
> > >       pushq   %rbx
> > >       .cfi_def_cfa_offset 16
> > >       .cfi_offset 3, -16
> > >       leaq    e@TLSDESC(%rip), %rax
> > >       movq    %fs:0, %rdi
> > >       call    *e@TLSCALL(%rax)
> > >       addq    %rax, %rdi
> > >       movq    %rax, %rbx
> > >       call    a@PLT
> > >       call    b@PLT
> > >       testl   %eax, %eax
> > >       jne     .L8
> > >       movq    %fs:(%rbx), %rax
> > >       popq    %rbx
> > >       .cfi_remember_state
> > >       .cfi_def_cfa_offset 8
> > >       ret
> > >       .p2align 4,,10
> > >       .p2align 3
> > > .L8:
> > >       .cfi_restore_state
> > >       call    c@PLT
> > >       movq    %fs:(%rbx), %rax
> > >       popq    %rbx
> > >       .cfi_def_cfa_offset 8
> > >       ret
> > >       .cfi_endproc
> > >
> > > with only one "call *e@TLSCALL(%rax)".  This reduces the number of
> > > __tls_get_addr calls in libgcc.a by 72%:
> > >
> > > __tls_get_addr calls     before         after
> > > libgcc.a                 868            243
> >
> > While this is certainly nice, it does now make it harder to resume
> > coroutines/fibers on a different thread from the one they were suspended
> > on.  I do not know to what extent that was previously supported for
> > global-dynamic TLS.  I recall there being other caching going on (and
> > certainly for errno, because __errno_location is declared const).
> >
> > If this impacts software like QEMU, is there a way to get back the old
> > behavior?
>
> We need to let the compiler know that some const functions aren't really
> const.
>

Something like a returns_different_thread function attribute.
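
To illustrate why a cached TLS address becomes a problem once code resumes
on another thread, here is a minimal sketch (not part of the patch).  It
assumes POSIX threads and GCC's __thread; the names other_thread_addr,
record_addr_of_e and tls_address_differs_across_threads are made up for
this example.  It only shows that &e is a per-thread value, so an address
computed before a suspend point would be stale if the fiber resumed on a
different thread:

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

static __thread long e;

/* Address of e as seen by the helper thread (hypothetical name). */
static uintptr_t other_thread_addr;

/* Thread entry point: record where this thread's copy of e lives. */
static void *record_addr_of_e(void *unused)
{
  (void) unused;
  other_thread_addr = (uintptr_t) &e;
  return NULL;
}

/* Returns 1 if a second thread sees a different &e than the caller. */
int tls_address_differs_across_threads(void)
{
  pthread_t t;
  int rc = pthread_create(&t, NULL, record_addr_of_e, NULL);
  assert(rc == 0);
  rc = pthread_join(t, NULL);  /* join orders the write before the read */
  assert(rc == 0);
  (void) rc;
  return other_thread_addr != (uintptr_t) &e;
}
```

Build with -pthread.  The helper returns 1 because each thread has its own
copy of e; reusing one thread's result of the TLS call on another thread
would therefore address the wrong copy.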

-- 
H.J.
