On Sun, Dec 7, 2025 at 3:31 AM Florian Weimer <[email protected]> wrote:
>
> * H. J. Lu:
>
> > For TLS calls:
> >
> > 1. UNSPEC_TLS_GD:
> >
> >   (parallel [
> >     (set (reg:DI 0 ax)
> >          (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
> >                   (const_int 0 [0])))
> >     (unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
> >                 (reg/f:DI 7 sp)] UNSPEC_TLS_GD)
> >     (clobber (reg:DI 5 di))])
> >
> > 2. UNSPEC_TLS_LD_BASE:
> >
> >   (parallel [
> >     (set (reg:DI 0 ax)
> >          (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
> >                   (const_int 0 [0])))
> >     (unspec:DI [(reg/f:DI 7 sp)] UNSPEC_TLS_LD_BASE)])
> >
> > 3. UNSPEC_TLSDESC:
> >
> >   (parallel [
> >     (set (reg/f:DI 104)
> >          (plus:DI (unspec:DI [
> >                      (symbol_ref:DI ("_TLS_MODULE_BASE_") [flags 0x10])
> >                      (reg:DI 114)
> >                      (reg/f:DI 7 sp)] UNSPEC_TLSDESC)
> >                   (const:DI (unspec:DI [
> >                      (symbol_ref:DI ("e") [flags 0x1a])
> >                      ] UNSPEC_DTPOFF))))
> >     (clobber (reg:CC 17 flags))])
> >
> >   (parallel [
> >     (set (reg:DI 101)
> >          (unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
> >                      (reg:DI 112)
> >                      (reg/f:DI 7 sp)] UNSPEC_TLSDESC))
> >     (clobber (reg:CC 17 flags))])
> >
> > they return the same value for the same input value.
> > But multiple calls with the same input value may be generated for
> > simple programs like:
> >
> > void a(long *);
> > int b(void);
> > void c(void);
> > static __thread long e;
> > long
> > d(void)
> > {
> >   a(&e);
> >   if (b())
> >     c();
> >   return e;
> > }
> >
> > When compiled with -O2 -fPIC -mtls-dialect=gnu2, the following code is
> > generated:
> >
> >         .type   d, @function
> > d:
> > .LFB0:
> >         .cfi_startproc
> >         pushq   %rbx
> >         .cfi_def_cfa_offset 16
> >         .cfi_offset 3, -16
> >         leaq    e@TLSDESC(%rip), %rbx
> >         movq    %rbx, %rax
> >         call    *e@TLSCALL(%rax)
> >         addq    %fs:0, %rax
> >         movq    %rax, %rdi
> >         call    a@PLT
> >         call    b@PLT
> >         testl   %eax, %eax
> >         jne     .L8
> >         movq    %rbx, %rax
> >         call    *e@TLSCALL(%rax)
> >         popq    %rbx
> >         .cfi_remember_state
> >         .cfi_def_cfa_offset 8
> >         movq    %fs:(%rax), %rax
> >         ret
> >         .p2align 4,,10
> >         .p2align 3
> > .L8:
> >         .cfi_restore_state
> >         call    c@PLT
> >         movq    %rbx, %rax
> >         call    *e@TLSCALL(%rax)
> >         popq    %rbx
> >         .cfi_def_cfa_offset 8
> >         movq    %fs:(%rax), %rax
> >         ret
> >         .cfi_endproc
> >
> > There are 3 "call *e@TLSCALL(%rax)".  They all return the same value.
> > Rename the remove_redundant_vector pass to the x86_cse pass and, for
> > 64-bit, extend it to also remove redundant TLS calls, generating:
> >
> > d:
> > .LFB0:
> >         .cfi_startproc
> >         pushq   %rbx
> >         .cfi_def_cfa_offset 16
> >         .cfi_offset 3, -16
> >         leaq    e@TLSDESC(%rip), %rax
> >         movq    %fs:0, %rdi
> >         call    *e@TLSCALL(%rax)
> >         addq    %rax, %rdi
> >         movq    %rax, %rbx
> >         call    a@PLT
> >         call    b@PLT
> >         testl   %eax, %eax
> >         jne     .L8
> >         movq    %fs:(%rbx), %rax
> >         popq    %rbx
> >         .cfi_remember_state
> >         .cfi_def_cfa_offset 8
> >         ret
> >         .p2align 4,,10
> >         .p2align 3
> > .L8:
> >         .cfi_restore_state
> >         call    c@PLT
> >         movq    %fs:(%rbx), %rax
> >         popq    %rbx
> >         .cfi_def_cfa_offset 8
> >         ret
> >         .cfi_endproc
> >
> > with only one "call *e@TLSCALL(%rax)".
> > This reduces the number of __tls_get_addr calls in libgcc.a by 72%:
> >
> > __tls_get_addr calls     before      after
> > libgcc.a                 868         243
>
> While this is certainly nice, it does now make it harder to resume
> coroutines/fibers on a different thread from the one they were suspended
> on.  I do not know to what extent that was previously supported for
> global-dynamic TLS.  I recall there being other caching going on (and
> certainly for errno because __errno_location is declared const).
>
> If this impacts software like QEMU, is there a way to get back the old
> behavior?

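
For illustration only, here is a minimal C sketch of the concern raised
above.  It is not part of the patch, and the variable and helper names are
hypothetical.  It assumes POSIX threads and uses two ordinary threads as a
stand-in for a coroutine that is suspended on one thread and resumed on
another: the address of a thread-local variable computed on one thread is
not the address another thread would compute, so a cached result becomes
stale after such a migration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical thread-local variable, standing in for `e` in the
   quoted test case.  */
static __thread long counter;

/* Thread body: store the address of this thread's copy of `counter`.  */
static void *report_addr(void *out)
{
    *(uintptr_t *)out = (uintptr_t)&counter;
    return NULL;
}

/* Address of `counter` as seen by the calling thread.  */
uintptr_t tls_addr_here(void)
{
    return (uintptr_t)&counter;
}

/* Address of `counter` as seen by a newly created thread; 0 on failure.  */
uintptr_t tls_addr_in_new_thread(void)
{
    pthread_t t;
    uintptr_t addr = 0;

    if (pthread_create(&t, NULL, report_addr, &addr) != 0)
        return 0;
    pthread_join(t, NULL);
    return addr;
}

/* Returns 1 if an address cached on the calling thread differs from the
   one a second, concurrently live thread computes for the same variable.  */
int cached_address_is_stale_on_other_thread(void)
{
    uintptr_t cached = tls_addr_here();
    uintptr_t other = tls_addr_in_new_thread();

    assert(other != 0);
    return cached != other;
}
```

Within one thread the address is stable, which is what makes removing the
repeated calls valid in the ordinary case.  The problem only appears when
execution moves to a different thread between the first call and a later
use.  Older toolchains may need -pthread to build this.
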
We need to let the compiler know that some const functions aren't really
const.

--
H.J.
