On Sun, Dec 7, 2025 at 10:41 AM H.J. Lu <[email protected]> wrote:
>
> On Sun, Dec 7, 2025 at 3:31 AM Florian Weimer <[email protected]> wrote:
> >
> > * H. J. Lu:
> >
> > > For TLS calls:
> > >
> > > 1. UNSPEC_TLS_GD:
> > >
> > >   (parallel [
> > >     (set (reg:DI 0 ax)
> > >          (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
> > >                   (const_int 0 [0])))
> > >     (unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
> > >                 (reg/f:DI 7 sp)] UNSPEC_TLS_GD)
> > >     (clobber (reg:DI 5 di))])
> > >
> > > 2. UNSPEC_TLS_LD_BASE:
> > >
> > >   (parallel [
> > >     (set (reg:DI 0 ax)
> > >          (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
> > >                   (const_int 0 [0])))
> > >     (unspec:DI [(reg/f:DI 7 sp)] UNSPEC_TLS_LD_BASE)])
> > >
> > > 3. UNSPEC_TLSDESC:
> > >
> > >   (parallel [
> > >     (set (reg/f:DI 104)
> > >          (plus:DI (unspec:DI [
> > >                       (symbol_ref:DI ("_TLS_MODULE_BASE_") [flags 0x10])
> > >                       (reg:DI 114)
> > >                       (reg/f:DI 7 sp)] UNSPEC_TLSDESC)
> > >                   (const:DI (unspec:DI [
> > >                                 (symbol_ref:DI ("e") [flags 0x1a])
> > >                               ] UNSPEC_DTPOFF))))
> > >     (clobber (reg:CC 17 flags))])
> > >
> > >   (parallel [
> > >     (set (reg:DI 101)
> > >          (unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
> > >                      (reg:DI 112)
> > >                      (reg/f:DI 7 sp)] UNSPEC_TLSDESC))
> > >     (clobber (reg:CC 17 flags))])
> > >
> > > they return the same value for the same input value.
> > > But multiple calls with the same input value may be generated for simple
> > > programs like:
> > >
> > > void a(long *);
> > > int b(void);
> > > void c(void);
> > > static __thread long e;
> > > long
> > > d(void)
> > > {
> > >   a(&e);
> > >   if (b())
> > >     c();
> > >   return e;
> > > }
> > >
> > > When compiled with -O2 -fPIC -mtls-dialect=gnu2, the following code is
> > > generated:
> > >
> > > 	.type	d, @function
> > > d:
> > > .LFB0:
> > > 	.cfi_startproc
> > > 	pushq	%rbx
> > > 	.cfi_def_cfa_offset 16
> > > 	.cfi_offset 3, -16
> > > 	leaq	e@TLSDESC(%rip), %rbx
> > > 	movq	%rbx, %rax
> > > 	call	*e@TLSCALL(%rax)
> > > 	addq	%fs:0, %rax
> > > 	movq	%rax, %rdi
> > > 	call	a@PLT
> > > 	call	b@PLT
> > > 	testl	%eax, %eax
> > > 	jne	.L8
> > > 	movq	%rbx, %rax
> > > 	call	*e@TLSCALL(%rax)
> > > 	popq	%rbx
> > > 	.cfi_remember_state
> > > 	.cfi_def_cfa_offset 8
> > > 	movq	%fs:(%rax), %rax
> > > 	ret
> > > 	.p2align 4,,10
> > > 	.p2align 3
> > > .L8:
> > > 	.cfi_restore_state
> > > 	call	c@PLT
> > > 	movq	%rbx, %rax
> > > 	call	*e@TLSCALL(%rax)
> > > 	popq	%rbx
> > > 	.cfi_def_cfa_offset 8
> > > 	movq	%fs:(%rax), %rax
> > > 	ret
> > > 	.cfi_endproc
> > >
> > > There are 3 "call *e@TLSCALL(%rax)".  They all return the same value.
> > > Rename the remove_redundant_vector pass to the x86_cse pass, for 64bit,
> > > extend it to also remove redundant TLS calls to generate:
> > >
> > > d:
> > > .LFB0:
> > > 	.cfi_startproc
> > > 	pushq	%rbx
> > > 	.cfi_def_cfa_offset 16
> > > 	.cfi_offset 3, -16
> > > 	leaq	e@TLSDESC(%rip), %rax
> > > 	movq	%fs:0, %rdi
> > > 	call	*e@TLSCALL(%rax)
> > > 	addq	%rax, %rdi
> > > 	movq	%rax, %rbx
> > > 	call	a@PLT
> > > 	call	b@PLT
> > > 	testl	%eax, %eax
> > > 	jne	.L8
> > > 	movq	%fs:(%rbx), %rax
> > > 	popq	%rbx
> > > 	.cfi_remember_state
> > > 	.cfi_def_cfa_offset 8
> > > 	ret
> > > 	.p2align 4,,10
> > > 	.p2align 3
> > > .L8:
> > > 	.cfi_restore_state
> > > 	call	c@PLT
> > > 	movq	%fs:(%rbx), %rax
> > > 	popq	%rbx
> > > 	.cfi_def_cfa_offset 8
> > > 	ret
> > > 	.cfi_endproc
> > >
> > > with only one "call *e@TLSCALL(%rax)".  This reduces the number of
> > > __tls_get_addr calls in libgcc.a by 72%:
> > >
> > > __tls_get_addr calls     before       after
> > > libgcc.a                 868          243
> >
> > While this is certainly nice, it does make it harder to resume
> > coroutines/fibers on a different thread from the one they were suspended
> > on.  I do not know to what extent that was previously supported for
> > global-dynamic TLS.  I recall there being other caching going on (and
> > certainly for errno because __errno_location is declared const).
> >
> > If this impacts software like QEMU, is there a way to get back the old
> > behavior?
>
> We need to let the compiler know that some const functions aren't really
> const.
>
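
To make the concern concrete, here is a minimal sketch (not part of the
patch) of why a cached TLS address is only valid on the thread that computed
it, and of one possible manual workaround.  It assumes GCC or Clang
extensions (__thread, __attribute__((noinline)), inline asm) and POSIX
threads.  The names addr_of_e, worker and tls_addresses_differ are made up
for illustration.

```c
#include <pthread.h>
#include <stdint.h>
#include <stddef.h>

static __thread long e;

/* Hypothetical accessor: noinline plus an empty asm barrier, so the
   compiler should recompute the TLS address on every call instead of
   reusing a value cached earlier in the caller.  */
__attribute__((noinline)) static long *
addr_of_e (void)
{
  long *p = &e;
  __asm__ volatile ("" : "+r" (p));
  return p;
}

/* Runs on a second thread and records that thread's address of e.
   The address is stored as an integer because the object itself
   goes away when the thread exits.  */
static void *
worker (void *out)
{
  *(uintptr_t *) out = (uintptr_t) addr_of_e ();
  return NULL;
}

/* Returns 1 if the two threads see different addresses for e,
   0 if they see the same one, -1 if the thread could not be created.  */
int
tls_addresses_differ (void)
{
  uintptr_t here = (uintptr_t) addr_of_e ();
  uintptr_t there = 0;
  pthread_t t;

  if (pthread_create (&t, NULL, worker, &there) != 0)
    return -1;
  pthread_join (t, NULL);
  return here != there;
}
```

A fiber that is suspended on one thread and resumed on another is in the
same position as the two threads above: an address computed before the
switch refers to the old thread's copy of e.  An accessor like this has to
be written by hand for every variable, so a way to tell the compiler
directly would be preferable.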

Something like a returns_different_thread function attribute.

-- 
H.J.
