https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82803
nsz at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nsz at gcc dot gnu.org

--- Comment #4 from nsz at gcc dot gnu.org ---
I ran into the same issue:

static __thread int x;
static int *volatile p;

void f(int c)
{
    while (c--)
        p = &x;
}

With -xc -O2 -fPIC this compiles to

        pushq   %rbx
        leal    -1(%rdi), %ebx
.L10:
        leaq    x@tlsld(%rip), %rdi
        call    __tls_get_addr@PLT
        subl    $1, %ebx
        addq    $x@dtpoff, %rax
        movq    %rax, p(%rip)
        cmpl    $-1, %ebx
        jne     .L10
        popq    %rbx
        ret

Note that with -funroll-loops the loop is

.L46:
        leaq    x@tlsld(%rip), %rdi
        call    __tls_get_addr@PLT
        subl    $8, %ebx
        addq    $x@dtpoff, %rax
        movq    %rax, p(%rip)
        movq    %rax, p(%rip)
        movq    %rax, p(%rip)
        movq    %rax, p(%rip)
        movq    %rax, p(%rip)
        movq    %rax, p(%rip)
        movq    %rax, p(%rip)
        movq    %rax, p(%rip)
        cmpl    $-1, %ebx
        jne     .L46

So the loop unroller knows the address only needs to be computed once for
all eight stores in the unrolled body, but gcc still fails to hoist the
computation out of the loop.

If I use a simple global, the GOT access is hoisted. If I use an
__attribute__((const)) function call, that is hoisted too. Only the TLS
address computation is broken.

The issue is not present with -m32 (i386 code gen), but it is present on
e.g. aarch64 and powerpc64, and with TLS descriptors (-mtls-dialect=gnu2),
where the tlsdesc call stays in the loop instead of the __tls_get_addr call.
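For anyone who wants to compare the three cases mentioned above side by side, here is a minimal sketch in one file. The names g, get_addr, f_tls, f_global and f_const are hypothetical and not from the report. The noinline attribute is an assumption added only so that the const call stays a real call in a single translation unit. The comments restate what the comment above reports; actual output may differ by gcc version and target, so check the generated assembly (e.g. gcc -xc -O2 -fPIC -S) instead of relying on them.

```c
/* Comparison cases for loop-invariant address hoisting.
   Suggested check: gcc -xc -O2 -fPIC -S file.c, then read each loop. */

static __thread int x;     /* TLS variable, as in the report */
int g;                     /* plain global (hypothetical name) */
static int *volatile p;    /* volatile so every store is kept */

/* Hypothetical const function; noinline keeps it as a call here. */
__attribute__((const, noinline)) static int *get_addr(void)
{
    return &g;
}

/* Reported case: the TLS address computation stays inside the loop. */
void f_tls(int c)
{
    while (c--)
        p = &x;
}

/* Reported as fine: the GOT access for a plain global is hoisted. */
void f_global(int c)
{
    while (c--)
        p = &g;
}

/* Reported as fine: the call to a const function is hoisted. */
void f_const(int c)
{
    while (c--)
        p = get_addr();
}
```

All three loops store a loop-invariant address through the same volatile pointer, so any difference in the generated loops should come from how the address itself is computed.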