https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80881
--- Comment #81 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Julian Waters from comment #75)
> Any feedback on the new patch?

I would propose that you legitimize the TLS address using get_thread_pointer
(as is the case with Eric's patch). Generic optimizers are then able to
optimize the access to the symbol and later rewrite the address to a TLS named
address space.

Please consider this testcase (very relevant on linux, I don't know about
Windows):

--cut here--
extern __thread int i[8];

int foo (void)
{
  return i[2] + i[4];
}
--cut here--

Using get_thread_pointer, the above is expanded into:

(insn 5 2 6 2 (set (reg:DI 102)
        (mem/u/c:DI (const:DI (unspec:DI [
                        (symbol_ref:DI ("i") [flags 0x60] <var_decl 0x7fbe95c10c60 i>)
                    ] UNSPEC_GOTNTPOFF)) [2 S8 A8])) "tls.c":5:11 95 {*movdi_internal}
     (nil))
(insn 6 5 7 2 (set (reg:DI 103)
        (mem/u/c:DI (const:DI (unspec:DI [
                        (symbol_ref:DI ("i") [flags 0x60] <var_decl 0x7fbe95c10c60 i>)
                    ] UNSPEC_GOTNTPOFF)) [2 S8 A8])) "tls.c":5:18 95 {*movdi_internal}
     (nil))
(insn 7 6 8 2 (set (reg:SI 104)
        (mem/c:SI (plus:DI (plus:DI (unspec:DI [
                            (const_int 0 [0])
                        ] UNSPEC_TP)
                    (reg:DI 102))
                (const_int 8 [0x8])) [1 i[2]+0 S4 A32])) "tls.c":5:15 96 {*movsi_internal}
     (nil))
(insn 8 7 9 2 (set (reg:SI 105)
        (mem/c:SI (plus:DI (plus:DI (unspec:DI [
                            (const_int 0 [0])
                        ] UNSPEC_TP)
                    (reg:DI 103))
                (const_int 16 [0x10])) [1 i[4]+0 S4 A32])) "tls.c":5:15 96 {*movsi_internal}
     (nil))
(insn 9 8 10 2 (parallel [
            (set (reg:SI 101 [ _4 ])
                (plus:SI (reg:SI 104)
                    (reg:SI 105)))
            (clobber (reg:CC 17 flags))
        ]) "tls.c":5:15 283 {*addsi_1}
     (expr_list:REG_EQUAL (plus:SI (mem/c:SI (plus:DI (plus:DI (unspec:DI [
                                (const_int 0 [0])
                            ] UNSPEC_TP)
                        (reg:DI 102))
                    (const_int 8 [0x8])) [1 i[2]+0 S4 A32])
            (mem/c:SI (plus:DI (plus:DI (unspec:DI [
                                (const_int 0 [0])
                            ] UNSPEC_TP)
                        (reg:DI 103))
                    (const_int 16 [0x10])) [1 i[4]+0 S4 A32]))
        (nil)))

Please note how UNSPEC_TP forms a legitimate address in (insn 9).

Generic optimizers optimize the above to the following RTX sequence:

(insn 5 2 7 2 (set (reg:DI 102)
        (mem/u/c:DI (const:DI (unspec:DI [
                        (symbol_ref:DI ("i") [flags 0x60] <var_decl 0x7fbe95c10c60 i>)
                    ] UNSPEC_GOTNTPOFF)) [2 S8 A8])) "tls.c":5:11 95 {*movdi_internal}
     (nil))
(note 7 5 8 2 NOTE_INSN_DELETED)
(insn 8 7 9 2 (set (reg:SI 105 [ i[4] ])
        (mem/c:SI (plus:DI (plus:DI (unspec:DI [
                            (const_int 0 [0])
                        ] UNSPEC_TP)
                    (reg:DI 102))
                (const_int 16 [0x10])) [1 i[4]+0 S4 A32])) "tls.c":5:15 96 {*movsi_internal}
     (nil))
(insn 9 8 14 2 (parallel [
            (set (reg:SI 101 [ _4 ])
                (plus:SI (mem/c:SI (plus:DI (plus:DI (unspec:DI [
                                    (const_int 0 [0])
                                ] UNSPEC_TP)
                            (reg:DI 102))
                        (const_int 8 [0x8])) [1 i[2]+0 S4 A32])
                    (reg:SI 105 [ i[4] ])))
            (clobber (reg:CC 17 flags))
        ]) "tls.c":5:15 283 {*addsi_1}
     (expr_list:REG_DEAD (reg:DI 102)
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (expr_list:REG_DEAD (reg:SI 105 [ i[4] ])
                (nil)))))

The above sequence is later rewritten to use the TLS named address space
(please note AS1 in the address):

(insn 5 2 7 2 (set (reg:DI 102)
        (mem/u/c:DI (const:DI (unspec:DI [
                        (symbol_ref:DI ("i") [flags 0x60] <var_decl 0x7fbe95c10c60 i>)
                    ] UNSPEC_GOTNTPOFF)) [2 S8 A8])) "tls.c":5:11 95 {*movdi_internal}
     (nil))
(note 7 5 18 2 NOTE_INSN_DELETED)
(insn 18 7 19 2 (set (reg:SI 105 [ i[4] ])
        (mem/c:SI (plus:DI (reg:DI 102)
                (const_int 16 [0x10])) [1 i[4]+0 S4 A32 AS1])) "tls.c":5:15 -1
     (nil))
(insn 19 18 14 2 (parallel [
            (set (reg:SI 101 [ _4 ])
                (plus:SI (mem/c:SI (plus:DI (reg:DI 102)
                            (const_int 8 [0x8])) [1 i[2]+0 S4 A32 AS1])
                    (reg:SI 105 [ i[4] ])))
            (clobber (reg:CC 17 flags))
        ]) "tls.c":5:15 -1
     (nil))

and this results in the optimal assembly:

foo:
        movq    i@gottpoff(%rip), %rdx
        movl    %fs:16(%rdx), %eax
        addl    %fs:8(%rdx), %eax
        ret

BTW, adding -mno-tls-direct-seg-refs to the compile flags (which avoids
optimizations that put the segment register in the address) results in:

foo:
        movq    %fs:0, %rcx
        movq    i@gottpoff(%rip), %rdx
        movl    16(%rcx,%rdx), %eax
        addl    8(%rcx,%rdx), %eax
        ret
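
For anyone who wants to reproduce the comparison locally, here is a minimal self-contained sketch of the testcase. It is not part of the original comment and rests on a few assumptions: `i` is defined in the same file (the original only declares it extern) so the file links on its own, the tls_model attribute is added in the hope of keeping the initial-exec (GOT-based) access that the extern declaration gets, and check_foo is a hypothetical helper that only verifies the values read.

```c
#include <assert.h>

/* Assumption: defining `i` here, rather than declaring it extern as in the
   original testcase, would normally let the compiler pick a cheaper TLS
   model.  The attribute asks for initial-exec so that the access still
   goes through the GOT, as in the dumps above.  */
__thread int i[8] __attribute__ ((tls_model ("initial-exec")));

/* Same function as in the original testcase.  */
int foo (void)
{
  return i[2] + i[4];
}

/* Hypothetical helper, not in the original: fills the calling thread's
   copy of `i` and checks that foo reads the same thread-local storage.  */
int check_foo (void)
{
  for (int k = 0; k < 8; k++)
    i[k] = k * 10;

  assert (foo () == 20 + 40);
  return foo ();
}
```

Compiling this on x86_64 linux with "gcc -O2 -S tls.c" and again with "gcc -O2 -mno-tls-direct-seg-refs -S tls.c" should make it possible to compare the two addressing forms of foo shown above. The exact output may differ by compiler version and target.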