https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80881

--- Comment #81 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Julian Waters from comment #75)
> Any feedback on the new patch?

I would propose that you legitimize the TLS address using get_thread_pointer
(as is done in Eric's patch). The generic optimizers are then able to optimize
the access to the symbol, and the address is later rewritten to use a TLS named
address space.

Please consider this testcase (very relevant on Linux; I don't know about
Windows):

--cut here--
extern __thread int i[8];

int foo (void)
{
  return i[2] + i[4];
}
--cut here--

Using get_thread_pointer, the above is expanded into:

(insn 5 2 6 2 (set (reg:DI 102)
        (mem/u/c:DI (const:DI (unspec:DI [
                        (symbol_ref:DI ("i") [flags 0x60]  <var_decl
0x7fbe95c10c60 i>)
                    ] UNSPEC_GOTNTPOFF)) [2  S8 A8])) "tls.c":5:11 95
{*movdi_internal}
     (nil))
(insn 6 5 7 2 (set (reg:DI 103)
        (mem/u/c:DI (const:DI (unspec:DI [
                        (symbol_ref:DI ("i") [flags 0x60]  <var_decl
0x7fbe95c10c60 i>)
                    ] UNSPEC_GOTNTPOFF)) [2  S8 A8])) "tls.c":5:18 95
{*movdi_internal}
     (nil))
(insn 7 6 8 2 (set (reg:SI 104)
        (mem/c:SI (plus:DI (plus:DI (unspec:DI [
                            (const_int 0 [0])
                        ] UNSPEC_TP)
                    (reg:DI 102))
                (const_int 8 [0x8])) [1 i[2]+0 S4 A32])) "tls.c":5:15 96
{*movsi_internal}
     (nil))
(insn 8 7 9 2 (set (reg:SI 105)
        (mem/c:SI (plus:DI (plus:DI (unspec:DI [
                            (const_int 0 [0])
                        ] UNSPEC_TP)
                    (reg:DI 103))
                (const_int 16 [0x10])) [1 i[4]+0 S4 A32])) "tls.c":5:15 96
{*movsi_internal}
     (nil))
(insn 9 8 10 2 (parallel [
            (set (reg:SI 101 [ _4 ])
                (plus:SI (reg:SI 104)
                    (reg:SI 105)))
            (clobber (reg:CC 17 flags))
        ]) "tls.c":5:15 283 {*addsi_1}
     (expr_list:REG_EQUAL (plus:SI (mem/c:SI (plus:DI (plus:DI (unspec:DI [
                                (const_int 0 [0])
                            ] UNSPEC_TP)
                        (reg:DI 102))
                    (const_int 8 [0x8])) [1 i[2]+0 S4 A32])
            (mem/c:SI (plus:DI (plus:DI (unspec:DI [
                                (const_int 0 [0])
                            ] UNSPEC_TP)
                        (reg:DI 103))
                    (const_int 16 [0x10])) [1 i[4]+0 S4 A32]))
        (nil)))

Please note how UNSPEC_TP forms a legitimate address in (insn 9). The generic
optimizers optimize the above to the following RTX sequence:

(insn 5 2 7 2 (set (reg:DI 102)
        (mem/u/c:DI (const:DI (unspec:DI [
                        (symbol_ref:DI ("i") [flags 0x60]  <var_decl
0x7fbe95c10c60 i>)
                    ] UNSPEC_GOTNTPOFF)) [2  S8 A8])) "tls.c":5:11 95
{*movdi_internal}
     (nil))
(note 7 5 8 2 NOTE_INSN_DELETED)
(insn 8 7 9 2 (set (reg:SI 105 [ i[4] ])
        (mem/c:SI (plus:DI (plus:DI (unspec:DI [
                            (const_int 0 [0])
                        ] UNSPEC_TP)
                    (reg:DI 102))
                (const_int 16 [0x10])) [1 i[4]+0 S4 A32])) "tls.c":5:15 96
{*movsi_internal}
     (nil))
(insn 9 8 14 2 (parallel [
            (set (reg:SI 101 [ _4 ])
                (plus:SI (mem/c:SI (plus:DI (plus:DI (unspec:DI [
                                        (const_int 0 [0])
                                    ] UNSPEC_TP)
                                (reg:DI 102))
                            (const_int 8 [0x8])) [1 i[2]+0 S4 A32])
                    (reg:SI 105 [ i[4] ])))
            (clobber (reg:CC 17 flags))
        ]) "tls.c":5:15 283 {*addsi_1}
     (expr_list:REG_DEAD (reg:DI 102)
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (expr_list:REG_DEAD (reg:SI 105 [ i[4] ])
                (nil)))))

The above sequence is later rewritten to use the TLS named address space
(please note AS1 in the address):

(insn 5 2 7 2 (set (reg:DI 102)
        (mem/u/c:DI (const:DI (unspec:DI [
                        (symbol_ref:DI ("i") [flags 0x60]  <var_decl
0x7fbe95c10c60 i>)
                    ] UNSPEC_GOTNTPOFF)) [2  S8 A8])) "tls.c":5:11 95
{*movdi_internal}
     (nil))
(note 7 5 18 2 NOTE_INSN_DELETED)
(insn 18 7 19 2 (set (reg:SI 105 [ i[4] ])
        (mem/c:SI (plus:DI (reg:DI 102)
                (const_int 16 [0x10])) [1 i[4]+0 S4 A32 AS1])) "tls.c":5:15 -1
     (nil))
(insn 19 18 14 2 (parallel [
            (set (reg:SI 101 [ _4 ])
                (plus:SI (mem/c:SI (plus:DI (reg:DI 102)
                            (const_int 8 [0x8])) [1 i[2]+0 S4 A32 AS1])
                    (reg:SI 105 [ i[4] ])))
            (clobber (reg:CC 17 flags))
        ]) "tls.c":5:15 -1
     (nil))

and this results in the optimal assembly:

foo:
        movq    i@gottpoff(%rip), %rdx
        movl    %fs:16(%rdx), %eax
        addl    %fs:8(%rdx), %eax
        ret

BTW, adding -mno-tls-direct-seg-refs to the compile flags (which avoids
optimizations that use the segment register in the address) results in:

foo:
        movq    %fs:0, %rcx
        movq    i@gottpoff(%rip), %rdx
        movl    16(%rcx,%rdx), %eax
        addl    8(%rcx,%rdx), %eax
        ret
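
For readers who want to reproduce the displacements seen in both assembly
listings, here is a minimal, self-contained sketch. It is not part of any patch
under discussion. It assumes GCC's __thread extension and a 4-byte int (as on
x86-64), and it defines i locally instead of declaring it extern so that it
links on its own. Because of that local definition the compiler will most
likely pick a different TLS model than the one in the dumps above, so the
generated code will not contain the i@gottpoff load. The helpers set_elem and
elem_offset are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Defined here (the testcase above declares it extern) so that the
   sketch links without a second translation unit. */
__thread int i[8];

/* Same body as the testcase above. */
int foo (void)
{
  return i[2] + i[4];
}

/* Hypothetical helper: store a value in the calling thread's copy of i. */
void set_elem (int idx, int value)
{
  assert (idx >= 0 && idx < 8);
  i[idx] = value;
}

/* Hypothetical helper: byte offset of i[idx] from the start of i.  With a
   4-byte int this yields the displacements 8 (for i[2]) and 16 (for i[4])
   that appear as const_int in the RTL and as 8(...) and 16(...) in the
   assembly. */
ptrdiff_t elem_offset (int idx)
{
  assert (idx >= 0 && idx < 8);
  return (char *) &i[idx] - (char *) &i[0];
}
```

Compiling only foo with the extern declaration from the original testcase and
-O2, with and without -mno-tls-direct-seg-refs, is the way to compare against
the two listings above.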
