https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80881

--- Comment #83 from Julian Waters <tanksherman27 at gmail dot com> ---
Liu Hao: The register choices seem to be all over the place. Previously it was
using rdx for the gs:[88] load and rax for everything else; now it either uses
whatever register it can find, or uses rdx to hold the result of rdx+rax*8.
I have no idea why the resulting assembly is so different, but it could mean
the resulting program runs less efficiently.

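EDIT: Never mind, it was because rax is the return value register and the
thread local is an array. With a scalar, the TLS base is only needed for one
load, so rax can be reused for it. With the array, the base is needed for two
loads, so it stays in rdx while eax accumulates the result: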

extern _Thread_local int local;

int get(void) {
    return local;
}

movl    _tls_index(%rip), %eax
movq    %gs:88, %rdx
movq    (%rdx,%rax,8), %rax
movl    local@secrel32(%rax), %eax

extern _Thread_local int local[8];

int get(void) {
    return local[2] + local[4];
}

movl    _tls_index(%rip), %eax
movq    %gs:88, %rdx
movq    (%rdx,%rax,8), %rdx
movl    16+local@secrel32(%rdx), %eax
addl    8+local@secrel32(%rdx), %eax
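
To make the difference between the two sequences easier to see, here is a
rough C model of what the assembly above does. It is only a sketch: every
model_* name is made up for illustration, the "thread" struct is a stand-in
for the pointer read from %gs:88, and the offset 16 is an arbitrary
placeholder for local@secrel32. It does not touch the real TEB or the real
_tls_index.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model only; these are not the real Windows ABI structures. */
enum { MODEL_MODULES = 4, MODEL_BLOCK_BYTES = 64 };

/* Stand-in for the per-thread array of TLS block pointers that the real
   code reaches through %gs:88. */
typedef struct {
    unsigned char *tls_blocks[MODEL_MODULES];
} model_thread;

/* Stand-in for _tls_index: which array slot belongs to this module. */
static uint32_t model_tls_index = 2;

/* Stand-in for local@secrel32: byte offset of `local` within the block
   (assumed value, chosen arbitrarily). */
static const size_t model_local_offset = 16;

/* First three instructions: load the index, load the array pointer,
   then index the array to get this module's TLS block. */
static unsigned char *model_tls_base(const model_thread *t)
{
    assert(model_tls_index < MODEL_MODULES);
    return t->tls_blocks[model_tls_index];
}

/* One movl/addl from base + offset. */
static int model_load_int(const unsigned char *base, size_t byte_offset)
{
    int v;
    memcpy(&v, base + byte_offset, sizeof v);
    return v;
}

/* Scalar case: the base is dead after a single load, so one register
   can hold first the base and then the result. */
int model_get_scalar(const model_thread *t)
{
    return model_load_int(model_tls_base(t), model_local_offset);
}

/* Array case: local[2] + local[4]. The base is computed once and used
   twice, so it has to live in a register other than the result's. */
int model_get_array_sum(const model_thread *t)
{
    const unsigned char *base = model_tls_base(t);
    return model_load_int(base, model_local_offset + 4 * sizeof(int))
         + model_load_int(base, model_local_offset + 2 * sizeof(int));
}
```

In this model the `base` variable in model_get_array_sum plays the role of
rdx in the second listing, and the return value plays the role of eax.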

Uros: I see, I'll try to do so. I was mainly avoiding that in order to break
less code (I have a habit of breaking anything I touch). That said, the
resulting assembly (barring the register selection) already seems to be as
compact as possible for Windows, so I'm not sure how using get_thread_pointer
could make it any more optimal. This is a genuinely curious question; I'm not
doubting that get_thread_pointer can help optimize the resulting assembly.
