On x86-64, __tls_get_addr has to realign the stack so that binaries compiled by GCC versions older than GCC 4.9.4:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066

continue to work even if vector instructions are used by functions called from __tls_get_addr, which assume the 16-byte stack alignment specified by the x86-64 psABI. We are considering adding an alternative interface, ___tls_get_addr, to glibc, which doesn't realign the stack. Compilers that properly align the stack for TLS calls can generate a call to ___tls_get_addr, instead of __tls_get_addr, if ___tls_get_addr is available.

Any comments?

-- H.J.