http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354
--- Comment #9 from Konstantin Serebryany <konstantin.s.serebryany at gmail dot com> 2012-11-18 19:35:43 UTC ---
As dvyuokv@ pointed out, -ftls-model=initial-exec improves the situation, but
does not fully help.

Experiment:
% cat x.c
__thread int a;
int foo() { return a; }

HORRIBLE: -fPIC -shared
% gcc x.c -O2 -fPIC -shared -o x.so ; objdump -d x.so | grep foo.: -A 5
00000000000006e0 <foo>:
 6e0:   66 48 8d 3d f0 08 20    lea    0x2008f0(%rip),%rdi        # 200fd8 <_DYNAMIC+0x1b8>
 6e7:   00
 6e8:   66 66 48 e8 10 ff ff    callq  600 <__tls_get_addr@plt>
 6ef:   ff
 6f0:   8b 00                   mov    (%rax),%eax

NOT-SO-BAD: -fPIC -shared -ftls-model=initial-exec
% gcc x.c -O2 -fPIC -shared -o x.so -ftls-model=initial-exec ; objdump -d x.so | grep foo.: -A 5
0000000000000630 <foo>:
 630:   48 8b 05 a9 09 20 00    mov    0x2009a9(%rip),%rax        # 200fe0 <_DYNAMIC+0x1b8>
 637:   64 8b 00                mov    %fs:(%rax),%eax
 63a:   c3                      retq

GOOD: -fPIE
% gcc -c x.c -O2 -fPIE -o x.o ; objdump -d x.o | grep foo.: -A 5
0000000000000000 <foo>:
   0:   64 8b 04 25 00 00 00    mov    %fs:0x0,%eax
   7:   00
   8:   c3                      retq

So, while -ftls-model=initial-exec improves the TLS performance, it is still 2x
slower than -fPIE.
For tsan, which does this for *every* memory access in the original program,
this will cost 5%-10% slowdown.
For our users this is a big deal, so they will link the static library whenever
possible.

Which default is used in gcc -- I don't care that much.
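
For readers who want to repeat the experiment without changing the TLS model of a whole translation unit, GCC also documents a per-variable `tls_model` attribute. The sketch below is a variant of x.c under that assumption. The file name x_attr.c and the helper set_a() are hypothetical and are not part of the original comment. The attribute would be expected to give the same access sequence as the NOT-SO-BAD case above, but that has not been measured here.

```c
/* x_attr.c: hypothetical variant of x.c from the experiment above. */

/* Ask for the initial-exec TLS model for this one variable only,
   instead of passing -ftls-model=initial-exec for the whole file. */
__thread int a __attribute__((tls_model("initial-exec")));

/* Same accessor as in the original experiment. */
int foo(void) { return a; }

/* Hypothetical helper, added only so a caller can change the value. */
void set_a(int v) { a = v; }
```

To compare against the listings above, build it the same way as the HORRIBLE case (gcc x_attr.c -O2 -fPIC -shared -o x_attr.so) and inspect foo with the same objdump command.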