http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354
--- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-11-19 08:54:47 UTC ---
I bet 9.5% or more of that is due to the PLT call. The thing is, even when you have initial-exec TLS model code, if you link it into an executable and the referenced TLS variable is in the executable, the linker TLS transitions optimization changes the IE model into LE model, so instead of something like:
  mov 0x2009a9(%rip),%rax
  mov %fs:(%rax),%eax
you'll end up with
  mov $-4,%rax
  mov %fs:(%rax),%eax
or so (compared to
  mov %fs:-4,%eax
if it was local-exec model from the beginning). Given the amount of code in __tsan_read8, I seriously doubt it is noticeable.

So, please compare libtsan built with -fPIC -ftls-model=initial-exec with libtsan built with -fPIE (-ftls-model=local-exec). The former will not require any special hacks and will work just fine even with shared libraries, the latter won't.
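
For anyone who wants to look at the generated code themselves, here is a minimal sketch of how the TLS model can be pinned per variable. It uses GCC's __thread extension and the tls_model attribute. The variable and function names are hypothetical stand-ins and are not taken from libtsan.

```c
/* Hypothetical per-thread counters; the names are illustrative only. */

/* Forced to the initial-exec model, regardless of -fPIC or -ftls-model. */
__thread int tls_counter_ie __attribute__((tls_model("initial-exec")));

/* Uses whatever model the compiler flags select for this translation unit. */
__thread int tls_counter_default;

/* Increment the initial-exec variable and return its new value. */
int bump_ie(void)
{
    return ++tls_counter_ie;
}

/* Increment the default-model variable and return its new value. */
int bump_default(void)
{
    return ++tls_counter_default;
}
```

Compiling this file with -O2 -fPIC -S and again with -O2 -fPIE -S, then comparing the two functions in the output, should show which access sequence each model produces. The exact instructions and offsets depend on the target and toolchain version.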