https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66944
Oleg Pronin <syber at crazypanda dot ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |syber at crazypanda dot ru

--- Comment #4 from Oleg Pronin <syber at crazypanda dot ru> ---
(In reply to Laurent Rineau from comment #3)
> I got hit by this bug today. Do you know a workaround?

Yes. Use a static thread_local inside a static member function.

    template <typename T>
    struct A {
        virtual void foo() { v(); }
        static Heavy& v() {
            static thread_local Heavy obj;
            return obj;
        }
    };

However, I discovered that the performance of accessing a thread_local
variable for some reason depends on the class (Heavy in this case), maybe on
its size and constructor complexity, I don't know. I benchmarked the code
above with one of my classes and got only about 100M calls/s.

After that I changed the code above to:

    template <typename T>
    struct A {
        virtual void foo() { v(); }
        static Heavy* v() {
            static thread_local Heavy* ptr;
            if (!ptr) {
                static thread_local Heavy inst;
                ptr = &inst;
            }
            return ptr;
        }
    };

and got 300M calls/s.

The point is that the hot path should not pass through the declaration of
the static thread_local object itself, only through a thread_local of POD
pointer type. I don't know why thread_local is so inefficient, but this
workaround gave me a 3x speedup in GCC 4.9.

In clang there is no thread_local bug, but access is always slow there
(100M calls/s) and I could not speed it up with any of the workarounds :(
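
For anyone who wants to try the two variants from the comment above side by side, here is a minimal compilable sketch. It is an illustration under assumptions, not the original benchmark: Heavy is a hypothetical stand-in with a non-trivial constructor, and the accessor names direct() and cached() are made up here so both versions can live in one class (the comment calls both of them v()). No timing is measured.

```cpp
#include <cassert>

// Hypothetical stand-in for the "Heavy" class from the comment: the
// non-trivial constructor means its thread_local needs dynamic initialization.
struct Heavy {
    int value;
    Heavy() : value(42) {}
};

template <typename T>
struct A {
    // Variant 1: every call passes through the declaration of the
    // function-local thread_local object.
    static Heavy& direct() {
        static thread_local Heavy obj;
        return obj;
    }

    // Variant 2: the hot path only reads a thread_local raw pointer, which
    // is zero-initialized and needs no dynamic initialization. The object
    // declaration is reached only on the first call in each thread.
    static Heavy* cached() {
        static thread_local Heavy* ptr = nullptr;
        if (!ptr) {
            static thread_local Heavy inst;
            ptr = &inst;
        }
        return ptr;
    }
};
```

Within one thread, repeated calls to A<int>::direct() or to A<int>::cached() return the same object each time. Whether the pointer variant is actually faster depends on the compiler and on Heavy, so it is worth measuring on your own setup.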