https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66944

Oleg Pronin <syber at crazypanda dot ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |syber at crazypanda dot ru

--- Comment #4 from Oleg Pronin <syber at crazypanda dot ru> ---
(In reply to Laurent Rineau from comment #3)
> I got hit by this bug today. Do you know a workaround?

Yes. Use a static thread_local variable inside a static member function.

template <typename T>
struct A {
  virtual void foo() { v(); }

  static Heavy& v () {
      static thread_local Heavy obj;
      return obj;
  }
};

However, I discovered that the performance of accessing thread_local variables
somehow depends on the class (Heavy in this case), maybe on its size or
constructor complexity; I don't know. I benchmarked the code above with one of
my classes and got only 100M calls/s. After that I changed the code above to:

template <typename T>
struct A {
  virtual void foo() { v(); }

  static Heavy* v () {
      static thread_local Heavy* ptr;
      if (!ptr) {
          static thread_local Heavy inst;
          ptr = &inst;
      }
      return ptr;
  }
};

and got 300M calls/s. The point is that the code flow should not go through
the static thread_local object declaration itself on every call, only through
a thread_local pointer of POD type. I don't know why thread_local is so
inefficient, but this workaround gave me a 3x speedup in GCC 4.9.
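
For anyone who wants to repeat the comparison, here is a minimal sketch of how the two accessors above could be timed. It is not the benchmark I ran: Heavy is a hypothetical stand-in class, and Direct, Cached, calls_per_second and check_same_instance are names made up for this example. Results will depend on the compiler, the optimization flags and the TLS model, so treat any numbers as indicative only.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for Heavy: user-provided constructor and destructor,
// so the thread_local object needs dynamic initialization.
struct Heavy {
    std::uint64_t counter;
    Heavy() : counter(0) {}
    ~Heavy() {}
};

// First variant from the comment: every call passes through the declaration.
template <typename T>
struct Direct {
    static Heavy& v() {
        static thread_local Heavy obj;
        return obj;
    }
};

// Second variant: after the first call, only the plain pointer is read.
template <typename T>
struct Cached {
    static Heavy* v() {
        static thread_local Heavy* ptr = nullptr;
        if (!ptr) {
            static thread_local Heavy inst;
            ptr = &inst;
        }
        return ptr;
    }
};

// Calls get() `iterations` times and returns the rate in calls per second.
// get must return Heavy&; the counter update gives the loop a side effect
// so that it is not removed entirely by the optimizer.
template <typename F>
double calls_per_second(F get, std::uint64_t iterations) {
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        get().counter += 1;
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return static_cast<double>(iterations) / elapsed.count();
}

// Sanity check: each accessor hands back one stable object per thread.
inline void check_same_instance() {
    assert(&Direct<int>::v() == &Direct<int>::v());
    assert(Cached<int>::v() == Cached<int>::v());
}
```

To compare, call calls_per_second([]() -> Heavy& { return Direct<int>::v(); }, N) and calls_per_second([]() -> Heavy& { return *Cached<int>::v(); }, N) with a large N. An optimizing compiler may inline the accessor and hoist the address computation out of the loop, so it is worth checking the generated assembly before trusting the figures.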

In clang, the thread_local bug does not occur, but access is always slow
(100M calls/s) and I could not speed it up with any of the workarounds :(
