> On 11/29/2009 05:38 PM, Andreas Färber wrote:
>> On 29.11.2009 at 16:29, Avi Kivity wrote:
>>> Where is __thread not supported?
>> Apple, Sun.

Some flavours of uClinux :-)

Avi Kivity wrote:
> Well, pthread_getspecific is around 130 bytes of code, whereas __thread
> is just one instruction.  Maybe we should support both.

It's easy enough, they are quite similar.  The exception is that
pthread_key_create lets you provide a destructor, which is called as
each thread is destroyed.  (Unfortunately there is no constructor for
new threads.  You can use both methods together if you need a
destructor and speed.)

It's not always one instruction: it's more complicated in shared
libraries, but it's always close to that.

Anyway, I decided to measure them both, as I wondered about this for
another program.  On my 2.0GHz Core Duo (32-bit), tight unrolled loop,
everything in cache:

    Read void *__thread variable     ~ 0.6 ns
    Call pthread_getspecific(key)    ~ 8.8 ns

__thread is preferable, but it's not much overhead to call
pthread_getspecific().  IMHO, it's not worth making code less portable
or more complicated to handle both, but it's a nice touch.

However, I did notice that the compiler optimises away references to
__thread variables much better, such as hoisting them out of loops.

In my programs I have taken to wrapping everything inside a
thread_specific(var) macro, similar to the one in the kernel, which
expands to call pthread_getspecific() or use __thread[*].  That keeps
the complexity in one place, which is where the macro is defined.

[*] - Windows has __thread, but it sometimes crashes when used in a
      DLL, so I use the Windows equivalent of pthread_getspecific() in
      the same wrapper macro, which is fine.

-- Jamie
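
For illustration, a wrapper along the lines described above might look
like the following.  This is only a minimal sketch, not the macro from
the post: the names DEFINE_THREAD_SPECIFIC, thread_specific_get,
thread_specific_set, thread_specific_demo and the USE_PTHREAD_TLS
switch are all hypothetical, and it assumes a compiler that accepts
__thread when the switch is not defined.  The pthread variant passes
NULL as the destructor; a real cleanup function could be passed to
pthread_key_create instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* USE_PTHREAD_TLS is a hypothetical build switch: define it on
 * platforms where __thread is not available. */
#ifdef USE_PTHREAD_TLS

/* Portable variant: one key per variable, created once on first use. */
#define DEFINE_THREAD_SPECIFIC(name)                              \
    static pthread_key_t name##_key;                              \
    static pthread_once_t name##_once = PTHREAD_ONCE_INIT;        \
    static void name##_make_key(void)                             \
    {                                                             \
        int rc = pthread_key_create(&name##_key, NULL);           \
        assert(rc == 0);                                          \
        (void)rc;                                                 \
    }
#define thread_specific_get(name)                                 \
    ((void)pthread_once(&name##_once, name##_make_key),           \
     pthread_getspecific(name##_key))
#define thread_specific_set(name, val)                            \
    ((void)pthread_once(&name##_once, name##_make_key),           \
     (void)pthread_setspecific(name##_key, (val)))

#else

/* Fast variant: a plain compiler-supported thread-local pointer. */
#define DEFINE_THREAD_SPECIFIC(name) static __thread void *name##_tls;
#define thread_specific_get(name) (name##_tls)
#define thread_specific_set(name, val) ((void)(name##_tls = (val)))

#endif

/* One per-thread pointer; callers never see which variant is in use. */
DEFINE_THREAD_SPECIFIC(current_ctx)

/* Each thread stores its own argument and reads it back. */
static void *worker(void *arg)
{
    thread_specific_set(current_ctx, arg);
    return thread_specific_get(current_ctx);
}

/* Returns 1 if two threads each saw only their own value and the
 * calling thread, which never set one, still sees NULL. */
int thread_specific_demo(void)
{
    int a = 1, b = 2;
    pthread_t ta, tb;
    void *ra = NULL, *rb = NULL;

    if (pthread_create(&ta, NULL, worker, &a) != 0)
        return 0;
    if (pthread_create(&tb, NULL, worker, &b) != 0) {
        pthread_join(ta, NULL);
        return 0;
    }
    pthread_join(ta, &ra);
    pthread_join(tb, &rb);

    return ra == &a && rb == &b &&
           thread_specific_get(current_ctx) == NULL;
}
```

Building with and without -DUSE_PTHREAD_TLS (plus -pthread) exercises
both variants through the same calling code, which is the point of
keeping the choice inside the macro definitions.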