Re: [Qemu-devel] [PATCH 2/7] store thread-specific env information

Jamie Lokier Sun, 29 Nov 2009 14:29:49 -0800

> On 11/29/2009 05:38 PM, Andreas Färber wrote:
>> On 29.11.2009 at 16:29, Avi Kivity wrote:
>>> Where is __thread not supported?
>> Apple, Sun.

Some flavours of uClinux :-)

Avi Kivity wrote:
> Well, pthread_getspecific is around 130 bytes of code, whereas __thread 
> is just one instruction.  Maybe we should support both.

It's easy enough, they are quite similar.  Except that
pthread_key_create lets you provide a destructor which is called as
each thread is destroyed (unfortunately no constructor for new
threads; and you can use both methods if you need a destructor and
speed together).
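
To illustrate using both methods together, here is a minimal sketch, not
taken from the patch under discussion.  It assumes a compiler with __thread
support; struct thread_state, get_state() and run_worker() are hypothetical
names.  The __thread pointer is the fast path, and the pthread key exists
only so that its destructor frees the state when a thread exits.  The state
is created lazily on first use, since there is no per-thread constructor.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-thread state; the name and field are illustrative. */
struct thread_state {
    int counter;
};

static __thread struct thread_state *fast_state; /* fast path */
static pthread_key_t state_key;                  /* used only for its destructor */
static pthread_once_t state_once = PTHREAD_ONCE_INIT;

static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
static int destroyed_count;                      /* demo instrumentation */

/* Called by the pthread library in each exiting thread that set a value. */
static void state_destroy(void *p)
{
    free(p);
    pthread_mutex_lock(&count_lock);
    destroyed_count++;
    pthread_mutex_unlock(&count_lock);
}

static void state_key_init(void)
{
    int rc = pthread_key_create(&state_key, state_destroy);
    assert(rc == 0);
    (void)rc;
}

/* Returns this thread's state, allocating it on first use. */
struct thread_state *get_state(void)
{
    if (fast_state == NULL) {
        pthread_once(&state_once, state_key_init);
        fast_state = calloc(1, sizeof(*fast_state));
        assert(fast_state != NULL);
        /* Register the pointer so state_destroy() runs at thread exit. */
        pthread_setspecific(state_key, fast_state);
    }
    return fast_state;
}

static void *worker(void *arg)
{
    (void)arg;
    get_state()->counter++;
    get_state()->counter++;
    return (void *)(intptr_t)get_state()->counter;
}

/* Runs one worker thread to completion and returns its final counter. */
long run_worker(void)
{
    pthread_t t;
    void *ret = NULL;
    int rc = pthread_create(&t, NULL, worker, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(t, &ret);
    return (long)(intptr_t)ret;
}
```

Each call to run_worker() starts from a fresh zeroed state, and the
destructor has run by the time pthread_join() returns.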

It's not always one instruction - it's more complicated in shared
libraries, but it's always close to that.

Anyway, I decided to measure them both as I wondered about this for
another program.

On my 2.0GHz Core Duo (32-bit), tight unrolled loop, everything in cache:

     Read void *__thread variable        ~ 0.6 ns
     Call pthread_getspecific(key)       ~ 8.8 ns
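
For anyone wanting to repeat the comparison, a rough sketch of such a
measurement follows.  It is not the program that produced the numbers
above: the loops are not unrolled, the function names are made up, and it
assumes clock_gettime(CLOCK_MONOTONIC) is available.  The volatile
accesses are there so the compiler cannot hoist the __thread read out of
the loop.  Results will vary with CPU, compiler and flags.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() under strict -std= */
#endif
#include <assert.h>
#include <pthread.h>
#include <time.h>

static __thread void *tls_var;    /* variable read on the __thread path */
static pthread_key_t bench_key;
static pthread_once_t bench_once = PTHREAD_ONCE_INIT;

static void bench_key_init(void)
{
    int rc = pthread_key_create(&bench_key, NULL);
    assert(rc == 0);
    (void)rc;
}

static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Average nanoseconds per read of a __thread pointer. */
double bench_thread_var(long iters)
{
    void *volatile sink = NULL;
    double start, end;
    long i;

    start = now_ns();
    for (i = 0; i < iters; i++)
        sink = *(void *volatile *)&tls_var;  /* volatile forces a real load */
    end = now_ns();
    (void)sink;
    return (end - start) / (double)iters;
}

/* Average nanoseconds per pthread_getspecific() call. */
double bench_getspecific(long iters)
{
    void *volatile sink = NULL;
    double start, end;
    long i;

    pthread_once(&bench_once, bench_key_init);
    start = now_ns();
    for (i = 0; i < iters; i++)
        sink = pthread_getspecific(bench_key);
    end = now_ns();
    (void)sink;
    return (end - start) / (double)iters;
}
```

Call each function with a large iteration count (say 100 million) and
compare the two averages; loop overhead is included in both figures.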

__thread is preferable but it's not much overhead to call pthread_getspecific().

Imho, it's not worth making code less portable or more complicated to
handle both, but it's a nice touch.

However, I did notice that the compiler optimises away references to
__thread variables much better, such as hoisting from inside loops.

In my programs I have taken to wrapping everything inside a
thread_specific(var) macro, similar to the one in the kernel, which
expands to call pthread_getspecific() or use __thread[*].  That keeps the
complexity in one place, which is where the macro is defined.

( [*] - Windows has __thread, but it sometimes crashes when used in a
DLL, so I use the Windows equivalent of pthread_getspecific() in the
same wrapper macro, which is fine. )
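
A wrapper along those lines might look roughly like the sketch below.
This is not my actual macro: DEFINE_THREAD_SPECIFIC, the USE_PTHREAD_TLS
switch and the demo functions are hypothetical names, and the pthread
branch assumes a simple type (not an array) that is acceptable
zero-initialised.  A Windows branch would follow the pthread one, using
TlsGetValue() and TlsSetValue() instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#ifdef USE_PTHREAD_TLS  /* hypothetical switch for compilers without __thread */

/* Per-variable key, one-time key setup, and a lazily allocating accessor. */
#define DEFINE_THREAD_SPECIFIC(type, name)                          \
    static pthread_key_t name##_key;                                \
    static pthread_once_t name##_once = PTHREAD_ONCE_INIT;          \
    static void name##_key_init(void)                               \
    {                                                               \
        int rc = pthread_key_create(&name##_key, free);             \
        assert(rc == 0);                                            \
        (void)rc;                                                   \
    }                                                               \
    static type *name##_ptr(void)                                   \
    {                                                               \
        type *p;                                                    \
        pthread_once(&name##_once, name##_key_init);                \
        p = pthread_getspecific(name##_key);                        \
        if (p == NULL) {                                            \
            p = calloc(1, sizeof(*p));                              \
            assert(p != NULL);                                      \
            pthread_setspecific(name##_key, p);                     \
        }                                                           \
        return p;                                                   \
    }                                                               \
    static type *name##_ptr(void) /* redeclaration eats the ';' */

#define thread_specific(name) (*name##_ptr())

#else /* compiler supports __thread */

#define DEFINE_THREAD_SPECIFIC(type, name) static __thread type name
#define thread_specific(name) (name)

#endif

/* Callers look the same whichever branch was selected. */
DEFINE_THREAD_SPECIFIC(int, call_count);

static void *bump(void *arg)
{
    int i, n = *(int *)arg;

    for (i = 0; i < n; i++)
        thread_specific(call_count)++;
    *(int *)arg = thread_specific(call_count);
    return NULL;
}

/* Bumps the counter n times in a fresh thread; returns that thread's count. */
int count_in_new_thread(int n)
{
    pthread_t t;
    int rc = pthread_create(&t, NULL, bump, &n);
    assert(rc == 0);
    (void)rc;
    pthread_join(t, NULL);
    return n;
}
```

Because thread_specific(var) expands to an lvalue in both branches, code
can read, assign or increment it without knowing which mechanism is
underneath.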

-- Jamie
