Hi,

On 2024-06-05 21:10:01 -0400, Robert Haas wrote:
> On Wed, Jun 5, 2024 at 8:01 PM Heikki Linnakangas <hlinn...@iki.fi> wrote:
> > I'm very much in favor of a runtime toggle. To be precise, a
> > PGC_POSTMASTER setting. We'll get a lot more testing if you can easily
> > turn it on/off, and so far I haven't seen anything that would require it
> > to be a compile time option.
>
> I was thinking about global variable annotations. If someone wants to
> build without multithreading, I think that they won't want to still
> end up with a ton of variables being changed to thread-local.

Depending on the architecture / ABI / compiler options it's often not
meaningfully more expensive to access a thread local variable than a "normal"
variable.

I think these days it's e.g. more expensive on x86-64 windows, but not
linux. On arm the overhead of TLS is more noticeable, across platforms,
afaict.

Example compiler output for x86-64 and armv8:
https://godbolt.org/z/K369eG5MM

Cycle analysis for linux x86-64 output:
https://godbolt.org/z/KK57vM1of

This shows that for the linux x86-64 case there's no difference in efficiency
between the tls/non-tls case.

The reason it's so fast on x86-64 linux is that they reused one of the "old"
segment registers to serve as the index register, with a value that differs
for each thread. For x86-64, most code is compiled position independent, so
accesses to normal global variables *also* use an indexed addressing mode (but
relative to the instruction pointer).

I think we might be able to gain some small performance benefits via the
annotations, which actually might make it viable to just apply the annotations
regardless of using threads or not.

Greetings,

Andres Freund
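
To make the comparison above concrete, here is a minimal sketch of the kind of
accessor pair one could feed to a compiler to compare TLS and non-TLS access.
The variable and function names are hypothetical: they are not taken from the
PostgreSQL tree, and this is not necessarily the code behind the godbolt links.
The addressing modes mentioned in the comments are what one would typically
expect on x86-64 linux for variables defined in the executable itself. Other
TLS models (e.g. for shared libraries) and other platforms can generate
different, more expensive code.

```c
/*
 * Sketch only: a plain global next to a thread-local one, with trivial
 * accessors. All names here are made up for illustration.
 */

/* One copy shared by every thread in the process. */
static int plain_counter = 0;

/* One copy per thread (C11 storage-class specifier). */
static _Thread_local int tls_counter = 0;

/*
 * Typically compiles to a load relative to the instruction pointer in
 * position independent x86-64 code.
 */
int
read_plain(void)
{
	return plain_counter;
}

/*
 * Typically compiles to a load relative to a segment register (%fs on
 * x86-64 linux) when the local-exec TLS model applies.
 */
int
read_tls(void)
{
	return tls_counter;
}

void
write_plain(int value)
{
	plain_counter = value;
}

void
write_tls(int value)
{
	tls_counter = value;
}
```

Compiling this with optimization and inspecting the assembly for read_plain()
and read_tls() should show whether the two loads differ on a given platform.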