https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87595
Bug ID: 87595
Summary: __tls_get_addr should be __attribute__((__noplt__))
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: bugdal at aerifal dot cx
Target Milestone: ---

The relevant code seems to be in the targets, but this enhancement request applies to all targets.

__tls_get_addr is enough of a bottleneck that many projects (even gcc target libs) try to bypass it by using the initial-exec model. In general, bypassing the PLT and calling directly through the GOT will save at least an icache line and 1 instruction. On some targets it takes several instructions to get through the PLT, and the PLT also imposes constraints on register allocation (e.g. ebx on i386).

My initial testing shows -fno-plt makes GD TLS access about 8% faster on i386 and no worse on x86_64. I will try to post some reproducible benchmarks as a follow-up later.
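For anyone who wants to look at the call sequence in question, here is a minimal sketch of a GD TLS access. It is not the benchmark mentioned above. The file name and the names tls_counter, bump_counter and read_counter are made up for illustration, and the exact instructions emitted will depend on the target and toolchain version.

```c
/* tls_demo.c (hypothetical file name, for illustration only).
 *
 * Suggested comparison, assuming an x86 GCC with -fno-plt support:
 *   gcc -O2 -fPIC -S tls_demo.c -o plt.s
 *   gcc -O2 -fPIC -fno-plt -S tls_demo.c -o noplt.s
 * then compare how __tls_get_addr is called in the two outputs.
 */

/* Not static: in a -fPIC build the compiler cannot assume the variable
 * lives in the main executable's TLS block, so it normally falls back
 * to the general-dynamic model and a call to __tls_get_addr. */
__thread int tls_counter;

/* Each access to tls_counter needs the thread-local address, which in
 * the GD model is obtained through the runtime helper. */
int bump_counter(void)
{
    return ++tls_counter;
}

int read_counter(void)
{
    return tls_counter;
}
```

Since the call to __tls_get_addr is generated by the compiler rather than written in user code, there is no declaration a user could put the attribute on, which is why the request is for gcc to treat the call as noplt itself.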