https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87595

            Bug ID: 87595
           Summary: __tls_get_addr should be __attribute__((__noplt__))
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

The relevant code seems to live in the target back ends, but this enhancement
request applies to all targets.

__tls_get_addr is enough of a bottleneck that many projects (even gcc target
libs) try to bypass it by using the initial-exec model. In general, bypassing
the PLT and calling directly through the GOT will save at least an icache line
and 1 instruction. On some targets it takes several instructions to get through
the PLT, and the PLT call also imposes constraints on register allocation (e.g.
ebx on i386).
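
For illustration, here is a minimal sketch of the kind of access in question.
The names tls_counter and bump_counter are hypothetical and not taken from any
real project; the build commands in the comment are an assumption about how one
might compare the generated code, not a reproducible benchmark.

```c
/* Sketch only: tls_counter and bump_counter are hypothetical names.
   To compare the generated call sequences, one might build as PIC:
     gcc -O2 -fPIC -S tls.c            (__tls_get_addr reached via the PLT)
     gcc -O2 -fPIC -fno-plt -S tls.c   (on x86, expected to go via the GOT) */

/* Explicitly request the general-dynamic (GD) TLS model. */
__thread int tls_counter __attribute__((tls_model("global-dynamic")));

/* In PIC code, the address of tls_counter is normally obtained by
   calling __tls_get_addr unless the linker relaxes the access. */
int bump_counter(void)
{
    return ++tls_counter;
}
```

The request here amounts to getting the second form of the call by default for
__tls_get_addr, without the user having to pass -fno-plt for the whole
translation unit.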

My initial testing shows -fno-plt makes GD TLS access about 8% faster on i386
and no worse on x86_64. I will try to post some reproducible benchmarks as a
follow-up later.
