https://sourceware.org/bugzilla/show_bug.cgi?id=32387
Bug ID: 32387
Summary: ppc64 TLS local-dynamic optimization bug when built with -fno-plt
Product: binutils
Version: unspecified
Status: NEW
Severity: normal
Priority: P2
Component: ld
Assignee: unassigned at sourceware dot org
Reporter: bergner at linux dot ibm.com
Target Milestone: ---

We SEGV on the following simple test case when our library function is
compiled with -fno-plt.

bergner@c643n10lp1:bug$ cat libtlsdso.c
unsigned long *
dsofunc_ld (void)
{
  static _Thread_local __attribute ((tls_model("local-dynamic")))
    unsigned long tlsvar = 0xdeadbeef;
  return &tlsvar;
}

bergner@c643n10lp1:bug$ cat prog.c
#include <stdio.h>
extern unsigned long *dsofunc_ld (void);
int
main (void)
{
  unsigned long *var = dsofunc_ld ();
  printf ("0x%08lx\n", *var);
  return 0;
}

gcc -O2 -fPIC -fplt -c -o libtlsdso.o libtlsdso.c
gcc -O2 -fPIC -shared -Wl,-soname=libtlsdso.so -o libtlsdso.so libtlsdso.o
gcc -O2 -R./ -L./ prog.c -o prog -ltlsdso
bergner@c643n10lp1:bug$ ./prog
0xdeadbeef

gcc -O2 -fPIC -fno-plt -c -o libtlsdso.o libtlsdso.c
gcc -O2 -fPIC -shared -Wl,-soname=libtlsdso.so -o libtlsdso.so libtlsdso.o
gcc -O2 -R./ -L./ prog.c -o prog -ltlsdso
bergner@c643n10lp1:bug$ ./prog
Segmentation fault (core dumped)

In the working case (-fplt), the call to __tls_get_addr goes through the
PLT call stub with TLS module id = 0 and offset = 4096 (i.e., this is a
local-dynamic optimization). The PLT call stub for __tls_get_addr is
"special" in that it has extra code to check for module id == 0, and it
just returns the offset in that case. In the failing case (-fno-plt), the
inline PLT call skips the PLT stub and goes directly to __tls_get_addr,
which assumes the module id == 0 case has already been handled, so things
go badly.
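To make the difference between the two call paths concrete, here is a minimal C model of the behavior described above. It is only a sketch: the real stub is assembly emitted by ld, and every name here (tls_index_model, stub_tls_get_addr_model, real_tls_get_addr_model, the tp parameter) is hypothetical. It also assumes that "returns the offset" means the stub forms the result from the offset and the thread pointer, and the "real" function is replaced by a stand-in that reports an error instead of crashing.

```c
#include <stdint.h>

/* Hypothetical model of the two GOT words passed to __tls_get_addr. */
typedef struct {
    uint64_t module;  /* module id; 0 marks the optimized form */
    uint64_t offset;  /* offset of the variable */
} tls_index_model;

/* Stand-in for the real __tls_get_addr. The real function assumes the
   module == 0 case was already handled by the stub; this model returns
   -1 there instead of misbehaving. The lookup itself is fake. */
static int real_tls_get_addr_model(const tls_index_model *ti, uint64_t *out)
{
    if (ti->module == 0)
        return -1;  /* the case that SEGVs in this report */
    *out = 0x10000 * ti->module + ti->offset;  /* arbitrary fake address */
    return 0;
}

/* Model of the special PLT stub: handle module == 0 on a fast path,
   otherwise fall through to the real function. tp stands in for the
   thread pointer (an assumption of this sketch). */
static int stub_tls_get_addr_model(const tls_index_model *ti, uint64_t tp,
                                   uint64_t *out)
{
    if (ti->module == 0) {
        *out = tp + ti->offset;  /* fast path, no call into the real function */
        return 0;
    }
    return real_tls_get_addr_model(ti, out);
}
```

With module = 0 and offset = 4096, calling stub_tls_get_addr_model corresponds to the -fplt path and succeeds, while calling real_tls_get_addr_model directly corresponds to the -fno-plt path and hits the case the real function does not expect.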
I see a few possible solutions:

  1) The inline PLT call stub could be expanded to handle the
     module id == 0 case itself, or
  2) Change GCC so that calls to __tls_get_addr don't use inline PLT
     call stubs and instead use the normal PLT mechanism, or
  3) The linker could emit the special __tls_get_addr PLT stub and force
     the inline PLT call to branch to the special PLT stub rather than
     to __tls_get_addr itself.

Did I miss any other options? Option 1 would give the highest performing
code, but options 1 and 2 seem to go against Alan's original design
decision of not needing any GCC changes for the local-dynamic
optimization.