https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113686
Bug ID: 113686
Summary: [RISC-V] TLS (Local Exec) relaxation on structures (LE)
Product: gcc
Version: 13.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: hpa at zytor dot com
Target Milestone: ---

When the Local Exec TLS model is in use, gcc generates inefficient code for
accessing a member of a structure:

    struct foobar {
        int alpha;
        int beta;
    };

    _Thread_local struct foobar foo;

    void func(int bar)
    {
        foo.beta = bar;
    }

    # Version 1
    lui   a1,%tprel_hi(foo)
    add   a1,a1,tp,%tprel_add(foo)
    addi  a1,a1,%tprel_lo(foo)
    sw    a0,4(a1)

However, in this case it could be generated as:

    # Version 2
    lui   a1,%tprel_hi(foo+4)
    add   a1,a1,tp,%tprel_add(foo+4)
    sw    a0,%tprel_lo(foo+4)(a1)

If %tprel_hi(foo+4) == 0, as it often is for small embedded software, the
linker can relax this to a simple (tp) reference:

    # Version 2a (post-relaxation with small .tbss)
    sw    a0,%tprel_lo(foo+4)(tp)

The linker will *not* relax version 1 all the way; it leaves an unnecessary
mv:

    # Version 1a (post-relaxation with small .tbss)
    mv    a1,tp
    sw    a0,%tprel_lo(foo+4)(tp)

It is of course trickier for the case of multiple subsequent references to
the structure if the structure is not aligned, as gcc can't know a priori
where the 4K breaks are[*]. The version 1 code is more efficient in that case
(3 instructions + 1 instruction/field as opposed to 3 instructions/field).

However, if the structure *is* aligned, gcc will still not optimize 1 into 2.

There are at least a few options I see:

1. gcc option: gcc can generate version 2 code for a single field reference,
   or if the alignment is such that all fields are guaranteed to fall inside
   the same 4K window.

2. gcc and optional ABI option: introduce a "TLS TE-tiny" model for deep
   embedded use, where the combined size of the TLS area is limited to 4K,
   equivalent to the way direct gp references [or zero, if the global pointer
   is 0] work. Thus, direct (tp) references can be used.
   NOTE: With the current binutils, this will error unless .option norelax
   is in effect. It might be desirable to instead have a new relocation type,
   which would require binutils support. Alternatively, ld should recognize
   that the TLS offset is within +/- 2K and suppress the warning in that case
   (since at that point the address is available to the linker).

   The linker could be further optimized by allowing the TLS base to be
   offset, presumably in the same way as the __global_pointer$ symbol.

3. binutils option: teach ld to relax these kinds of chained pointer
   references.

[*] Rant: in my opinion, the lui/auipc instructions are fundamentally
    misdesigned by not having an overlap bit to guarantee a sizable window.
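
To make the 4K-window argument concrete, here is a minimal C sketch of the hi/lo split behind %tprel_hi and %tprel_lo. It assumes the usual RISC-V convention that the low 12 bits are sign-extended, so the high part is rounded by 0x800, and it assumes non-negative tp-relative offsets. All function names are hypothetical helpers for illustration only; none of them exist in gcc or binutils.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* High 20 bits of a tp-relative offset, rounded so that the matching low
 * part fits a signed 12-bit immediate (assumed convention, see lead-in). */
uint32_t tprel_hi(uint32_t off)
{
    return (off + 0x800u) >> 12;
}

/* Signed low 12 bits, so that (tprel_hi(off) << 12) + tprel_lo(off) == off. */
int32_t tprel_lo(uint32_t off)
{
    int32_t low = (int32_t)(off & 0xfffu);
    return (low >= 0x800) ? low - 0x1000 : low;
}

/* True when the offset can be used directly off tp, i.e. the high part is
 * zero and the lui/add pair can be dropped (the "Version 2a" case). */
bool fits_tp_direct(uint32_t off)
{
    return tprel_hi(off) == 0;
}

/* True when every byte of an object at [base, base + size) shares one high
 * part, so a single lui/add could serve all of its fields. */
bool same_hi_window(uint32_t base, uint32_t size)
{
    assert(size > 0);
    return tprel_hi(base) == tprel_hi(base + size - 1);
}

/* Instruction counts quoted in the report for accessing n fields:
 * shared base (version 1 style) versus a full sequence per field. */
unsigned insns_shared_base(unsigned n) { return 3u + n; }
unsigned insns_per_field(unsigned n)   { return 3u * n; }
```

Under these assumptions, an offset of 4 (foo.beta when foo sits at the start of the TLS area) has a zero high part and can be addressed directly off tp. An 8-byte object at offset 0x7fc straddles a window boundary, while the same object at the 8-byte-aligned offset 0x7f8 does not, which is the alignment condition option 1 relies on.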