On Thu, Mar 28, 2024 at 2:23 AM Jan Beulich <jbeul...@suse.com> wrote:
>
> On 28.03.2024 08:43, Fangrui Song wrote:
> > On Fri, Mar 22, 2024 at 6:51 PM Fangrui Song <mask...@gcc.gnu.org> wrote:
> >>
> >> On Thu, Mar 14, 2024 at 5:16 PM Fangrui Song <mask...@gcc.gnu.org> wrote:
> >>>
> >>> The relocation formats REL and RELA for ELF are inefficient. In a
> >>> release build of Clang for x86-64, .rela.* sections consume a
> >>> significant portion (approximately 20.9%) of the file size.
> >>>
> >>> I propose RELLEB, a new format offering significant file size
> >>> reductions: 17.2% (x86-64), 16.5% (aarch64), and even 32.4% (riscv64)!
> >>>
> >>> Your thoughts on RELLEB are welcome!
> >>>
> >>> Detailed analysis:
> >>> https://maskray.me/blog/2024-03-09-a-compact-relocation-format-for-elf
> >>> generic ABI (ELF specification):
> >>> https://groups.google.com/g/generic-abi/c/yb0rjw56ORw
> >>> binutils feature request:
> >>> https://sourceware.org/bugzilla/show_bug.cgi?id=31475
> >>> LLVM:
> >>> https://discourse.llvm.org/t/rfc-relleb-a-compact-relocation-format-for-elf/77600
> >>>
> >>> Implementation primarily involves binutils changes. Any volunteers?
> >>> For GCC, a driver option like -mrelleb in my Clang prototype would be
> >>> needed. The option instructs the assembler to use RELLEB.
> >>
> >> The format was tentatively named RELLEB. As I refine the original pure
> >> LEB-based format, “RELLEB” might not be the most fitting name.
> >>
> >> I have switched to SHT_CREL/DT_CREL/.crel and updated
> >> https://maskray.me/blog/2024-03-09-a-compact-relocation-format-for-elf
> >> and
> >> https://groups.google.com/g/generic-abi/c/yb0rjw56ORw/m/eiBcYxSfAQAJ
> >>
> >> The new format is simpler and better than RELLEB even in the absence
> >> of the shifted offset technique.
> >>
> >> Dynamic relocations using CREL are even smaller than Android's packed
> >> relocations.
> >>
> >> // encodeULEB128(uint64_t, raw_ostream &os);
> >> // encodeSLEB128(int64_t, raw_ostream &os);
> >>
> >> Elf_Addr offsetMask = 8, offset = 0, addend = 0;
> >> uint32_t symidx = 0, type = 0;
> >> // Find the common alignment of all offsets (at most 3 bits).
> >> for (const Reloc &rel : relocs)
> >>   offsetMask |= rel.r_offset;
> >> int shift = std::countr_zero(offsetMask);
> >> // Header: relocation count and shift packed into one ULEB128.
> >> encodeULEB128(relocs.size() * 4 + shift, os);
> >> for (const Reloc &rel : relocs) {
> >>   Elf_Addr deltaOffset = (rel.r_offset - offset) >> shift;
> >>   offset = rel.r_offset;
> >>   // Low 3 bits flag which of symidx/type/addend changed.
> >>   uint8_t b = deltaOffset * 8 + (symidx != rel.r_symidx) +
> >>               (type != rel.r_type ? 2 : 0) +
> >>               (addend != rel.r_addend ? 4 : 0);
> >>   if (deltaOffset < 0x10) {
> >>     os << char(b);
> >>   } else {
> >>     os << char(b | 0x80);
> >>     encodeULEB128(deltaOffset >> 4, os);
> >>   }
> >>   if (b & 1) {
> >>     encodeSLEB128(static_cast<int32_t>(rel.r_symidx - symidx), os);
> >>     symidx = rel.r_symidx;
> >>   }
> >>   if (b & 2) {
> >>     encodeSLEB128(static_cast<int32_t>(rel.r_type - type), os);
> >>     type = rel.r_type;
> >>   }
> >>   if (b & 4) {
> >>     encodeSLEB128(std::make_signed_t<uint>(rel.r_addend - addend), os);
> >>     addend = rel.r_addend;
> >>   }
> >> }
> >>
> >> ---
> >>
> >> While alternatives like PrefixVarInt (or a suffix-based variant) might
> >> excel when encoding larger integers, LEB128 offers advantages when
> >> most integers fit within one or two bytes, as it avoids the need for
> >> shift operations in the common one-byte representation.
> >>
> >> While we could utilize zigzag encoding (i>>31) ^ (i<<1) to convert
> >> SLEB128-encoded type/addend to use ULEB128 instead, the generated code
> >> is inferior to or on par with SLEB128 for one-byte encodings.
> >
> >
> > We can introduce a gas option --crel, then users can specify `gcc
> > -Wa,--crel a.c` (-flto also gets -Wa, options).
> >
> > I propose that we add another gas option --implicit-addends-for-data
> > (does the name look good?) to allow non-code sections to use implicit
> > addends to save space
> > (https://sourceware.org/PR31567).
> > Using implicit addends primarily benefits debug sections such as
> > .debug_str_offsets, .debug_names, .debug_addr, .debug_line, but also
> > data sections such as .eh_frame, .data., .data.rel.ro, .init_array.
> >
> > -Wa,--implicit-addends-for-data can be used on its own (6.4% .o
> > reduction in a clang -g -g0 -gpubnames build)
> And this option will then switch from RELA to REL relocation sections,
> effectively in violation of most ABIs I'm aware of?

This does violate the x86-64 LP64 psABI and PPC64 ELFv2.
The AArch64 psABI allows REL, while the RISC-V psABI doesn't say
anything about REL/RELA.

x86-64:

    The AMD64 LP64 ABI architecture uses only Elf64_Rela relocation
    entries with explicit addends. The r_addend member serves as the
    relocation addend.

    The AMD64 ILP32 ABI architecture uses only Elf32_Rela relocation
    entries in relocatable files. Executable files or shared objects may
    use either Elf32_Rela or Elf32_Rel relocation entries.

AArch64:

    A binary file may use ``REL`` or ``RELA`` relocations or a mixture
    of the two (but multiple relocations of the same place must use only
    one type).

    The initial addend for a ``REL``-type relocation is formed according
    to the following rules.

    - If the relocation relocates data (`Static Data relocations`_) the
      initial value in the place is sign-extended to 64 bits.

    - If the relocation relocates an instruction the immediate field of
      the instruction is extracted, scaled as required by the instruction
      field encoding, and sign-extended to 64 bits.

    A ``RELA`` format relocation must be used if the initial addend
    cannot be encoded in the place.

    There is no PC bias to accommodate in the relocation of a place
    containing an instruction that formulates a PC-relative address. The
    program counter reflects the address of the currently executing
    instruction.

PPC64 ELFv2:

    The 64-bit OpenPOWER Architecture uses Elf64_Rela relocation entries
    exclusively.

> Furthermore, why just data? x86 at least could benefit almost as much for
> code. Hence maybe better --implicit-addends=data, with an option for
> architectures to also permit --implicit-addends=text.

I agree that the design is not great. I am thinking about an option
that applies to all sections:

During fixup conversion to relocations, check if the relocation type
can accommodate the addend as a "data relocation type."
If any relocation within a section encounters an oversized addend,
switch that section from REL to RELA.
However, the feasibility of this approach needs evaluation regarding
implementation complexity.

---

I ran `clang -g -gz=zstd` experiments, building lld at both `-O0` and `-O2`:

```
.o size    | reloc size | .debug size | .debug_addr | .c?rela?.debug_addr | options
1453265896 |  467465160 |   200379733 |       77894 |            51123648 | -g -gz=zstd
1361904480 |  345821648 |   230681356 |     1628142 |            34082432 | -g -gz=zstd -Wa,--implicit-addends-for-data
1042317288 |   56517599 |   200378501 |       77894 |             5000201 | -g -gz=zstd -Wa,--crel
1057438728 |   41336040 |   230681552 |     1628142 |             3720546 | -g -gz=zstd -Wa,--crel,--implicit-addends-for-data
 626745136 |  292634688 |   225932160 |       77920 |            47820480 | -O2 -g -gz=zstd
 564322008 |  201200656 |   254962205 |     3104850 |            31880320 | -O2 -g -gz=zstd -Wa,--implicit-addends-for-data
 363224200 |   29114818 |   225930949 |       77920 |             4513572 | -O2 -g -gz=zstd -Wa,--crel
 377970016 |   14829524 |   254962382 |     3104850 |             2118037 | -O2 -g -gz=zstd -Wa,--crel,--implicit-addends-for-data
```

Observations:

* With or without -gz=zstd (another experiment not shown here), the .o
  size reduction ratios with REL are close.
* Implicit addends make .debug* sections less compressible. If the
  focus is .debug* and .rela.debug* sections, REL is a loss with
  -gz=zstd.
* The whole .o with REL -gz=zstd is still smaller than with RELA
  -gz=zstd. This is not surprising: the relocation sections are
  uncompressed, so REL vs. RELA makes a large difference there, while
  the non-zero vs. zero `.debug` contents are compressed, so the
  difference there is smaller.

A few points about CREL:

* For CREL -gz=zstd, using implicit addends increases .o file sizes,
  likely because the "less compressible" factor is more significant
  when the relocation size becomes negligible.
* The CREL reduction ratio is especially large with -gz=zstd at a high
  optimization level: for -O2 -g -gz=zstd, it's a 42.0% reduction in
  the .o size!
* CREL with implicit addends might not be worth doing if the priority
  is debug sections.
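
For anyone who wants to experiment with CREL sizes, here is a minimal round-trip sketch of the draft encoding quoted earlier in this thread. It assumes the header layout from that snippet (count * 4 + shift) and well-formed input, and it may not match the final specification. `Bytes`, `encodeCrel`, and `decodeCrel` are illustrative names, not binutils or LLVM APIs, and a byte vector stands in for `raw_ostream`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Reloc {
  uint64_t r_offset;
  uint32_t r_symidx, r_type;
  int64_t r_addend;
};
using Bytes = std::vector<uint8_t>;

void encodeULEB128(uint64_t v, Bytes &os) {
  do {
    uint8_t byte = v & 0x7f;
    v >>= 7;
    os.push_back(v ? uint8_t(byte | 0x80) : byte);
  } while (v);
}

void encodeSLEB128(int64_t v, Bytes &os) {
  for (;;) {
    uint8_t byte = v & 0x7f;
    v >>= 7; // assumes an arithmetic shift (guaranteed from C++20 on)
    bool done = (v == 0 && !(byte & 0x40)) || (v == -1 && (byte & 0x40));
    os.push_back(done ? byte : uint8_t(byte | 0x80));
    if (done)
      return;
  }
}

uint64_t decodeULEB128(const uint8_t *&p) {
  uint64_t v = 0;
  for (unsigned s = 0;; s += 7) {
    uint8_t byte = *p++;
    v |= uint64_t(byte & 0x7f) << s;
    if (!(byte & 0x80))
      return v;
  }
}

int64_t decodeSLEB128(const uint8_t *&p) {
  uint64_t v = 0;
  unsigned s = 0;
  uint8_t byte;
  do {
    byte = *p++;
    v |= uint64_t(byte & 0x7f) << s;
    s += 7;
  } while (byte & 0x80);
  if (s < 64 && (byte & 0x40))
    v |= ~uint64_t(0) << s; // sign-extend
  return int64_t(v);
}

Bytes encodeCrel(const std::vector<Reloc> &relocs) {
  Bytes os;
  uint64_t offsetMask = 8, offset = 0;
  int64_t addend = 0;
  uint32_t symidx = 0, type = 0;
  for (const Reloc &rel : relocs)
    offsetMask |= rel.r_offset;
  int shift = 0; // same result as std::countr_zero(offsetMask), without <bit>
  while (!((offsetMask >> shift) & 1))
    ++shift;
  encodeULEB128(relocs.size() * 4 + shift, os);
  for (const Reloc &rel : relocs) {
    uint64_t deltaOffset = (rel.r_offset - offset) >> shift;
    offset = rel.r_offset;
    uint8_t b = uint8_t(deltaOffset * 8 + (symidx != rel.r_symidx) +
                        (type != rel.r_type ? 2 : 0) +
                        (addend != rel.r_addend ? 4 : 0));
    if (deltaOffset < 0x10) {
      os.push_back(b);
    } else {
      os.push_back(b | 0x80);
      encodeULEB128(deltaOffset >> 4, os);
    }
    if (b & 1)
      encodeSLEB128(int32_t(rel.r_symidx - symidx), os);
    if (b & 2)
      encodeSLEB128(int32_t(rel.r_type - type), os);
    if (b & 4) // subtract as unsigned to avoid signed overflow
      encodeSLEB128(int64_t(uint64_t(rel.r_addend) - uint64_t(addend)), os);
    symidx = rel.r_symidx, type = rel.r_type, addend = rel.r_addend;
  }
  return os;
}

std::vector<Reloc> decodeCrel(const Bytes &in) {
  assert(!in.empty());
  const uint8_t *p = in.data();
  uint64_t hdr = decodeULEB128(p);
  size_t count = hdr / 4;
  unsigned shift = hdr % 4;
  uint64_t offset = 0, addend = 0;
  uint32_t symidx = 0, type = 0;
  std::vector<Reloc> out;
  for (size_t i = 0; i < count; ++i) {
    uint8_t b = *p++;
    uint64_t delta = (b >> 3) & 0xf; // low 4 bits of the delta offset
    if (b & 0x80)
      delta |= decodeULEB128(p) << 4;
    offset += delta << shift;
    if (b & 1) symidx += uint32_t(decodeSLEB128(p));
    if (b & 2) type += uint32_t(decodeSLEB128(p));
    if (b & 4) addend += uint64_t(decodeSLEB128(p));
    out.push_back({offset, symidx, type, int64_t(addend)});
  }
  return out;
}
```

A relocation that only advances the offset by a small aligned amount and reuses the previous symbol, type, and addend costs a single byte here, compared with 24 bytes for an Elf64_Rela entry.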
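
The per-section fallback idea mentioned before the experiments (start with REL, switch to RELA as soon as one addend does not fit) could look roughly like the sketch below. It rests on a simplifying assumption that is mine, not something settled in this thread: an addend "fits" when it is representable as a signed integer of the relocated field's width. `Fixup`, `addendFitsInPlace`, and `chooseFormat` are hypothetical names, not gas internals.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical fixup record; real assembler fixups carry much more state.
struct Fixup {
  unsigned fieldBits; // width of the relocated field: 8, 16, 32 or 64
  int64_t addend;
};

enum class RelocFormat { REL, RELA };

// True if `addend` can be stored in a signed field of `bits` bits and
// recovered by sign extension (the assumed "fits in the place" rule).
bool addendFitsInPlace(int64_t addend, unsigned bits) {
  assert(bits >= 1 && bits <= 64);
  if (bits == 64)
    return true;
  const int64_t lo = -(int64_t{1} << (bits - 1));
  const int64_t hi = (int64_t{1} << (bits - 1)) - 1;
  return addend >= lo && addend <= hi;
}

// One oversized addend forces the whole section to use explicit addends.
RelocFormat chooseFormat(const std::vector<Fixup> &fixups) {
  for (const Fixup &f : fixups)
    if (!addendFitsInPlace(f.addend, f.fieldBits))
      return RelocFormat::RELA;
  return RelocFormat::REL;
}
```

The decision is per section, so it needs every fixup of the section before the first relocation is emitted. Instruction relocations would need a per-type check of the immediate field instead of a plain width test, which is part of the implementation complexity that still needs evaluation.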