There's one problem with the couple of patches that I've seen go by wrt eliding PLTs with -z now, and relaxing inlined PLTs (aka -fno-plt):
They're currently using the same relocations used by data, and thus the linker and dynamic linker must ensure that pointer equality is maintained. This results in branch-to-branch(-to-branch) situations. E.g. the attached test case, in which main has a PLT entry for function a() in a.so, and function b() in b.so calls a():

  $ LD_BIND_NOW=1 gdb main
  ...
  (gdb) b b
  Breakpoint 1 at 0x400540
  (gdb) run
  Starting program: /home/rth/x/main

  Breakpoint 1, b () at b.c:2
  2 void b(void) { a(); }
  (gdb) si
  2 void b(void) { a(); }
  => 0x7ffff7bf75f4 <b+4>: callq 0x7ffff7bf74e0
  (gdb)
  0x00007ffff7bf74e0 in ?? () from ./b.so
  => 0x7ffff7bf74e0: jmpq *0x20034a(%rip) # 0x7ffff7df7830
  (gdb)
  0x0000000000400560 in a@plt ()
  => 0x400560 <a@plt>: jmpq *0x20057a(%rip) # 0x600ae0
  (gdb)
  a () at a.c:2
  2 void a() { printf("Hello, World!\n"); }
  => 0x7ffff7df95f0 <a>: sub $0x8,%rsp

If we use -fno-plt, we eliminate the first callq, but we still have two consecutive jmpq's.

It seems to me that we ought to have different relocations for when we're only going to use a pointer for branching and for when we need the pointer to be canonicalized for pointer comparisons.

In the linked image, we already have these: R_X86_64_GLOB_DAT vs R_X86_64_JUMP_SLOT. Namely, GLOB_DAT implies "data" (and therefore pointer equality), while JUMP_SLOT implies "code" (and therefore we can resolve past PLT stubs in the main executable).

This means that HJ's patch of May 16 (git hash 25070364) is less than ideal. I do like the smaller PLT entries, but I don't like the fact that it now emits GLOB_DAT for the relocations instead of JUMP_SLOT.

In the relocatable image, when we're talking about -fno-plt, we should think about what relocation we'd like to emit. Yes, the existing R_X86_64_GOTPCREL works with existing toolchains, and there's something to be said for that.
However, if we're talking about adding a new relocation for relaxing an indirect call via GOTPCREL, then:

If we want -fno-plt to be able to hoist function addresses, then we're going to want the address that we load for the call to also not be subject to possible jump-to-jump.

Unless we want the linker to do an unreasonable amount of x86 code examination in order to determine mov vs call for relaxation, we need two different relocations (preferably using the same assembler mnemonic, so that the assembler enforces the correct relocation).

On the users/hjl/relax branch (and posted on list somewhere), the new relocation is called R_X86_64_RELAX_GOTPCREL. I'm not keen on that "relax" name, despite that being exactly what it's for.

I suggest R_X86_64_GOTPLTPCREL_{CALL,LOAD} for the two relocation names. That is, the address is in the .got.plt section, it's a pc-relative relocation, and it's being used by a call or a load (mov) insn. With those two, we can fairly easily relax call/jmp to direct branches, and mov to lea. Yes, LTO can perform the same optimization, but I'll also agree that there are many projects for which LTO is both overkill and unworkable.

This does leave open other optimization questions, mostly around weak functions. Consider constructs like

  if (foo) foo();

Do we, within the compiler, try to CSE GOTPCREL and GOTPLTPCREL, accepting the possibility (not certainty) of jump-to-jump but definitely avoiding a separate load insn and the latency implied by that?

Comments?

r~
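As a rough sketch of why separate call and load relocations keep the linker's job small, the C function below relaxes one GOT-indirect instruction using only the relocation kind and the two bytes (opcode, ModRM) in front of the displacement. The enum values mirror the proposed R_X86_64_GOTPLTPCREL_{CALL,LOAD} names but are hypothetical, as is the function itself; it assumes the linker has already decided the target is non-preemptible, and this sketch pads with an addr32 prefix (call) or a trailing nop (jmp) so the instruction length does not change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two proposed relocation kinds. */
enum gotplt_reloc { GOTPLTPCREL_CALL, GOTPLTPCREL_LOAD };

/* Store a 32-bit little-endian value without alignment assumptions. */
static void put_le32(uint8_t *p, int32_t v)
{
    uint32_t u = (uint32_t)v;
    p[0] = (uint8_t)u;
    p[1] = (uint8_t)(u >> 8);
    p[2] = (uint8_t)(u >> 16);
    p[3] = (uint8_t)(u >> 24);
}

/* Relax one GOT-indirect insn into a direct one.
 *   disp   : points at the 4-byte RIP-relative displacement (r_offset)
 *   place  : run-time address of that displacement (P)
 *   target : address of the locally resolved function (S)
 * Returns 1 if the insn was rewritten, 0 if it was left untouched. */
int relax_gotplt(uint8_t *disp, uint64_t place, uint64_t target,
                 enum gotplt_reloc kind)
{
    assert(disp != NULL);
    uint8_t *op = disp - 2; /* opcode byte, then ModRM, then disp32 */
    /* All original forms end right after disp32, i.e. at P + 4. */
    int64_t rel = (int64_t)target - (int64_t)(place + 4);

    if (kind == GOTPLTPCREL_CALL && op[0] == 0xff && op[1] == 0x25) {
        /* jmp *foo(%rip) -> jmp foo; nop.  The 5-byte jmp ends at P + 3. */
        rel += 1;
        if (rel < INT32_MIN || rel > INT32_MAX)
            return 0;
        op[0] = 0xe9;
        put_le32(op + 1, (int32_t)rel);
        disp[3] = 0x90;
        return 1;
    }
    if (rel < INT32_MIN || rel > INT32_MAX)
        return 0;
    if (kind == GOTPLTPCREL_CALL && op[0] == 0xff && op[1] == 0x15) {
        /* call *foo(%rip) -> addr32 call foo (both 6 bytes). */
        op[0] = 0x67;
        op[1] = 0xe8;
        put_le32(disp, (int32_t)rel);
        return 1;
    }
    if (kind == GOTPLTPCREL_LOAD && op[0] == 0x8b && (op[1] & 0xc7) == 0x05) {
        /* mov foo(%rip),%reg -> lea foo(%rip),%reg; ModRM/REX unchanged. */
        op[0] = 0x8d;
        put_le32(disp, (int32_t)rel);
        return 1;
    }
    return 0; /* opcode does not match the relocation kind */
}
```

For example, `ff 15` plus a disp32 at 0x1000 targeting 0x2000 becomes `67 e8 fa 0f 00 00`, and `48 8b 05` plus a disp32 becomes `48 8d 05` with the displacement recomputed. Because the relocation kind already says "call" or "load", a mismatch between kind and opcode can simply be refused rather than decoded further.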