There's one problem with the couple of patches that I've seen go by for eliding
PLTs with -z now and for relaxing inlined PLTs (aka -fno-plt):

They currently use the same relocations that are used for data, and thus the
linker and dynamic linker must ensure that pointer equality is maintained.
This results in branch-to-branch-(to-branch) situations.

E.g. the attached test case, in which main has a PLT entry for function a in
a.so, and function b in b.so calls a.

$ LD_BIND_NOW=1 gdb main
...
(gdb) b b
Breakpoint 1 at 0x400540
(gdb) run
Starting program: /home/rth/x/main
Breakpoint 1, b () at b.c:2
2       void b(void) { a(); }
(gdb) si
2       void b(void) { a(); }
=> 0x7ffff7bf75f4 <b+4>:        callq  0x7ffff7bf74e0
(gdb)
0x00007ffff7bf74e0 in ?? () from ./b.so
=> 0x7ffff7bf74e0:      jmpq   *0x20034a(%rip)        # 0x7ffff7df7830
(gdb)
0x0000000000400560 in a@plt ()
=> 0x400560 <a@plt>:    jmpq   *0x20057a(%rip)        # 0x600ae0
(gdb)
a () at a.c:2
2       void a() { printf("Hello, World!\n"); }
=> 0x7ffff7df95f0 <a>:  sub    $0x8,%rsp


If we use -fno-plt, we eliminate the first callq, but do still have two
consecutive jmpq's.

It seems to me that we ought to have different relocations when we're only
going to use a pointer for branching, and when we need a pointer to be
canonicalized for pointer comparisons.

In the linked image, we already have these: R_X86_64_GLOB_DAT vs
R_X86_64_JUMP_SLOT.  Namely, GLOB_DAT implies "data" (and therefore pointer
equality), while JUMP_SLOT implies "code" (and therefore we can resolve past
plt stubs in the main executable).
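
To make the data/code distinction concrete, here is a minimal C sketch of the
two kinds of use.  It is only an illustration: everything lives in one
translation unit, so no PLT or GOT entry is actually involved, and the helper
names (call_only, address_of_a, is_a) are made up for this example.  The
function a stands in for the one in a.so.

```c
#include <stdio.h>

typedef void (*fn_t)(void);

/* Stand-in for the function that a.so exports in the test case. */
void a(void) { printf("Hello, World!\n"); }

/* Code use: the address of a is only ever branched to.  The program
   cannot observe which address the call went through, so a
   JUMP_SLOT-style binding that resolves straight to the final
   definition would be sufficient. */
void call_only(void) { a(); }

/* Data use: the address escapes as a value and may later be compared.
   Every module has to see the same value for &a, which is what a
   GLOB_DAT-style binding has to guarantee. */
fn_t address_of_a(void) { return a; }

/* Pointer comparison that relies on the canonical address. */
int is_a(fn_t p) { return p == a; }
```

Only address_of_a and is_a depend on pointer equality; call_only would behave
the same no matter which stub or slot the branch went through.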

Which means that HJ's patch of May 16 (git hash 25070364) is less than ideal.
I do like the smaller PLT entries, but I don't like the fact that it now emits
GLOB_DAT for the relocations instead of JUMP_SLOT.


In the relocatable image, when we're talking about -fno-plt, we should think
about what relocation we'd like to emit.  Yes, the existing R_X86_64_GOTPCREL
works with existing toolchains, and there's something to be said for that.
However, if we're talking about adding a new relocation for relaxing an
indirect call via GOTPCREL, then:

If we want -fno-plt to be able to hoist function addresses, then we're going to
want the address that we load for the call to also not be subject to possible
jump-to-jump.

Unless we want the linker to do an unreasonable amount of x86 code examination
in order to determine mov vs call for relaxation, we need two different
relocations (preferably using the same assembler mnemonic, so that the correct
relocation is enforced by the assembler).
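
As a rough picture of the hoisting case, the following C sketch writes out by
hand what a compiler might do under -fno-plt: load the callee's address once
and then branch through it repeatedly.  The names work, work_calls and
call_n_times are hypothetical, and in a single file like this no GOT load
actually takes place; the point is only that the load and the call are
separate instructions that would each carry their own relocation.

```c
/* Hypothetical callee; in the scenario discussed here it would live in
   another shared object. */
static int work_calls;
void work(void) { ++work_calls; }

/* Calls work n times through a hoisted address and returns how many
   calls were made. */
int call_n_times(int n)
{
    void (*fp)(void) = work;   /* the "load" (mov) use of the address */

    work_calls = 0;
    for (int i = 0; i < n; i++)
        fp();                  /* the "call" use of the same address */
    return work_calls;
}
```

If the hoisted address were bound to a PLT stub in the main executable, every
iteration of the loop would pay for the extra jump.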

On the users/hjl/relax branch (and posted on the list somewhere), the new
relocation is called R_X86_64_RELAX_GOTPCREL.  I'm not keen on that "relax"
name, despite that being exactly what it's for.

I suggest R_X86_64_GOTPLTPCREL_{CALL,LOAD} for the two relocation names.  That
is, the address is in the .got.plt section, it's a pc-relative relocation, and
it's being used by a call or load (mov) insn.

With those two, we can fairly easily relax call/jmp to direct branches, and mov
to lea.  Yes, LTO can perform the same optimization, but I'll also agree that
there are many projects for which LTO is both overkill and unworkable.
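
The relaxation rule can be modelled as a small decision table.  This is a toy
sketch in C, not linker code: the relocation names are the ones proposed above
and exist in no released ABI, the instruction-form names are invented for the
example, and resolves_locally is a placeholder for whatever test the linker
would apply to decide that the symbol is defined in the output and cannot be
preempted.

```c
#include <stdbool.h>

/* Relocation kinds as proposed in this message (hypothetical). */
enum reloc_kind { GOTPLTPCREL_CALL, GOTPLTPCREL_LOAD };

/* Instruction forms, named for this sketch only. */
enum insn_form {
    INSN_BRANCH_VIA_GOT,   /* indirect call/jmp through the GOT slot */
    INSN_BRANCH_DIRECT,    /* direct call/jmp to the symbol */
    INSN_MOV_FROM_GOT,     /* mov that loads the address from the GOT slot */
    INSN_LEA               /* lea that computes the address pc-relatively */
};

/* Pick the instruction form the linker would leave in the output. */
enum insn_form relax(enum reloc_kind kind, bool resolves_locally)
{
    if (kind == GOTPLTPCREL_CALL)
        return resolves_locally ? INSN_BRANCH_DIRECT : INSN_BRANCH_VIA_GOT;
    return resolves_locally ? INSN_LEA : INSN_MOV_FROM_GOT;
}
```

Because the relocation kind already says whether the instruction is a branch
or a load, the model never has to inspect the instruction bytes to choose the
rewrite.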

This does leave open other optimization questions, mostly around weak
functions.  Consider constructs like

        if (foo) foo();

Do we, within the compiler, try to CSE GOTPCREL and GOTPLTPCREL, accepting the
possibility (not certainty) of jump-to-jump but definitely avoiding a separate
load insn and the latency implied by that?
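
For reference, a self-contained version of that construct might look like the
sketch below.  It assumes GCC or Clang targeting ELF (the weak attribute is a
compiler extension), and the wrapper name call_foo_if_present is made up.
Nothing in this file defines foo, so the weak reference is expected to resolve
to a null address and the call is skipped.

```c
/* Weak reference: the link succeeds even if no definition of foo is
   ever provided, in which case its address is null. */
extern void foo(void) __attribute__((weak));

/* Returns 1 if foo was present and called, 0 otherwise. */
int call_foo_if_present(void)
{
    if (foo) {     /* data use: the address is compared against null */
        foo();     /* code use: the same symbol is branched to */
        return 1;
    }
    return 0;
}
```

The test and the call refer to the same symbol but fall on opposite sides of
the data/code split, which is why sharing one loaded address between them is a
trade-off rather than an obvious win.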


Comments?


r~

Attachment: test.tar
Description: Unix tar archive
