The use case proposed by Sri allows the user to selectively eliminate PLT
overhead for hot external calls only. In such scenarios, lazy binding is
unlikely to matter to the user.

David

On Mon, May 4, 2015 at 7:45 AM, Michael Matz <m...@suse.de> wrote:
> Hi,
>
> On Thu, 30 Apr 2015, Sriraman Tallam wrote:
>
>> We noticed that one of our benchmarks sped up by ~1% when we eliminated
>> PLT stubs for some of the hot external library functions like memcmp,
>> pow.  The win was from better icache and itlb performance. The main
>> reason was that the PLT stubs had no spatial locality with the
>> call-sites. I have started looking at ways to tell the compiler to
>> eliminate PLT stubs (in effect, inline them) for specified external
>> functions, for x86_64. I have a proposal and a patch and I would like to
>> hear what you think.
>>
>> This comes with caveats.  This cannot be generally done for all
>> functions marked extern as it is impossible for the compiler to say if a
>> function is "truly extern" (defined in a shared library). If a function
>> is not truly extern (ends up defined in the final executable), then
>> calling it indirectly is a performance penalty as it could have been a
>> direct call.
>
> This can be fixed by Alan's idea.
>
>> Further, the newly created GOT entries are fixed up at
>> start-up and do not get lazily bound.
>
> And this can be fixed by some enhancements in the linker and dynamic
> linker.  The idea is to still generate a PLT stub and make its GOT entry
> point to it initially (like a normal got.plt slot).  Then the first
> indirect call will use the address of the PLT entry (starting lazy resolution)
> and update the GOT slot with the real address, so further indirect calls
> will directly go to the function.
>
> This requires a new asm marker (and hence a new reloc), because normally if
> there's a GOT slot it's filled with the real symbol's address, unlike when
> there's only a got.plt slot.  E.g. a
>
>   call *foo@GOTPLT(%rip)
>
> would generate a GOT slot (and fill its address into the above call insn), but
> generate a JUMP_SLOT reloc in the final executable, not a GLOB_DAT one.
>
>
> Ciao,
> Michael.
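
For anyone following along, the lazily bound GOT slot Michael describes can
be modelled in plain C. This is only a rough sketch under stated assumptions,
not what the linker or dynamic linker would actually emit: real_foo,
foo_lazy_stub, foo_got_slot, resolve_count and call_foo are all hypothetical
names, and a function pointer stands in for the GOT slot. The slot starts out
pointing at a stub (the PLT entry's role), the first indirect call resolves
the target and overwrites the slot, and later calls go straight to the
function.

```c
#include <assert.h>

/* Hypothetical stand-in for an external library function. */
static int real_foo(int x) { return x + 1; }

typedef int (*foo_fn)(int);

/* Stand-in for the PLT stub that triggers lazy resolution. */
static int foo_lazy_stub(int x);

/* Models the GOT slot: it initially points at the stub, like a
 * normal got.plt slot, rather than at the real function. */
static foo_fn foo_got_slot = foo_lazy_stub;

/* Counts how often "resolution" ran; for illustration only. */
static int resolve_count = 0;

static int foo_lazy_stub(int x) {
    /* The stub is only reachable while the slot is still unresolved. */
    assert(foo_got_slot == foo_lazy_stub);
    resolve_count++;
    /* The dynamic linker's role: patch the slot with the real address. */
    foo_got_slot = real_foo;
    return real_foo(x);
}

/* Models `call *foo@GOTPLT(%rip)`: always an indirect call via the slot. */
static int call_foo(int x) { return foo_got_slot(x); }
```

Calling call_foo() twice from main() would go through the stub the first
time and directly to real_foo the second time, so resolve_count stays at 1.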
