Re: [PATCH, V3, #7 of 10], Implement PCREL_OPT relocation optimization

Michael Meissner Mon, 09 Sep 2019 15:39:50 -0700

On Mon, Sep 09, 2019 at 03:56:52PM -0500, Segher Boessenkool wrote:
> On Mon, Sep 09, 2019 at 04:32:39PM -0400, Michael Meissner wrote:
> > On Fri, Sep 06, 2019 at 07:09:45AM -0500, Segher Boessenkool wrote:
> > > On Wed, Sep 04, 2019 at 01:26:27PM -0400, Michael Meissner wrote:
> > > 
> > > [snip]
> > > 
> > > > So to keep other passes from 'improving' things, I opted to do the
> > > > pass as the last pass before final.
> > > 
> > > If the problem is that you do not properly analyse dependencies between
> > > insns, well, fix that?
> > > 
> > > If this really needs to be done after everything else GCC does, that is
> > > problematic.  What when you have two or more passes with that property?
> > > 
> > > If this really needs to be done after everything else GCC does, does it
> > > belong in the compiler at all?  Should the assembler do it instead, or
> > > the linker?
> > 
> > No, with the definition of the PCREL_OPT there can be only one reference.
> 
> I don't see why you think that argues for having to do it last?
> 
> > Yeah, there might be other ways to do it, but fundamentally you need to
> > do this as late as possible and prevent any other optimizations from
> > messing things up.
> 
> That is true for *everything*.
> 
> 
> You haven't addressed the "if it should be after everything the compiler
> does, does this belong in the compiler at all" question.


I believe it falls out of the basic PCREL_OPT description which I have in the
comments to the code.

For the load case, if you have:

                pld 4,esym@got@pcrel
                addi 6,6,1
                lwz 5,0(4)

I.e. load up the address of 'esym' into register 4.  If 'esym' is defined in
another module and both are in the main program, the linker converts the PLD
into:

                pla 4,esym@pcrel

If instead esym is defined in a shared library or you are linking a shared
library, the linker rewrites this as:

                pld 4,.esym.got
                .section .got
        .esym.got:
                .quad esym
                .section .text

I.e. load up the address of 'esym' from an address in the data section that has
an external relocation to 'esym' and the runtime loader will fill in the
address after loading any shared libraries.

If you want to use the PCREL_OPT optimization, the following must be true:

    1) Between the PLD and LWZ, register 4 must not be referenced;
    2) Register 4 dies on the LWZ instruction;
    3) Register 5 is not used between PLD and LWZ.

If these hold, you can modify it to use the PCREL_OPT optimization:

                pld 4,esym@got@pcrel
        .Lpcrel1:
                addi 6,6,1
                .reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
                lwz 5,0(4)

Then if 'esym' is in the main program, and you are linking for the main
program, the linker can change this to:

                plwz 5,esym@pcrel
                addi 6,6,1
                nop

Thus if any other pass duplicates the LWZ, uses the result of the PLD, or uses
register 5 in that sequence, the optimization will be invalid.  That is why I
think it should be the last pass before final.
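
The three load-case conditions can be sketched as a small check.  This is an
illustrative model only, not the code in the patch: the Insn class, the
load_pcrel_opt_ok function and the live_after set are hypothetical names, and
each instruction is reduced to the registers it writes (defs) and reads (uses).

```python
from dataclasses import dataclass, field


@dataclass
class Insn:
    """Hypothetical, simplified instruction: opcode plus register sets."""
    op: str
    defs: set = field(default_factory=set)  # registers written
    uses: set = field(default_factory=set)  # registers read


def load_pcrel_opt_ok(insns, live_after):
    """Check the load-case conditions on a straight-line sequence.

    insns[0] is assumed to be the PLD and insns[-1] the dependent load;
    live_after is the set of registers still live after the load.
    """
    pld, load = insns[0], insns[-1]
    (addr,) = pld.defs   # register holding the address (4 in the example)
    (dest,) = load.defs  # register receiving the value (5 in the example)
    if load.uses != {addr}:
        return False
    for insn in insns[1:-1]:
        touched = insn.defs | insn.uses
        # Conditions 1 and 3: neither register may appear in between.
        if addr in touched or dest in touched:
            return False
    # Condition 2: the address register dies on the load.
    return addr == dest or addr not in live_after


seq = [
    Insn("pld", defs={4}),
    Insn("addi", defs={6}, uses={6}),
    Insn("lwz", defs={5}, uses={4}),
]
print(load_pcrel_opt_ok(seq, live_after={5, 6}))
```

If a later pass inserted an instruction that reads register 5 between the PLD
and the LWZ, or kept register 4 live past the LWZ, the check would fail, which
is the hazard described above.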

Similarly for the store case.  If you have:

                pld 4,esym@got@pcrel
                addi 6,6,1
                stw 5,0(4)

If you want to use the PCREL_OPT optimization, the following must be true:

    1) Between the PLD and STW, register 4 must not be referenced;
    2) Register 4 dies on the STW instruction;
    3) Register 5 must have the value in it at the time of the PLD, and it must
       not be modified between the PLD and STW.

The compiler would generate:

                pld 4,esym@got@pcrel
        .Lpcrel2:
                addi 6,6,1
                .reloc .Lpcrel2-8,R_PPC64_PCREL_OPT,.-(.Lpcrel2-8)
                stw 5,0(4)

And if the symbol is defined in the main program, and you are linking for the
main program, the linker will transform this to:

                pstw 5,esym@pcrel
                addi 6,6,1
                nop
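
The store-case conditions differ from the load case in one respect: the value
register may be read between the PLD and the STW, but it must not be modified.
A minimal sketch of that check follows.  It is not the code in the patch; the
Insn class, store_pcrel_opt_ok and live_after are hypothetical names used only
for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Insn:
    """Hypothetical, simplified instruction: opcode plus register sets."""
    op: str
    defs: set = field(default_factory=set)  # registers written
    uses: set = field(default_factory=set)  # registers read


def store_pcrel_opt_ok(insns, live_after):
    """Check the store-case conditions on a straight-line sequence.

    insns[0] is assumed to be the PLD and insns[-1] the dependent store;
    live_after is the set of registers still live after the store.
    """
    pld, store = insns[0], insns[-1]
    (addr,) = pld.defs  # register holding the address (4 in the example)
    if addr not in store.uses or len(store.uses) != 2:
        return False
    (src,) = store.uses - {addr}  # register being stored (5 in the example)
    for insn in insns[1:-1]:
        # Condition 1: the address register is not referenced in between.
        if addr in insn.defs | insn.uses:
            return False
        # Condition 3: the stored value is not modified in between.
        if src in insn.defs:
            return False
    # Condition 2: the address register dies on the store.
    return addr not in live_after


seq = [
    Insn("pld", defs={4}),
    Insn("addi", defs={6}, uses={6}),
    Insn("stw", uses={4, 5}),
]
print(store_pcrel_opt_ok(seq, live_after={5, 6}))
```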

The reason the .Lpcrel<x> label is defined after the PLD and we use
.Lpcrel<x>-8 is that the assembler may insert a NOP before the prefixed
instruction if it would otherwise cross a 64-byte boundary.  A label placed
before the PLD could then point at the NOP, and you would have the relocation
on the wrong word.
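
The address arithmetic can be illustrated with a small model.  This is a
sketch under assumptions, not assembler code: layout_pld and reloc_fields are
hypothetical helpers, instructions are assumed to be 4-byte aligned, and a
label written before the PLD is assumed to bind to the address before any
padding NOP.

```python
PREFIXED_SIZE = 8  # a prefixed instruction such as PLD occupies 8 bytes
WORD_SIZE = 4      # size of a regular instruction or a padding NOP


def layout_pld(start):
    """Return (label_before, pld_addr, label_after) for a PLD at 'start'.

    If the 8-byte instruction would cross a 64-byte boundary, a 4-byte
    NOP is modelled as being inserted in front of it.
    """
    label_before = start
    crosses = (start % 64) + PREFIXED_SIZE > 64
    pld_addr = start + WORD_SIZE if crosses else start
    label_after = pld_addr + PREFIXED_SIZE
    return label_before, pld_addr, label_after


def reloc_fields(start, bytes_between):
    """Model '.reloc .Lpcrel-8,R_PPC64_PCREL_OPT,.-(.Lpcrel-8)'.

    bytes_between is the size of the code between the PLD and the
    dependent load or store.  Returns (relocation offset, addend).
    """
    _, _, label_after = layout_pld(start)
    mem_insn_addr = label_after + bytes_between  # '.' at the .reloc
    reloc_offset = label_after - PREFIXED_SIZE   # .Lpcrel-8
    return reloc_offset, mem_insn_addr - reloc_offset


for start in (0, 60):
    before, pld, after = layout_pld(start)
    print(start, before == pld, after - PREFIXED_SIZE == pld)
```

With a start address of 60, the label before the PLD stays at 60 while the PLD
itself moves to 64; the label after the PLD minus 8 gives 64 in both cases.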

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
