Re: [PATCH, V3, #7 of 10], Implement PCREL_OPT relocation optimization

Michael Meissner Wed, 04 Sep 2019 10:27:01 -0700

On Tue, Sep 03, 2019 at 06:33:26PM -0500, Segher Boessenkool wrote:
> On Tue, Sep 03, 2019 at 07:20:13PM -0400, Michael Meissner wrote:
> > On Tue, Sep 03, 2019 at 05:56:03PM -0500, Segher Boessenkool wrote:
> > > Hi!
> > > 
> > > On Mon, Aug 26, 2019 at 05:43:41PM -0400, Michael Meissner wrote:
> > > > /* This file implements a RTL pass that looks for pc-relative loads of the
> > > >    address of an external variable using the PCREL_GOT relocation and a single
> > > >    load/store that uses that GOT pointer.
> > > 
> > > Does this work better than having a peephole for it?  Is there some reason
> > > you cannot do this with a peephole?
> > 
> > Yes.  Peepholes only look at adjacent insns.
> 
> Huh.  Wow.  Would you believe I never knew that (or I forgot)?  Well, that
> explains why peepholes aren't very effective for us at all, alright!
> 
> > This optimization allows the load
> > of the GOT address to be separated from the eventual load or store.
> > 
> > Peephole2's are likely too early, because you really, really, really don't want
> > any other pass moving things around.
> 
> That is a bit worrying...  What can go wrong?


As I say in the comments, with PCREL_OPT, you must have exactly one load of the
address and one load or store that references the load of the address.  If
something duplicates one of the loads or stores, or adds another reference to
the address, or just moves it so we can't link the loading of the address to
the final load/store, it will not work.

For stores, the value being stored must be live at both the loading of the
address and the store.

For loads, the register being loaded must not be used between the loading of
the address and the final load.
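
To make those conditions concrete, here is a minimal Python sketch of the check
over a straight-line run of instructions.  It is not the code in the patch: the
Insn class and pcrel_opt_ok are hypothetical names made up for illustration, and
the sketch assumes that the first instruction is the address load and the last
one is the final load/store.  It also conservatively rejects a write to the
loaded register in between (the rule above only mentions uses), and it does not
check for uses of the address register after the final load/store.

```python
from dataclasses import dataclass, field


@dataclass
class Insn:
    """Hypothetical stand-in for an insn: the registers it sets and reads."""
    name: str
    defs: set = field(default_factory=set)
    uses: set = field(default_factory=set)


def pcrel_opt_ok(insns, addr_reg, data_reg, is_store):
    """Return True if insns[0] .. insns[-1] satisfy the conditions above."""
    first, *middle, last = insns
    # The sequence must start by loading the address and end by using it.
    if addr_reg not in first.defs or addr_reg not in last.uses:
        return False
    for insn in middle:
        # Exactly one reference to the address: nothing in between may touch it.
        if addr_reg in insn.uses or addr_reg in insn.defs:
            return False
        if is_store:
            # The value being stored must stay unchanged from the address
            # load through the store.
            if data_reg in insn.defs:
                return False
        elif data_reg in insn.uses or data_reg in insn.defs:
            # The register being loaded must not be referenced in between
            # (rejecting writes as well is a conservative assumption).
            return False
    return True
```

With this, a PLD/LWZ pair separated by an instruction that reads r2 is rejected,
while a PLD/STW pair separated by the same instruction is accepted, since
reading the stored value does not change it.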

For example, given:

                PLD r1,foo@got@pcrel
        .Lpcrel1:

                # other instructions

                .reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
                LWZ r2,0(r1)

If you get lucky and foo is defined in the same compilation unit, this will get
turned into:

                PLWZ r2,foo@pcrel

                # other instructions

                NOP

If foo is defined in a shared library (or you are linking for a shared library,
and foo is defined in the main program or another shared library), you get:

                PLD r1,.got.foo@pcrel

                # other instructions

                LWZ r2,0(r1)

                .section .got
        .got.foo: .quad foo

So for loads, r2 must not be used between the PLD and LWZ instructions.
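
As a rough illustration of why, the following Python sketch simulates both
forms of the load sequence with a toy register file.  Everything here (run_load,
the register and memory dictionaries, the values 7 and 42) is made up for the
example; it only models the instruction order shown above, not the real linker
or hardware.

```python
def run_load(optimized, middle):
    """Run the load sequence in its optimized or unoptimized form."""
    regs = {"r1": None, "r2": 7, "r3": 0}   # r2 starts with a stale value
    mem = {"foo": 42}
    if optimized:
        regs["r2"] = mem["foo"]             # PLWZ r2,foo@pcrel
    else:
        regs["r1"] = "foo"                  # PLD r1,foo@got@pcrel
    middle(regs)                            # other instructions
    if not optimized:
        regs["r2"] = mem[regs["r1"]]        # LWZ r2,0(r1)
    # In the optimized form the LWZ has become a NOP.
    return regs


def reads_r2(regs):
    """An intervening instruction that uses r2 (not allowed)."""
    regs["r3"] = regs["r2"] + 1


def ignores_r2(regs):
    """An intervening instruction that leaves r2 alone (allowed)."""
    regs["r3"] = 100
```

If the intervening code reads r2, the two forms disagree on r3, because the
optimized form has already loaded foo into r2 while the unoptimized form still
has the old value there.  If the intervening code leaves r2 alone, both forms
end with the same r2 and r3.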

Similarly for stores:

                PLD r1,foo@got@pcrel
        .Lpcrel1:

                # other instructions

                .reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
                STW r2,0(r1)

If you get lucky, this becomes:

                PSTW r2,foo@pcrel

                # other instructions

                NOP

If foo is defined in a shared library (or you are linking for a shared library,
and foo is defined in the main program or another shared library), you get:

                PLD r1,.got.foo@pcrel

                # other instructions

                STW r2,0(r1)

                .section .got
        .got.foo: .quad foo

So as I said, r2 must be live between the PLD and STW, because you don't know
if the PLD will be replaced with a PSTW or not.
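
The same point for stores, as a small Python sketch under the same caveats: the
function run_store and the toy register and memory dictionaries are invented
for illustration and only model the ordering of the two sequences above.

```python
def run_store(optimized, middle):
    """Run the store sequence; return the value that ends up in foo."""
    regs = {"r1": None, "r2": 5}            # r2 holds the value to store
    mem = {"foo": 0}
    if optimized:
        mem["foo"] = regs["r2"]             # PSTW r2,foo@pcrel
    else:
        regs["r1"] = "foo"                  # PLD r1,foo@got@pcrel
    middle(regs)                            # other instructions
    if not optimized:
        mem[regs["r1"]] = regs["r2"]        # STW r2,0(r1)
    # In the optimized form the STW has become a NOP.
    return mem["foo"]


def clobbers_r2(regs):
    """An intervening instruction that changes r2 (not allowed)."""
    regs["r2"] = 9


def keeps_r2(regs):
    """An intervening instruction that leaves r2 alone (allowed)."""
    regs["r1"] = regs["r1"]
```

When r2 is changed between the two points, the optimized form stores the old
value and the unoptimized form stores the new one, so the result would depend
on what the linker decided.  When r2 is left alone, both forms store the same
value.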

So to keep other passes from 'improving' things, I opted to do the pass as the
last pass before final.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
