On Mon, Dec 02, 2019 at 06:07:23PM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Nov 15, 2019 at 07:17:34PM -0500, Michael Meissner wrote:
> > This series of patches adds the PCREL_OPT optimization for the PC-relative
> > support in the PowerPC compiler.
> > 
> > This optimization converts a single load or store of an external variable
> > to use the R_PPC64_PCREL_OPT relocation.
> > 
> > For example, a normal load of an external int variable (with -mcpu=future)
> > would generate:
> > 
> >             PLD 9,ext_symbol@got@pcrel(0),1
> >             LWA 10,0(9)
> > 
> > That is, load the address of 'ext_symbol' into register 9.  If 'ext_symbol' is

If you want to show a load of an int ext_symbol, the example should be
 pld 9,ext_symbol@got@pcrel
 lwz 10,0(9)

Add "(0),1" to the end of the pld line to show the optional operands.

> > defined in another module in the main program, and the current module is also
> > in the main program, the linker will optimize this to:
> 
> What does "module" mean?  Translation unit?  Object file?  And "main
> program" is what ELF calls "executable", right?

"relocatable object file" and "executable or shared library"
respectively.  If the linker is creating a shared library, and
ext_symbol is local to that library by virtue of non-default symbol
visibility or symbol versioning, then the optimisation will be done
for shared libraries too.

> You don't need to say that here, it is not something the compiler can do
> anything about.  You could just say "if possible, the linker will..." etc.
> 
> >             PADDI 9,ext_symbol(0),1
> >             LWZ 10,0(9)
> 
> I don't think it will change an lwa insn to an lwz?  Probably it should
> be lwz throughout?

Yes, see above.  Changing got indirect to pc-relative (or toc-relative
for that matter) is something the linker does even in the absence of
PCREL_OPT relocs.  That's what Mike was trying to show with the above
transformation, modulo typos.

> Is that "paddi" syntax correct?  I think you might mean
> "paddi 9,0,ext_symbol,1", aka "pla 9,ext_symbol"?

No, it's not correct but your corrections aren't correct either.  :)

 pla 9,ext_symbol@pcrel  # add (0),1 for optional operands
or
 paddi 9,0,ext_symbol@pcrel,1

You'll get the wrong reloc without @pcrel.

> > If either the definition is not in the main program or we are linking for a
> > shared library, the linker will create an address in the .got section and
> > do a PLD of it:
> > 
> >             .section .got
> >     .got.ext_symbol:
> >             .quad ext_symbol
> > 
> >             .section .text
> >             PLD 9,.got.ext_symbol(0),1
> >             LWZ 10,0(9)
> 
> Like what the user wrote, sure -- the linker does not optimise it, does
> not change it?  Or am I missing something?
> 
> > If the only use of the GOT address is a single load and store, we can
> > optimize this further:
> 
> A single load *or* store.
> 
> >             PLD 9,ext_symbol@got@pcrel(0),1
> >     .Lpcrel1:
> >             .reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
> >             LWZ 10,0(9)
> > 
> > In this case, if the variable is defined in another module for the main
> > program, and we are linking for the main program, the linker will transform
> > this to:
> > 
> >             PLWZ 10,ext_symbol@pcrel(0),1
> >             NOP
> > 
> > There can be arbitrary instructions between the PLD and the LWA (or STW).
> 
> ... because that is what that relocation means.  The compiler still has
> to make sure that any such insns should not prevent this transform.

Right, and that's the hard part of this transformation.
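
To make the straight-line case concrete, here is a rough sketch in C of the
kind of check involved for a load.  It is not GCC's implementation: the
struct, the field names and the function are all hypothetical, and whether
the address register is dead after the load is assumed to come from a
separate liveness analysis.  It gives up on any branch, label or call
between the two instructions, which is conservative; handling code like
the "b 2f" example quoted further down needs real dataflow information.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified instruction record (not a GCC structure). */
struct insn {
    uint32_t reads;    /* bit N set: instruction reads GPR N */
    uint32_t writes;   /* bit N set: instruction writes GPR N */
    bool is_barrier;   /* branch, label or call: stop analysing */
};

/* Return true if the load at insns[load] could be tied to the PLD at
   insns[pld] in straight-line code.  'base' is the register set by the
   PLD, 'dest' is the register written by the load.
   'base_dead_after_load' is assumed to be supplied by the caller.  */
bool pcrel_opt_load_ok(const struct insn *insns, size_t pld, size_t load,
                       int base, int dest, bool base_dead_after_load)
{
    assert(base >= 0 && base < 32 && dest >= 0 && dest < 32);

    /* The address register must die at the load.  */
    if (pld >= load || !base_dead_after_load)
        return false;

    uint32_t watched = ((uint32_t)1 << base) | ((uint32_t)1 << dest);

    /* Nothing strictly between the PLD and the load may touch either
       register, and control flow must not enter or leave the span.  */
    for (size_t i = pld + 1; i < load; i++) {
        if (insns[i].is_barrier)
            return false;
        if ((insns[i].reads | insns[i].writes) & watched)
            return false;
    }
    return true;
}
```

A store would need the mirror-image test: the value register must already
be live at the PLD and must not be modified before the store.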

> > For either loads or store, register 9 must only be used in the load or
> > store, and must die at that point.
> > 
> > For loads, there must be no reference to register 10 between the PLD and the
> > LWZ.  For a store, register 10 must be live at the PLD instruction, and must
> > not be modified between the PLD and the STW instructions.
> 
> "No reference"...  Nothing indirect either (like from a function call,
> or simply some insn that does not name the register directly).  Or code
> like
> 
>       pld 9,ext_symbol@got@pcrel(0),1 ; .Lpcrel1:
>       .reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
>       b 2f
> 
> here: # some code that does not explicitly reference r10 here,
>       # but r10 is live here nevertheless, and is used later
>       b somewhere_else
> 
> 2:    lwz 10,0(9)
> 
> complicates your analysis, too.  So something DF is needed here, or
> there are lots and lots and lots of cases to look out for.
> 
> 
> Segher

-- 
Alan Modra
Australia Development Lab, IBM
