Hi!

On Thu, Jun 16, 2022 at 03:47:49PM +0800, Jiufu Guo wrote:
> Segher Boessenkool <seg...@kernel.crashing.org> writes:
> >> >> --- a/gcc/testsuite/gcc.target/powerpc/medium_offset.c
> >> >> +++ b/gcc/testsuite/gcc.target/powerpc/medium_offset.c
> >> >> @@ -1,7 +1,7 @@
> >> >>  /* { dg-do compile { target { powerpc*-*-* } } } */
> >> >>  /* { dg-require-effective-target lp64 } */
> >> >>  /* { dg-options "-O" } */
> >> >> -/* { dg-final { scan-assembler-not "\\+4611686018427387904" } } */
> >> >> +/* { dg-final { scan-assembler-times {\msldi|pld\M} 1 } } */
> >> >
> >> > Why?  This is still better generated in code, no?  It should never be
> >> > loaded from a constant pool (it is hex 4000_0000_0000_0000, easy to
> >> > construct with just one or two insns).
> >> 
> >> For p8/9, two insns "lis 0x4000+sldi 32" are used:
> >>         addis %r3,%r2,.LANCHOR0@toc@ha
> >>         addi %r3,%r3,.LANCHOR0@toc@l
> >>         lis %r9,0x4000
> >>         sldi %r9,%r9,32
> >>         add %r3,%r3,%r9
> >>         blr
> >
> > That does not mean putting this constant in the constant pool is a good
> > idea at all, of course.
> >
> >> On p10, as expected, 'pld' would be better than 'lis+sldi'.
> >
> > Is it?
> 
> With simple cases, 'pld' seems better. Perlbench may also indicate
> this, but I did not test this part separately.
> As you suggested, I will collect more data to check this change.
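
The constant scanned for in the old test, 4611686018427387904, is 2^62 (hex 4000_0000_0000_0000). A minimal C sketch, assuming simplified models of the two instructions, shows that the quoted p8/p9 sequence "lis %r9,0x4000; sldi %r9,%r9,32" builds exactly that value. The helper names (emulate_lis, emulate_sldi, build_medium_offset) are hypothetical and only model the register result, not timing.

```c
#include <stdint.h>

/* Hypothetical model of "lis rD,imm" (addis rD,0,imm): the 16-bit
   immediate is shifted left by 16 and sign-extended to 64 bits. */
static uint64_t emulate_lis(int16_t imm)
{
    /* Sign-extend to 64 bits first, then shift as unsigned so that
       negative immediates do not hit undefined behaviour in C. */
    return (uint64_t)(int64_t)imm << 16;
}

/* Hypothetical model of "sldi rA,rS,n" (rldicr rA,rS,n,63-n):
   a plain 64-bit left shift, valid for 0 <= n < 64. */
static uint64_t emulate_sldi(uint64_t rs, unsigned n)
{
    return rs << n;
}

/* The two-insn sequence from the quoted p8/p9 output. */
static uint64_t build_medium_offset(void)
{
    return emulate_sldi(emulate_lis(0x4000), 32);
}
```

Calling build_medium_offset() yields 0x4000000000000000, i.e. 2^62, so the value needs no constant pool entry on these targets.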

Look at p10, for example.  There can be only two pld's in flight
concurrently, and they might miss in the cache as well (not likely,
hopefully, but it is costly).  pld has a latency of between 4 and 6
cycles, so it is never better than the 1+1 to 3+3 cycles that the
addi+rldicr (li+sldi) take, and it is easily worse.
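
The latency arithmetic above can be written out as a toy model. This is a sketch under stated assumptions: it uses only the cycle counts quoted in this mail, adds the latencies of the two dependent integer insns, and compares best case with best case and worst case with worst case. It ignores issue width, the two-pld limit and cache misses (which can only make pld worse). The names latency_range, ALU_LATENCY, PLD_LATENCY and dependent_pair are hypothetical.

```c
/* Toy latency model; the numbers are the ones quoted in the mail,
   not measured values. */
typedef struct {
    unsigned min; /* best-case latency in cycles */
    unsigned max; /* worst-case latency in cycles */
} latency_range;

/* One simple integer insn (li or sldi): 1 to 3 cycles. */
static const latency_range ALU_LATENCY = { 1, 3 };

/* One pld, assuming a cache hit: 4 to 6 cycles. */
static const latency_range PLD_LATENCY = { 4, 6 };

/* Two insns where the second depends on the first (sldi needs the
   result of li), so their latencies add. */
static latency_range dependent_pair(latency_range insn)
{
    latency_range r = { insn.min + insn.min, insn.max + insn.max };
    return r;
}
```

Under this model dependent_pair(ALU_LATENCY) gives 2 to 6 cycles against 4 to 6 for pld, so the load is at best equal and usually slower.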

If you really see loads being better than two simple integer insns, we
need to rethink more :-/


Segher
