Hi! On Tue, Jun 11, 2024 at 04:37:25PM +0800, Jiufu Guo wrote: > Sometimes, a complicated constant is built via 3(or more) > instructions. Generally speaking, it would not be as fast > as loading it from the constant pool (as the discussions in > PR63281): > "ld" is one instruction. If consider "address/toc" adjust, > we may count it as 2 instructions. And "pld" may need fewer > cycles.
Yeah, three insns to build up a constant always was about as fast/slow as loading a constant from memory. When you need two related constants loading from memory is slower, so we preferred building them up. > As testing(SPEC2017), it could get better/stable runtime > if set the threshold as "> 2" (compare with "> 3"). Interesting! About how much speedup did you see? 0.1%? > As known, because the constant is load from memory by this > patch, so this functionality may affect the cache missing. > Also there may be possible side effects on icach. While, > IMHO, this patch would be still do the right thing. Yeah, but almost every codegen patch has *some* icache effect. Your benchmarks show a net positive effect, that is all that matters in the end. > gcc/ChangeLog: > > * config/rs6000/rs6000.cc (rs6000_emit_set_const): Update to split > complicate constant to memory. (complicated) And don't write "update", every patch is an update :-) There are more conditions here, mention those as well? But, why does base_reg_operand matter? And, why for TARGET_ELF only? That is the only place you tested I'm sure, but make an educated guess for the rest, why would it not be useful for other binary formats? > diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > index e4dc629ddcc..f448df289a0 100644 > --- a/gcc/config/rs6000/rs6000.cc > +++ b/gcc/config/rs6000/rs6000.cc > @@ -10240,6 +10240,21 @@ rs6000_emit_set_const (rtx dest, rtx source) > c = sext_hwi (c, 32); > emit_move_insn (lo, GEN_INT (c)); > } > + > + else if (base_reg_operand (dest, mode) && TARGET_64BIT > + && TARGET_ELF && num_insns_constant (source, mode) > 2) > + { > + rtx sym = force_const_mem (mode, source); > + if (TARGET_TOC && SYMBOL_REF_P (XEXP (sym, 0)) > + && use_toc_relative_ref (XEXP (sym, 0), mode)) > + { > + rtx toc = create_TOC_reference (XEXP (sym, 0), copy_rtx (dest)); > + sym = gen_const_mem (mode, toc); > + set_mem_alias_set (sym, get_TOC_alias_set ()); > + } > + > + emit_move_insn (dest, sym); > + } > else > rs6000_emit_set_long_const (dest, c); > break; Is there no utility function to put constants like this in memory? Like, "output_toc" for example? > diff --git a/gcc/testsuite/gcc.target/powerpc/const_anchors.c > b/gcc/testsuite/gcc.target/powerpc/const_anchors.c > index 542e2674b12..f33c9a83f5e 100644 > --- a/gcc/testsuite/gcc.target/powerpc/const_anchors.c > +++ b/gcc/testsuite/gcc.target/powerpc/const_anchors.c > @@ -1,5 +1,5 @@ > /* { dg-do compile { target has_arch_ppc64 } } */ > -/* { dg-options "-O2" } */ > +/* { dg-options "-O2 -fdump-rtl-final" } */ > > #define C1 0x2351847027482577ULL > #define C2 0x2351847027482578ULL > @@ -17,4 +17,5 @@ void __attribute__ ((noinline)) foo1 (long long *a, long > long b) > *a++ = C2; > } > > -/* { dg-final { scan-assembler-times {\maddi\M} 2 } } */ > +/* Checking "final" instead checking "asm" output to avoid noise. */ > +/* { dg-final { scan-rtl-dump-times {\madddi3\M} 2 "final" } } */ So, in the RTL dump it finds a named pattern adddi3, while in the asm you find random other addi insns? Hardly worth a comment in the testcase itself, but heh. The point is that you check for the RTL pattern name instead of the machine insn. The compiler pass you check in isn't changed even! > --- a/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c > +++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c > @@ -6,11 +6,18 @@ > /* { dg-final { scan-assembler-times {\mori\M} 4 } } */ > /* { dg-final { scan-assembler-times {\mrldimi\M} 2 } } */ > > +/* The below macro helps to avoid loading constant from memory. */ > +#define CONST_AVOID_BASE_REG(DEST, CST) > \ > + { > \ > + register long long d asm ("r0") = CST; > \ > + asm volatile ("std %1, %0" : : "m"(DEST), "r"(d)); > \ > + } This needs an actual changelog, not just "update". Something as simple as "New macro to force constant into a reg" is fine, but more than "update" :-) > +/* The below macro helps to avoid loading constant from memory. */ > +#define CONST_AVOID_BASE_REG(DEST, CST) > \ > + { > \ > + register long long d asm ("r0") = CST; > \ > + asm volatile ("std %1, %0" : : "m"(DEST), "r"(d)); > \ > + } "Forces the constant into a register", yes. > --- a/gcc/testsuite/gcc.target/powerpc/pr106550_1.c > +++ b/gcc/testsuite/gcc.target/powerpc/pr106550_1.c > @@ -4,17 +4,19 @@ > /* { dg-options "-O2 -mdejagnu-cpu=power10 -fdisable-rtl-split1" } */ > /* force the constant splitter run after RA: -fdisable-rtl-split1. */ > > +/* The below marco helps to avoid using paddi and avoid loading from memory. > */ (macro) > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr63281.c > @@ -0,0 +1,11 @@ > +/* Check loading constant from memory pool. */ > +/* { dg-options "-O2 -mpowerpc64" } */ > + > +void > +foo (unsigned long long *a) > +{ > + *a++ = 0x2351847027482577ULL; > +} > + > +/* { dg-final { scan-assembler-times {\mp?ld\M} 1 { target { lp64 } } } } */ Why target lp64 only? You have -mpowerpc64 already, does that not make us use ld here always? Segher