On 11/5/24 1:11 PM, Vineet Gupta wrote:
changes since v1
   * Changed target hook to --param
   * squash addon patch for RISC-V opting-in, testcase here
   * updated changelog with latest perf numbers
---

sched1 computes ECC (Excess Change Cost) for each insn, which represents
the register pressure attributed to the insn.
Currently the pressure-sensitive scheduling algorithm deliberately ignores
negative values (pressure reduction), making them 0 (neutral), leading
to more spills. This happens due to the assumption that the compiler has
a reasonably accurate processor pipeline scheduling model and thus tries
to aggressively fill pipeline bubbles with spill slots.

This, however, might not be true, as the model might not be available for
certain uarches, or might not even be applicable, especially for modern
out-of-order cores.

The existing heuristic induces a spill frenzy on RISC-V, noticeably so on
SPEC2017 507.Cactu. If insn scheduling is disabled completely
(w/ -fno-schedule-insns), the total dynamic icount for this workload is
roughly halved, from ~2.5 trillion insns to ~1.3 trillion.

This patch adds --param=cycle-accurate-model={0,1} to gate the spill
behavior.

  - The default (1) preserves existing spill behavior.

  - targets/uarches sensitive to spilling can override the param to (0)
    to get the reverse effect. RISC-V backend does so too.
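
As a rough illustration of the gating described above, the following is a minimal sketch, not the actual haifa-sched.cc code. The function name `effective_ecc` and its parameters are hypothetical; the only behavior taken from the patch description is that a negative ECC is clamped to 0 when the param is 1 and kept as-is when it is 0.

```cpp
// Hypothetical, simplified stand-in for the clamping step described in the
// commit message. The real logic lives in model_excess_cost() and
// model_excess_group_cost() in haifa-sched.cc and is more involved.
#include <algorithm>

// cycle_accurate_model mirrors --param=cycle-accurate-model={0,1}.
int effective_ecc(int base_ecc, int cycle_accurate_model)
{
  if (cycle_accurate_model)
    // Existing behavior: a pressure reduction (negative ECC) is treated
    // as neutral, so the scheduler gets no credit for lowering pressure.
    return std::max(base_ecc, 0);

  // New opt-in behavior: keep the negative value so that insns which
  // reduce register pressure are seen as cheaper.
  return base_ecc;
}
```

With this sketch, `effective_ecc(-3, 1)` yields 0 while `effective_ecc(-3, 0)` yields -3; non-negative values are unaffected by the param.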

The actual perf numbers are very promising.

(1) On RISC-V BPI-F3 in-order CPU, -Ofast -march=rv64gcv_zba_zbb_zbs:

   Before:
   ------
   Performance counter stats for './cactusBSSN_r_base.rivos spec_ref.par':

       4,917,712.97 msec task-clock:u                     #    1.000 CPUs utilized
              5,314      context-switches:u               #    1.081 /sec
                  3      cpu-migrations:u                 #    0.001 /sec
            204,784      page-faults:u                    #   41.642 /sec
  7,868,291,222,513      cycles:u                         #    1.600 GHz
  2,615,069,866,153      instructions:u                   #    0.33  insn per cycle
     10,799,381,890      branches:u                       #    2.196 M/sec
         15,714,572      branch-misses:u                  #    0.15% of all branches

   After:
   -----
   Performance counter stats for './cactusBSSN_r_base.rivos spec_ref.par':

       4,552,979.58 msec task-clock:u                     #    0.998 CPUs utilized
            205,020      context-switches:u               #   45.030 /sec
                  2      cpu-migrations:u                 #    0.000 /sec
            204,221      page-faults:u                    #   44.854 /sec
  7,285,176,204,764      cycles:u        (7.4% faster)    #    1.600 GHz
  2,145,284,345,397      instructions:u (17.96% fewer)    #    0.29  insn per cycle
     10,799,382,011      branches:u                       #    2.372 M/sec
         16,235,628      branch-misses:u                  #    0.15% of all branches

(2) Wilco reported 20% perf gains on aarch64 Neoverse V2 runs.

gcc/ChangeLog:
        PR target/114729
        * params.opt (--param=cycle-accurate-model=): New opt.
        * doc/invoke.texi (cycle-accurate-model): Document.
        * haifa-sched.cc (model_excess_group_cost): Return negative
        delta if param_cycle_accurate_model is 0.
        (model_excess_cost): Ceil negative baseECC to 0 only if
        param_cycle_accurate_model is 1.
        Dump the actual ECC value.
        * config/riscv/riscv.cc (riscv_option_override): Set param
        to 0.

gcc/testsuite/ChangeLog:
        PR target/114729
        * gcc.target/riscv/riscv.exp: Enable new tests to build.
        * gcc.target/riscv/sched1-spills/spill1.cpp: Add new test.

Signed-off-by: Vineet Gupta <vine...@rivosinc.com>
---
  gcc/config/riscv/riscv.cc                     |  4 +++
  gcc/doc/invoke.texi                           |  7 ++++
  gcc/haifa-sched.cc                            | 32 ++++++++++++++-----
  gcc/params.opt                                |  4 +++
  gcc/testsuite/gcc.target/riscv/riscv.exp      |  2 ++
  .../gcc.target/riscv/sched1-spills/spill1.cpp | 32 +++++++++++++++++++
  6 files changed, 73 insertions(+), 8 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/sched1-spills/spill1.cpp

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 7146163d66d0..c1e07e258b25 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17084,6 +17084,13 @@ With @option{--param=openacc-privatization=quiet}, don't diagnose.
  This is the current default.
  With @option{--param=openacc-privatization=noisy}, do diagnose.
+@item cycle-accurate-model
+Specifies whether GCC should assume that the scheduling description is mostly
+a cycle-accurate model of the target processor, where the code is intended to
+run on, in the absence of cache misses.  Nonzero means that the selected scheduling
+model is accuate and likely describes an in-order processor, and that scheduling
+will aggressively spill to try and fill any pipeline bubbles.

s/accuate/accurate/

And you should probably say something about what 0 means in this context as well.

OK with those changes.

Thanks,
jeff
