Re: [RFA] ira: Add new hooks for callee-save vs spills [PR117477]

Richard Biener Tue, 04 Mar 2025 02:31:54 -0800

On Tue, Mar 4, 2025 at 11:18 AM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> Richard Sandiford <richard.sandif...@arm.com> writes:
> > Jan Hubicka <hubi...@ucw.cz> writes:
> >>>
> >>> Thanks for running these.  I saw poor results for perlbench with my
> >>> initial aarch64 hooks because the hooks reduced the cost to zero for
> >>> the entry case:
> >>>
> >>>         auto entry_cost = targetm.callee_save_cost
> >>>           (spill_cost_type::SAVE, hard_regno, mode, saved_nregs,
> >>>            ira_memory_move_cost[mode][rclass][0] * saved_nregs / nregs,
> >>>            allocated_callee_save_regs, existing_spills_p);
> >>>         /* In the event of a tie between caller-save and callee-save,
> >>>            prefer callee-save.  We apply this to the entry cost rather
> >>>            than the exit cost since the entry frequency must be at
> >>>            least as high as the exit frequency.  */
> >>>         if (entry_cost > 0)
> >>>           entry_cost -= 1;
> >>>
> >>> I "fixed" that by bumping the cost to a minimum of 2, but I was
> >>> wondering whether the "entry_cost > 0" should instead be "entry_cost > 1",
> >>> so that the cost is always greater than not using a callee save for
> >>> registers that don't cross a call.  WDYT?
> >>
> >> For x86 performance costs, the push cost should be memory_move_cost which
> >> is 6, -2 for adjustment in the target hook and -1 for this. So cost
> >> should not be 0 I think.
> >>
> >> For size cost, I currently return 1, so we indeed get 0 after
> >> adjustment.
> >>
> >> I think cost of 0 will make us pick callee save even if caller save
> >> is available and there are no function calls, so I guess we do not want
> >> that....
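
To make the arithmetic in this exchange concrete, here is a small standalone sketch (not GCC code) comparing the two tie-break conditions under discussion. The helper names adjust_old and adjust_new are hypothetical; the inputs 4 and 1 are the x86 speed cost (6 - 2) and size cost quoted above.

```cpp
#include <cassert>

// Hypothetical stand-in for the original tie-break:
//   if (entry_cost > 0) entry_cost -= 1;
constexpr int adjust_old (int entry_cost)
{
  return entry_cost > 0 ? entry_cost - 1 : entry_cost;
}

// Hypothetical stand-in for the proposed tie-break:
//   if (entry_cost > 1) entry_cost -= 1;
constexpr int adjust_new (int entry_cost)
{
  return entry_cost > 1 ? entry_cost - 1 : entry_cost;
}

// x86 speed cost: memory_move_cost 6, minus 2 in the target hook,
// minus 1 for the tie-break, under either condition.
static_assert (adjust_old (4) == 3, "speed cost under the old condition");
static_assert (adjust_new (4) == 3, "speed cost under the new condition");

// x86 size cost: the hook returns 1.  The old condition drops it to 0,
// the new condition keeps it at 1.
static_assert (adjust_old (1) == 0, "size cost collapses to zero");
static_assert (adjust_new (1) == 1, "size cost stays positive");
```

With the "> 1" form, a hook that returns 1 still yields a nonzero entry cost, which is the property being asked for here.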
> >
> > OK, here's an updated patch that makes that change.  The x86 parts
> > should be replaced by your patch.
> >
> > Tested on aarch64-linux-gnu.  I also tried to test on powerpc64le-linux-gnu
> > (on gcc112), but I keep getting broken pipes during the test runs,
> > so I'm struggling to get good before/after comparisons.  It does at
> > least bootstrap though...
>
> Here's the patch with Honza's x86 changes.  Bootstrapped & regression-tested
> on aarch64-linux-gnu and powerpc64le-linux-gnu (gcc120).  The powerpc64le
> results regressed:
>
> FAIL: gcc.dg/guality/vla-1.c   -Os  -DPREVENT_OPTIMIZATION  line 24 i == 5
> FAIL: gcc.dg/guality/vla-1.c   -Os  -DPREVENT_OPTIMIZATION  line 24 sizeof (a) == 17 * sizeof (short)
>
> but the same test already failed for -O2 and -O3.
>
> OK to install now?  Or, given the lateness in the release cycle,
> would it be better to wait for GCC 16?


I think it's OK to install now.  Not installing anything isn't an option; the
alternative would be to at least revert HJ's change.

Thanks,
Richard.

>
> Thanks,
> Richard
>
>
> Following on from the discussion in:
>
>   https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675256.html
>
> this patch removes TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE and
> replaces it with two hooks: one that controls the cost of using an
> extra callee-saved register and one that controls the cost of allocating
> a frame for the first spill.
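
As a rough illustration of how the two hooks are meant to interact, the sketch below models the comparison in plain C++. It is not the GCC implementation: toy_frame_allocation_cost, toy_callee_save_cost and weighted_cost are hypothetical names and the numbers are arbitrary. It only assumes what the patch documents, namely that the default callee-save cost is mem_cost plus the frame cost where appropriate (simplified here to "when there are no existing spills"), and that save costs are scaled by the entry frequency while restore costs are scaled by the exit frequency.

```cpp
#include <cassert>

// Hypothetical toy model, not GCC code.
// Cost of allocating or deallocating a frame for the first spill.
// The value 2 is arbitrary.
constexpr int toy_frame_allocation_cost ()
{
  return 2;
}

// Simplified model of the documented default for the callee-save hook:
// mem_cost, plus the frame cost if nothing has been spilled so far.
constexpr int toy_callee_save_cost (int mem_cost, bool existing_spills_p)
{
  return existing_spills_p ? mem_cost
                           : mem_cost + toy_frame_allocation_cost ();
}

// The caller, not the hook, applies the frequencies: the save cost is
// weighted by the entry frequency and the restore cost by the exit
// frequency.
constexpr int weighted_cost (int save_cost, int restore_cost,
                             int entry_freq, int exit_freq)
{
  return save_cost * entry_freq + restore_cost * exit_freq;
}

static_assert (toy_callee_save_cost (4, true) == 4, "frame already paid for");
static_assert (toy_callee_save_cost (4, false) == 6, "first user pays for frame");
static_assert (weighted_cost (6, 6, 1000, 900) == 11400, "6*1000 + 6*900");
```

A real target would replace the constant frame cost with something that depends on the set of callee-saved registers already in use, as the AArch64 hook in the patch does.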
>
> (The patch does not attempt to address the shrink-wrapping part of
> the thread above.)
>
> On AArch64, this is enough to fix PR117477, as verified by the new tests.
> The patch does not change the SPEC2017 scores significantly.  (I saw a
> slight improvement in fotonik3d and roms, but I'm not convinced that
> the improvements are real.)
>
> The patch makes IRA use caller saves for gcc.target/aarch64/pr103350-1.c,
> which is a scan-dump correctness test that relies on not using
> caller saves.  The decision to use caller saves looks appropriate,
> and saves an instruction, so I've just added -fno-caller-saves
> to the test options.
>
> The x86 parts were written by Honza.
>
> gcc/
>         PR rtl-optimization/117477
>         * config/aarch64/aarch64.cc (aarch64_count_saves): New function.
>         (aarch64_count_above_hard_fp_saves, aarch64_callee_save_cost)
>         (aarch64_frame_allocation_cost): Likewise.
>         (TARGET_CALLEE_SAVE_COST): Define.
>         (TARGET_FRAME_ALLOCATION_COST): Likewise.
>         * config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale):
>         Replace with...
>         (ix86_callee_save_cost): ...this new hook.
>         (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete.
>         (TARGET_CALLEE_SAVE_COST): Define.
>         * target.h (spill_cost_type, frame_cost_type): New enums.
>         * target.def (callee_save_cost, frame_allocation_cost): New hooks.
>         (ira_callee_saved_register_cost_scale): Delete.
>         * doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete.
>         (TARGET_CALLEE_SAVE_COST, TARGET_FRAME_ALLOCATION_COST): New hooks.
>         * doc/tm.texi: Regenerate.
>         * hard-reg-set.h (hard_reg_set_popcount): New function.
>         * ira-color.cc (allocated_memory_p): New variable.
>         (allocated_callee_save_regs): Likewise.
>         (record_allocation): New function.
>         (assign_hard_reg): Use targetm.frame_allocation_cost to model
>         the cost of the first spill or first caller save.  Use
>         targetm.callee_save_cost to model the cost of using new callee-saved
>         registers.  Apply the exit rather than entry frequency to the cost
>         of restoring a register or deallocating the frame.  Update the
>         new variables above.
>         (improve_allocation): Use record_allocation.
>         (color): Initialize allocated_callee_save_regs.
>         (ira_color): Initialize allocated_memory_p.
>         * targhooks.h (default_callee_save_cost): Declare.
>         (default_frame_allocation_cost): Likewise.
>         * targhooks.cc (default_callee_save_cost): New function.
>         (default_frame_allocation_cost): Likewise.
>
> gcc/testsuite/
>         PR rtl-optimization/117477
>         * gcc.target/aarch64/callee_save_1.c: New test.
>         * gcc.target/aarch64/callee_save_2.c: Likewise.
>         * gcc.target/aarch64/callee_save_3.c: Likewise.
>         * gcc.target/aarch64/pr103350-1.c: Add -fno-caller-saves.
>
> Co-authored-by: Jan Hubicka <hubi...@ucw.cz>
> ---
>  gcc/config/aarch64/aarch64.cc                 | 118 ++++++++++++++++++
>  gcc/config/i386/i386.cc                       |  28 +++--
>  gcc/doc/tm.texi                               |  77 ++++++++++--
>  gcc/doc/tm.texi.in                            |   6 +-
>  gcc/hard-reg-set.h                            |  15 +++
>  gcc/ira-color.cc                              |  83 ++++++++++--
>  gcc/target.def                                |  87 +++++++++++--
>  gcc/target.h                                  |  12 ++
>  gcc/targhooks.cc                              |  27 ++++
>  gcc/targhooks.h                               |   5 +
>  .../gcc.target/aarch64/callee_save_1.c        |  12 ++
>  .../gcc.target/aarch64/callee_save_2.c        |  14 +++
>  .../gcc.target/aarch64/callee_save_3.c        |  12 ++
>  gcc/testsuite/gcc.target/aarch64/pr103350-1.c |   2 +-
>  14 files changed, 459 insertions(+), 39 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/callee_save_1.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/callee_save_2.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/callee_save_3.c
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index fe76730b0a7..27ea82cd7da 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -15873,6 +15873,118 @@ aarch64_memory_move_cost (machine_mode mode, reg_class_t rclass_i, bool in)
>           : base + aarch64_tune_params.memmov_cost.store_int);
>  }
>
> +/* CALLEE_SAVED_REGS is the set of callee-saved registers that the
> +   RA has already decided to use.  Return the total number of registers
> +   in class RCLASS that need to be saved and restored, including the
> +   frame link registers.  */
> +static int
> +aarch64_count_saves (const HARD_REG_SET &callee_saved_regs, reg_class rclass)
> +{
> +  auto saved_gprs = callee_saved_regs & reg_class_contents[rclass];
> +  auto nregs = hard_reg_set_popcount (saved_gprs);
> +
> +  if (TEST_HARD_REG_BIT (reg_class_contents[rclass], LR_REGNUM))
> +    {
> +      if (aarch64_needs_frame_chain ())
> +       nregs += 2;
> +      else if (!crtl->is_leaf || df_regs_ever_live_p (LR_REGNUM))
> +       nregs += 1;
> +    }
> +  return nregs;
> +}
> +
> +/* CALLEE_SAVED_REGS is the set of callee-saved registers that the
> +   RA has already decided to use.  Return the total number of registers
> +   that need to be saved above the hard frame pointer, including the
> +   frame link registers.  */
> +static int
> +aarch64_count_above_hard_fp_saves (const HARD_REG_SET &callee_saved_regs)
> +{
> +  /* FP and Advanced SIMD registers are saved above the frame pointer
> +     but SVE registers are saved below it.  */
> +  if (known_le (GET_MODE_SIZE (aarch64_reg_save_mode (V8_REGNUM)), 16U))
> +    return aarch64_count_saves (callee_saved_regs, POINTER_AND_FP_REGS);
> +  return aarch64_count_saves (callee_saved_regs, POINTER_REGS);
> +}
> +
> +/* Implement TARGET_CALLEE_SAVE_COST.  */
> +static int
> +aarch64_callee_save_cost (spill_cost_type spill_type, unsigned int regno,
> +                         machine_mode mode, unsigned int nregs, int mem_cost,
> +                         const HARD_REG_SET &callee_saved_regs,
> +                         bool existing_spill_p)
> +{
> +  /* If we've already committed to saving an odd number of GPRs, assume that
> +     saving one more will involve turning an STR into an STP and an LDR
> +     into an LDP.  This should still be more expensive than not spilling
> +     (meaning that the minimum cost is 1), but it should usually be cheaper
> +     than a separate store or load.  */
> +  if (GP_REGNUM_P (regno)
> +      && nregs == 1
> +      && (aarch64_count_saves (callee_saved_regs, GENERAL_REGS) & 1))
> +    return 1;
> +
> +  /* Similarly for saving FP registers, if we only need to save the low
> +     64 bits.  (We can also use STP/LDP instead of STR/LDR for Q registers,
> +     but that is less likely to be a saving.)  */
> +  if (FP_REGNUM_P (regno)
> +      && nregs == 1
> +      && known_eq (GET_MODE_SIZE (aarch64_reg_save_mode (regno)), 8U)
> +      && (aarch64_count_saves (callee_saved_regs, FP_REGS) & 1))
> +    return 1;
> +
> +  /* If this would be the first register that we save, add the cost of
> +     allocating or deallocating the frame.  For GPR, FPR, and Advanced SIMD
> +     saves, the allocation and deallocation can be folded into the save and
> +     restore.  */
> +  if (!existing_spill_p
> +      && !GP_REGNUM_P (regno)
> +      && !(FP_REGNUM_P (regno)
> +          && known_le (GET_MODE_SIZE (aarch64_reg_save_mode (regno)), 16U)))
> +    return default_callee_save_cost (spill_type, regno, mode, nregs, mem_cost,
> +                                    callee_saved_regs, existing_spill_p);
> +
> +  return mem_cost;
> +}
> +
> +/* Implement TARGET_FRAME_ALLOCATION_COST.  */
> +static int
> +aarch64_frame_allocation_cost (frame_cost_type,
> +                              const HARD_REG_SET &callee_saved_regs)
> +{
> +  /* The intention is to model the relative costs of different approaches
> +     to storing data on the stack, rather than to model the cost of saving
> +     data vs not saving it.  This means that we should return 0 if:
> +
> +     - any frame is going to be allocated with:
> +
> +          stp x29, x30, [sp, #-...]!
> +
> +       to create a frame link.
> +
> +     - any frame is going to be allocated with:
> +
> +          str x30, [sp, #-...]!
> +
> +       to save the link register.
> +
> +     In both cases, the allocation and deallocation instructions are the
> +     same however we store data to the stack.  (In the second case, the STR
> +     could be converted to an STP by saving an extra call-preserved register,
> +     but that is modeled by aarch64_callee_save_cost.)
> +
> +     In other cases, assume that a frame would need to be allocated with a
> +     separate subtraction and deallocated with a separate addition.  Saves
> +     of call-clobbered registers can then reclaim this cost using a
> +     predecrement store and a postincrement load.
> +
> +     For simplicity, give this addition or subtraction the same cost as
> +     a GPR move.  We could parameterize this if necessary.  */
> +  if (aarch64_count_above_hard_fp_saves (callee_saved_regs) == 0)
> +    return aarch64_tune_params.regmove_cost->GP2GP;
> +  return 0;
> +}
> +
>  /* Implement TARGET_INSN_COST.  We have the opportunity to do something
>     much more productive here, such as using insn attributes to cost things.
>     But we don't, not yet.
> @@ -31557,6 +31669,12 @@ aarch64_libgcc_floating_mode_supported_p
>  #undef TARGET_MEMORY_MOVE_COST
>  #define TARGET_MEMORY_MOVE_COST aarch64_memory_move_cost
>
> +#undef TARGET_CALLEE_SAVE_COST
> +#define TARGET_CALLEE_SAVE_COST aarch64_callee_save_cost
> +
> +#undef TARGET_FRAME_ALLOCATION_COST
> +#define TARGET_FRAME_ALLOCATION_COST aarch64_frame_allocation_cost
> +
>  #undef TARGET_MIN_DIVISIONS_FOR_RECIP_MUL
>  #define TARGET_MIN_DIVISIONS_FOR_RECIP_MUL aarch64_min_divisions_for_recip_mul
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index fb93a6fdd0a..661e71b032c 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -20600,12 +20600,27 @@ ix86_class_likely_spilled_p (reg_class_t rclass)
>    return false;
>  }
>
> -/* Implement TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE.  */
> +/* Implement TARGET_CALLEE_SAVE_COST.  */
>
>  static int
> -ix86_ira_callee_saved_register_cost_scale (int)
> -{
> -  return 1;
> +ix86_callee_save_cost (spill_cost_type, unsigned int hard_regno, machine_mode,
> +                      unsigned int, int mem_cost, const HARD_REG_SET &, bool)
> +{
> +  /* Account for the fact that push and pop are shorter and do their
> +     own allocation and deallocation.  */
> +  if (GENERAL_REGNO_P (hard_regno))
> +    {
> +      /* push is 1 byte while typical spill is 4-5 bytes.
> +        ??? We probably should adjust size costs accordingly.
> +        Costs are relative to reg-reg move that has 2 bytes for 32bit
> +        and 3 bytes otherwise.  */
> +      if (optimize_function_for_size_p (cfun))
> +       return 1;
> +      /* Be sure that no cost table sets cost to 2, so we end up with 0.  */
> +      gcc_checking_assert (mem_cost > 2);
> +      return mem_cost - 2;
> +    }
> +  return mem_cost;
>  }
>
>  /* Return true if a set of DST by the expression SRC should be allowed.
> @@ -27092,9 +27107,8 @@ ix86_libgcc_floating_mode_supported_p
>  #define TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P hook_bool_mode_true
>  #undef TARGET_CLASS_LIKELY_SPILLED_P
>  #define TARGET_CLASS_LIKELY_SPILLED_P ix86_class_likely_spilled_p
> -#undef TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE
> -#define TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE \
> -  ix86_ira_callee_saved_register_cost_scale
> +#undef TARGET_CALLEE_SAVE_COST
> +#define TARGET_CALLEE_SAVE_COST ix86_callee_save_cost
>
>  #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
>  #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 9f42913a4ef..a96700c0d38 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -3047,14 +3047,6 @@ A target hook which can change allocno class for given pseudo from
>    The default version of this target hook always returns given class.
>  @end deftypefn
>
> -@deftypefn {Target Hook} int TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE (int @var{hard_regno})
> -A target hook which returns the callee-saved register @var{hard_regno}
> -cost scale in epilogue and prologue used by IRA.
> -
> -The default version of this target hook returns 1 if optimizing for
> -size, otherwise returns the entry block frequency.
> -@end deftypefn
> -
>  @deftypefn {Target Hook} bool TARGET_LRA_P (void)
>  A target hook which returns true if we use LRA instead of reload pass.
>
> @@ -7011,6 +7003,75 @@ value to the result of that function.  The arguments to that function
>  are the same as to this target hook.
>  @end deftypefn
>
> +@deftypefn {Target Hook} int TARGET_CALLEE_SAVE_COST (spill_cost_type @var{cost_type}, unsigned int @var{hard_regno}, machine_mode @var{mode}, unsigned int @var{nregs}, int @var{mem_cost}, const HARD_REG_SET @var{&allocated_callee_regs}, bool @var{existing_spills_p})
> +Return the one-off cost of saving or restoring callee-saved registers
> +(also known as call-preserved registers or non-volatile registers).
> +The parameters are as follows:
> +
> +@itemize
> +@item
> +@var{cost_type} is @samp{spill_cost_type::SAVE} for saving a register
> +and @samp{spill_cost_type::RESTORE} for restoring a register.
> +
> +@item
> +@var{hard_regno} and @var{mode} represent the whole register that
> +the register allocator is considering using; of these,
> +@var{nregs} registers are fully or partially callee-saved.
> +
> +@item
> +@var{mem_cost} is the normal cost for storing (for saves)
> +or loading (for restores) the @var{nregs} registers.
> +
> +@item
> +@var{allocated_callee_regs} is the set of callee-saved registers
> +that are already in use.
> +
> +@item
> +@var{existing_spills_p} is true if the register allocator has
> +already decided to spill registers to memory.
> +@end itemize
> +
> +If @var{existing_spills_p} is false, the cost of a save should account
> +for frame allocations in a way that is consistent with
> +@code{TARGET_FRAME_ALLOCATION_COST}'s handling of allocations for spills.
> +Similarly, the cost of a restore should then account for frame deallocations
> +in a way that is consistent with @code{TARGET_FRAME_ALLOCATION_COST}'s
> +handling of deallocations.
> +
> +Note that this hook should not attempt to apply a frequency scale
> +to the cost: it is the caller's responsibility to do that where
> +appropriate.
> +
> +The default implementation returns @var{mem_cost}, plus the allocation
> +or deallocation cost returned by @code{TARGET_FRAME_ALLOCATION_COST},
> +where appropriate.
> +@end deftypefn
> +
> +@deftypefn {Target Hook} int TARGET_FRAME_ALLOCATION_COST (frame_cost_type @var{cost_type}, const HARD_REG_SET @var{&allocated_callee_regs})
> +Return the cost of allocating or deallocating a frame for the sake of
> +a spill; @var{cost_type} chooses between allocation and deallocation.
> +The term ``spill'' here includes both forcing a pseudo register to memory
> +and using caller-saved registers for pseudo registers that are live across
> +a call.
> +
> +This hook is only called if the register allocator has not so far
> +decided to spill.  The allocator may have decided to use callee-saved
> +registers; if so, @var{allocated_callee_regs} is the set of callee-saved
> +registers that the allocator has used.  There might also be other reasons
> +why a stack frame is already needed; for example, @samp{get_frame_size ()}
> +might be nonzero, or the target might already require a frame for
> +target-specific reasons.
> +
> +When the register allocator uses this hook to cost spills, it also uses
> +@code{TARGET_CALLEE_SAVE_COST} to cost new callee-saved registers, passing
> +@samp{false} as the @var{existing_spills_p} argument.  The intention is to
> +allow the target to apply an apples-for-apples comparison between the
> +cost of using callee-saved registers and using spills in cases where the
> +allocator has not yet committed to using both strategies.
> +
> +The default implementation returns 0.
> +@end deftypefn
> +
>  @defmac BRANCH_COST (@var{speed_p}, @var{predictable_p})
>  A C expression for the cost of a branch instruction.  A value of 1 is
>  the default; other values are interpreted relative to that. Parameter
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 6dbe22581ca..eccc4d88493 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -2388,8 +2388,6 @@ in the reload pass.
>
>  @hook TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
>
> -@hook TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE
> -
>  @hook TARGET_LRA_P
>
>  @hook TARGET_REGISTER_PRIORITY
> @@ -4584,6 +4582,10 @@ These macros are obsolete, new ports should use the target hook
>
>  @hook TARGET_MEMORY_MOVE_COST
>
> +@hook TARGET_CALLEE_SAVE_COST
> +
> +@hook TARGET_FRAME_ALLOCATION_COST
> +
>  @defmac BRANCH_COST (@var{speed_p}, @var{predictable_p})
>  A C expression for the cost of a branch instruction.  A value of 1 is
>  the default; other values are interpreted relative to that. Parameter
> diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h
> index 48025d202b6..0d03aed5128 100644
> --- a/gcc/hard-reg-set.h
> +++ b/gcc/hard-reg-set.h
> @@ -191,6 +191,12 @@ hard_reg_set_empty_p (const_hard_reg_set x)
>    return x == HARD_CONST (0);
>  }
>
> +inline int
> +hard_reg_set_popcount (const_hard_reg_set x)
> +{
> +  return popcount_hwi (x);
> +}
> +
>  #else
>
>  inline void
> @@ -254,6 +260,15 @@ hard_reg_set_empty_p (const_hard_reg_set x)
>      bad |= x.elts[i];
>    return bad == 0;
>  }
> +
> +inline int
> +hard_reg_set_popcount (const_hard_reg_set x)
> +{
> +  int count = 0;
> +  for (unsigned int i = 0; i < ARRAY_SIZE (x.elts); ++i)
> +    count += popcount_hwi (x.elts[i]);
> +  return count;
> +}
>  #endif
>
>  /* Iterator for hard register sets.  */
> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> index 233060e1587..4b9296029cc 100644
> --- a/gcc/ira-color.cc
> +++ b/gcc/ira-color.cc
> @@ -1195,10 +1195,16 @@ finish_update_cost_records (void)
>    update_cost_record_pool.release ();
>  }
>
> +/* True if we have allocated memory, or intend to do so.  */
> +static bool allocated_memory_p;
> +
>  /* Array whose element value is TRUE if the corresponding hard
>     register was already allocated for an allocno.  */
>  static bool allocated_hardreg_p[FIRST_PSEUDO_REGISTER];
>
> +/* Which callee-saved hard registers we've decided to save.  */
> +static HARD_REG_SET allocated_callee_save_regs;
> +
>  /* Describes one element in a queue of allocnos whose costs need to be
>     updated.  Each allocno in the queue is known to have an allocno
>     class.  */
> @@ -1740,6 +1746,20 @@ check_hard_reg_p (ira_allocno_t a, int hard_regno,
>    return j == nregs;
>  }
>
> +/* Record that we have allocated NREGS registers starting at HARD_REGNO.  */
> +
> +static void
> +record_allocation (int hard_regno, int nregs)
> +{
> +  for (int i = 0; i < nregs; ++i)
> +    if (!allocated_hardreg_p[hard_regno + i])
> +      {
> +       allocated_hardreg_p[hard_regno + i] = true;
> +       if (!crtl->abi->clobbers_full_reg_p (hard_regno + i))
> +         SET_HARD_REG_BIT (allocated_callee_save_regs, hard_regno + i);
> +      }
> +}
> +
>  /* Return number of registers needed to be saved and restored at
>     function prologue/epilogue if we allocate HARD_REGNO to hold value
>     of MODE.  */
> @@ -1961,6 +1981,12 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
>  #endif
>    auto_bitmap allocnos_to_spill;
>    HARD_REG_SET soft_conflict_regs = {};
> +  int entry_freq = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> +  int exit_freq = REG_FREQ_FROM_BB (EXIT_BLOCK_PTR_FOR_FN (cfun));
> +  int spill_cost = 0;
> +  /* Whether we have spilled pseudos or used caller-saved registers for values
> +     that are live across a call.  */
> +  bool existing_spills_p = allocated_memory_p || caller_save_needed;
>
>    ira_assert (! ALLOCNO_ASSIGNED_P (a));
>    get_conflict_and_start_profitable_regs (a, retry_p,
> @@ -1979,6 +2005,18 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
>      start_update_cost ();
>    mem_cost += ALLOCNO_UPDATED_MEMORY_COST (a);
>
> +  if (!existing_spills_p)
> +    {
> +      auto entry_cost = targetm.frame_allocation_cost
> +       (frame_cost_type::ALLOCATION, allocated_callee_save_regs);
> +      spill_cost += entry_cost * entry_freq;
> +
> +      auto exit_cost = targetm.frame_allocation_cost
> +       (frame_cost_type::DEALLOCATION, allocated_callee_save_regs);
> +      spill_cost += exit_cost * exit_freq;
> +    }
> +  mem_cost += spill_cost;
> +
>    ira_allocate_and_copy_costs (&ALLOCNO_UPDATED_HARD_REG_COSTS (a),
>                                aclass, ALLOCNO_HARD_REG_COSTS (a));
>    a_costs = ALLOCNO_UPDATED_HARD_REG_COSTS (a);
> @@ -2175,16 +2213,37 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
>           /* We need to save/restore the hard register in
>              epilogue/prologue.  Therefore we increase the cost.  */
>           {
> +           int nregs = hard_regno_nregs (hard_regno, mode);
> +           add_cost = 0;
>             rclass = REGNO_REG_CLASS (hard_regno);
> -           add_cost = ((ira_memory_move_cost[mode][rclass][0]
> -                        + ira_memory_move_cost[mode][rclass][1])
> -                       * saved_nregs / hard_regno_nregs (hard_regno,
> -                                                         mode) - 1)
> -                      * targetm.ira_callee_saved_register_cost_scale (hard_regno);
> +
> +           auto entry_cost = targetm.callee_save_cost
> +             (spill_cost_type::SAVE, hard_regno, mode, saved_nregs,
> +              ira_memory_move_cost[mode][rclass][0] * saved_nregs / nregs,
> +              allocated_callee_save_regs, existing_spills_p);
> +           /* In the event of a tie between caller-save and callee-save,
> +              prefer callee-save.  We apply this to the entry cost rather
> +              than the exit cost since the entry frequency must be at
> +              least as high as the exit frequency.  */
> +           if (entry_cost > 1)
> +             entry_cost -= 1;
> +           add_cost += entry_cost * entry_freq;
> +
> +           auto exit_cost = targetm.callee_save_cost
> +             (spill_cost_type::RESTORE, hard_regno, mode, saved_nregs,
> +              ira_memory_move_cost[mode][rclass][1] * saved_nregs / nregs,
> +              allocated_callee_save_regs, existing_spills_p);
> +           add_cost += exit_cost * exit_freq;
> +
>             cost += add_cost;
>             full_cost += add_cost;
>           }
>         }
> +      if (ira_need_caller_save_p (a, hard_regno))
> +       {
> +         cost += spill_cost;
> +         full_cost += spill_cost;
> +       }
>        if (min_cost > cost)
>         min_cost = cost;
>        if (min_full_cost > full_cost)
> @@ -2211,11 +2270,13 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
>   fail:
>    if (best_hard_regno >= 0)
>      {
> -      for (i = hard_regno_nregs (best_hard_regno, mode) - 1; i >= 0; i--)
> -       allocated_hardreg_p[best_hard_regno + i] = true;
> +      record_allocation (best_hard_regno,
> +                        hard_regno_nregs (best_hard_regno, mode));
>        spill_soft_conflicts (a, allocnos_to_spill, soft_conflict_regs,
>                             best_hard_regno);
>      }
> +  else
> +    allocated_memory_p = true;
>    if (! retry_p)
>      restore_costs_from_copies (a);
>    ALLOCNO_HARD_REGNO (a) = best_hard_regno;
> @@ -3368,8 +3429,7 @@ improve_allocation (void)
>        /* Assign the best chosen hard register to A.  */
>        ALLOCNO_HARD_REGNO (a) = best;
>
> -      for (j = nregs - 1; j >= 0; j--)
> -       allocated_hardreg_p[best + j] = true;
> +      record_allocation (best, nregs);
>
>        if (internal_flag_ira_verbose > 2 && ira_dump_file != NULL)
>         fprintf (ira_dump_file, "Assigning %d to a%dr%d\n",
> @@ -5199,6 +5259,7 @@ color (void)
>  {
>    allocno_stack_vec.create (ira_allocnos_num);
>    memset (allocated_hardreg_p, 0, sizeof (allocated_hardreg_p));
> +  CLEAR_HARD_REG_SET (allocated_callee_save_regs);
>    ira_initiate_assign ();
>    do_coloring ();
>    ira_finish_assign ();
> @@ -5327,10 +5388,14 @@ ira_color (void)
>    ira_allocno_iterator ai;
>
>    /* Setup updated costs.  */
> +  allocated_memory_p = false;
>    FOR_EACH_ALLOCNO (a, ai)
>      {
>        ALLOCNO_UPDATED_MEMORY_COST (a) = ALLOCNO_MEMORY_COST (a);
>        ALLOCNO_UPDATED_CLASS_COST (a) = ALLOCNO_CLASS_COST (a);
> +      if (ALLOCNO_CLASS (a) == NO_REGS
> +         && !ira_equiv_no_lvalue_p (ALLOCNO_REGNO (a)))
> +       allocated_memory_p = true;
>      }
>    if (ira_conflicts_p)
>      color ();
> diff --git a/gcc/target.def b/gcc/target.def
> index c348b15815a..6c7cdc8126b 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -3775,6 +3775,81 @@ are the same as to this target hook.",
>   int, (machine_mode mode, reg_class_t rclass, bool in),
>   default_memory_move_cost)
>
> +DEFHOOK
> +(callee_save_cost,
> + "Return the one-off cost of saving or restoring callee-saved registers\n\
> +(also known as call-preserved registers or non-volatile registers).\n\
> +The parameters are as follows:\n\
> +\n\
> +@itemize\n\
> +@item\n\
> +@var{cost_type} is @samp{spill_cost_type::SAVE} for saving a register\n\
> +and @samp{spill_cost_type::RESTORE} for restoring a register.\n\
> +\n\
> +@item\n\
> +@var{hard_regno} and @var{mode} represent the whole register that\n\
> +the register allocator is considering using; of these,\n\
> +@var{nregs} registers are fully or partially callee-saved.\n\
> +\n\
> +@item\n\
> +@var{mem_cost} is the normal cost for storing (for saves)\n\
> +or loading (for restores) the @var{nregs} registers.\n\
> +\n\
> +@item\n\
> +@var{allocated_callee_regs} is the set of callee-saved registers\n\
> +that are already in use.\n\
> +\n\
> +@item\n\
> +@var{existing_spills_p} is true if the register allocator has\n\
> +already decided to spill registers to memory.\n\
> +@end itemize\n\
> +\n\
> +If @var{existing_spills_p} is false, the cost of a save should account\n\
> +for frame allocations in a way that is consistent with\n\
> +@code{TARGET_FRAME_ALLOCATION_COST}'s handling of allocations for spills.\n\
> +Similarly, the cost of a restore should then account for frame deallocations\n\
> +in a way that is consistent with @code{TARGET_FRAME_ALLOCATION_COST}'s\n\
> +handling of deallocations.\n\
> +\n\
> +Note that this hook should not attempt to apply a frequency scale\n\
> +to the cost: it is the caller's responsibility to do that where\n\
> +appropriate.\n\
> +\n\
> +The default implementation returns @var{mem_cost}, plus the allocation\n\
> +or deallocation cost returned by @code{TARGET_FRAME_ALLOCATION_COST},\n\
> +where appropriate.",
> + int, (spill_cost_type cost_type, unsigned int hard_regno,
> +       machine_mode mode, unsigned int nregs, int mem_cost,
> +       const HARD_REG_SET &allocated_callee_regs, bool existing_spills_p),
> + default_callee_save_cost)
> +
> +DEFHOOK
> +(frame_allocation_cost,
> + "Return the cost of allocating or deallocating a frame for the sake of\n\
> +a spill; @var{cost_type} chooses between allocation and deallocation.\n\
> +The term ``spill'' here includes both forcing a pseudo register to memory\n\
> +and using caller-saved registers for pseudo registers that are live across\n\
> +a call.\n\
> +\n\
> +This hook is only called if the register allocator has not so far\n\
> +decided to spill.  The allocator may have decided to use callee-saved\n\
> +registers; if so, @var{allocated_callee_regs} is the set of callee-saved\n\
> +registers that the allocator has used.  There might also be other reasons\n\
> +why a stack frame is already needed; for example, @samp{get_frame_size ()}\n\
> +might be nonzero, or the target might already require a frame for\n\
> +target-specific reasons.\n\
> +\n\
> +When the register allocator uses this hook to cost spills, it also uses\n\
> +@code{TARGET_CALLEE_SAVE_COST} to cost new callee-saved registers, passing\n\
> +@samp{false} as the @var{existing_spills_p} argument.  The intention is to\n\
> +allow the target to apply an apples-for-apples comparison between the\n\
> +cost of using callee-saved registers and using spills in cases where the\n\
> +allocator has not yet committed to using both strategies.\n\
> +\n\
> +The default implementation returns 0.",
> + int, (frame_cost_type cost_type, const HARD_REG_SET &allocated_callee_regs),
> + default_frame_allocation_cost)
> +
>  DEFHOOK
>  (use_by_pieces_infrastructure_p,
>   "GCC will attempt several strategies when asked to copy between\n\
> @@ -5714,18 +5789,6 @@ DEFHOOK
>   reg_class_t, (int, reg_class_t, reg_class_t),
>   default_ira_change_pseudo_allocno_class)
>
> -/* Scale of callee-saved register cost in epilogue and prologue used by
> -   IRA.  */
> -DEFHOOK
> -(ira_callee_saved_register_cost_scale,
> - "A target hook which returns the callee-saved register @var{hard_regno}\n\
> -cost scale in epilogue and prologue used by IRA.\n\
> -\n\
> -The default version of this target hook returns 1 if optimizing for\n\
> -size, otherwise returns the entry block frequency.",
> - int, (int hard_regno),
> - default_ira_callee_saved_register_cost_scale)
> -
>  /* Return true if we use LRA instead of reload.  */
>  DEFHOOK
>  (lra_p,
> diff --git a/gcc/target.h b/gcc/target.h
> index 3e1ee68a341..2bf35e2d0ee 100644
> --- a/gcc/target.h
> +++ b/gcc/target.h
> @@ -284,6 +284,18 @@ enum poly_value_estimate_kind
>    POLY_VALUE_LIKELY
>  };
>
> +enum class spill_cost_type
> +{
> +  SAVE,
> +  RESTORE
> +};
> +
> +enum class frame_cost_type
> +{
> +  ALLOCATION,
> +  DEALLOCATION
> +};
> +
>  typedef void (*emit_support_tinfos_callback) (tree);
>
>  extern bool verify_type_context (location_t, type_context_kind, const_tree,
> diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
> index 344075efa41..c79458e374e 100644
> --- a/gcc/targhooks.cc
> +++ b/gcc/targhooks.cc
> @@ -2083,6 +2083,33 @@ default_register_move_cost (machine_mode mode ATTRIBUTE_UNUSED,
>  #endif
>  }
>
> +/* The default implementation of TARGET_CALLEE_SAVE_COST.  */
> +
> +int
> +default_callee_save_cost (spill_cost_type spill_type, unsigned int,
> +                         machine_mode, unsigned int, int mem_cost,
> +                         const HARD_REG_SET &callee_saved_regs,
> +                         bool existing_spills_p)
> +{
> +  if (!existing_spills_p)
> +    {
> +      auto frame_type = (spill_type == spill_cost_type::SAVE
> +                        ? frame_cost_type::ALLOCATION
> +                        : frame_cost_type::DEALLOCATION);
> +      mem_cost += targetm.frame_allocation_cost (frame_type,
> +                                                callee_saved_regs);
> +    }
> +  return mem_cost;
> +}
> +
> +/* The default implementation of TARGET_FRAME_ALLOCATION_COST.  */
> +
> +int
> +default_frame_allocation_cost (frame_cost_type, const HARD_REG_SET &)
> +{
> +  return 0;
> +}
> +
>  /* The default implementation of TARGET_SLOW_UNALIGNED_ACCESS.  */
>
>  bool
> diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> index 8871e01430c..f16b58798c2 100644
> --- a/gcc/targhooks.h
> +++ b/gcc/targhooks.h
> @@ -235,6 +235,11 @@ extern tree default_builtin_tm_load_store (tree);
>  extern int default_memory_move_cost (machine_mode, reg_class_t, bool);
>  extern int default_register_move_cost (machine_mode, reg_class_t,
>                                        reg_class_t);
> +extern int default_callee_save_cost (spill_cost_type, unsigned int,
> +                                    machine_mode, unsigned int, int,
> +                                    const HARD_REG_SET &, bool);
> +extern int default_frame_allocation_cost (frame_cost_type,
> +                                         const HARD_REG_SET &);
>  extern bool default_slow_unaligned_access (machine_mode, unsigned int);
>  extern HOST_WIDE_INT default_estimated_poly_value (poly_int64,
>                                                    poly_value_estimate_kind);
> diff --git a/gcc/testsuite/gcc.target/aarch64/callee_save_1.c b/gcc/testsuite/gcc.target/aarch64/callee_save_1.c
> new file mode 100644
> index 00000000000..f28486112f4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/callee_save_1.c
> @@ -0,0 +1,12 @@
> +/* { dg-options "-O2" } */
> +
> +int test (int x), test2 (int x);
> +
> +int foo (int x, int y) {
> +    test (x);
> +    int lhs = test2 (y);
> +    return x + lhs;
> +}
> +
> +/* { dg-final { scan-assembler {\tstp\tx19, x20, \[sp,} } } */
> +/* { dg-final { scan-assembler {\tldp\tx19, x20, \[sp,} } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/callee_save_2.c b/gcc/testsuite/gcc.target/aarch64/callee_save_2.c
> new file mode 100644
> index 00000000000..744b464be2f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/callee_save_2.c
> @@ -0,0 +1,14 @@
> +/* { dg-options "-O2 -fomit-frame-pointer" } */
> +
> +int test (int x), test2 (int x);
> +
> +int foo (int x, int y) {
> +    test (x);
> +    int lhs = test2 (y);
> +    return x + lhs;
> +}
> +
> +/* { dg-final { scan-assembler {\tstp\tx30, x19, \[sp,} } } */
> +/* { dg-final { scan-assembler {\tldp\tx30, x19, \[sp\],} } } */
> +/* { dg-final { scan-assembler {\tstr\tw1, \[sp,} } } */
> +/* { dg-final { scan-assembler {\tldr\tw0, \[sp,} } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/callee_save_3.c b/gcc/testsuite/gcc.target/aarch64/callee_save_3.c
> new file mode 100644
> index 00000000000..50b6853e4ee
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/callee_save_3.c
> @@ -0,0 +1,12 @@
> +/* { dg-options "-O2" } */
> +
> +float test ();
> +float g;
> +
> +float foo (float x, float y) {
> +  g = x + test ();
> +  return (x + test ()) * y;
> +}
> +
> +/* { dg-final { scan-assembler {\tstp\td14, d15, \[sp,} } } */
> +/* { dg-final { scan-assembler {\tldp\td14, d15, \[sp,} } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr103350-1.c b/gcc/testsuite/gcc.target/aarch64/pr103350-1.c
> index a0e764e8653..129c6ac90e0 100644
> --- a/gcc/testsuite/gcc.target/aarch64/pr103350-1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/pr103350-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do run { target le } } */
> -/* { dg-additional-options "-Os -fno-tree-ter -save-temps -fdump-rtl-ree-all -free -std=c99 -w" } */
> +/* { dg-additional-options "-Os -fno-tree-ter -save-temps -fdump-rtl-ree-all -free -std=c99 -w -fno-caller-saves" } */
>
>  typedef unsigned char u8;
>  typedef unsigned char __attribute__((__vector_size__ (8))) v64u8;
> --
> 2.25.1
>
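
For reference, here is a minimal standalone sketch of how the default
TARGET_CALLEE_SAVE_COST in the quoted patch composes with
TARGET_FRAME_ALLOCATION_COST.  It is not GCC code: hard_reg_set is a
std::bitset stand-in for HARD_REG_SET, and example_frame_allocation_cost
(with its cost of 2) is a hypothetical target hook made up for
illustration.  The unused hard_regno, mode and nregs parameters are dropped.

```cpp
#include <bitset>
#include <cassert>

enum class spill_cost_type { SAVE, RESTORE };
enum class frame_cost_type { ALLOCATION, DEALLOCATION };

// Stand-in for GCC's HARD_REG_SET (assumption: at most 64 hard registers).
using hard_reg_set = std::bitset<64>;

// Hypothetical target hook: charge 2 for setting up or tearing down a
// frame, unless an already-allocated callee-saved register forces a
// frame anyway.  The patch's real default simply returns 0.
int
example_frame_allocation_cost (frame_cost_type, const hard_reg_set &allocated)
{
  return allocated.any () ? 0 : 2;
}

// Same control flow as default_callee_save_cost in the patch: the frame
// cost is only added while the allocator has not yet decided to spill.
int
example_callee_save_cost (spill_cost_type type, int mem_cost,
                          const hard_reg_set &allocated,
                          bool existing_spills_p)
{
  if (!existing_spills_p)
    {
      auto frame_type = (type == spill_cost_type::SAVE
                         ? frame_cost_type::ALLOCATION
                         : frame_cost_type::DEALLOCATION);
      mem_cost += example_frame_allocation_cost (frame_type, allocated);
    }
  return mem_cost;
}
```

With this hypothetical hook, the first callee save in a function with no
spills pays the memory cost plus the frame cost; later saves, or saves in a
function that already spills, pay only the memory cost.  As the hook
documentation says, any frequency scaling is left to the caller.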
