Wilco Dijkstra <wilco.dijks...@arm.com> writes:
> Improve rematerialization costs of addresses.  The current costs are set
> too high, which results in extra register pressure and spilling.  Using
> lower costs means addresses will be rematerialized more often rather than
> being spilled or causing spills.  This results in significant codesize
> reductions and performance gains.  SPECINT2017 improves by 0.27% with LTO
> and 0.16% without LTO.  Codesize is 0.12% smaller.

I'm not questioning the results, but I think we need to look in more
detail at why rematerialisation requires such low costs.  The point of
comparison should be against a spill and reload, so any constant that
is as cheap as a load should be rematerialised.  If that isn't
happening, then it sounds like changes are needed elsewhere.
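To make that comparison concrete, here is a minimal standalone sketch of
the trade-off being described; it is not the actual IRA/LRA logic, and
prefer_remat and its cost inputs are made-up stand-ins for the rtx and
memory-move costs:

    #include <cstdio>

    /* A minimal model of the remat-vs-spill decision: recomputing the
       value at every use must not cost more than one spill store plus
       a reload at every use.  */
    static bool
    prefer_remat (int remat_cost, int spill_store_cost,
                  int reload_cost, int n_uses)
    {
      return remat_cost * n_uses <= spill_store_cost + reload_cost * n_uses;
    }

    int
    main ()
    {
      /* E.g. an ADRP+ADD address (roughly two ALU ops) against a 4-unit
         spill store and reload: rematerialisation wins even for a single
         use.  */
      printf ("%d\n", prefer_remat (2, 4, 4, 1));   /* prints 1 */
    }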

Thanks,
Richard

> Passes bootstrap and regress. OK for commit?
>
> ChangeLog:
> 2021-06-01  Wilco Dijkstra  <wdijk...@arm.com>
>
>         * config/aarch64/aarch64.cc (aarch64_rtx_costs): Use better 
> rematerialization
>         costs for HIGH, LO_SUM and SYMREF.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 43d87d1b9c4ef1a85094e51f81745f98f1ef27fb..7341849121ffd6b3b0b77c9730e74e751742e852 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -14529,45 +14529,28 @@ cost_plus:
>           return false;  /* All arguments need to be in registers.  */
>         }
>
> +    /* The following costs are used for rematerialization of addresses.
> +       Set a low cost for all global accesses - this ensures they are
> +       preferred for rematerialization, blocks them from being spilled
> +       and reduces register pressure.  This gives significant codesize
> +       reductions and performance gains.  */
> +
>      case SYMBOL_REF:
> +      *cost = 0;
>
> -      if (aarch64_cmodel == AARCH64_CMODEL_LARGE
> -         || aarch64_cmodel == AARCH64_CMODEL_SMALL_SPIC)
> -       {
> -         /* LDR.  */
> -         if (speed)
> -           *cost += extra_cost->ldst.load;
> -       }
> -      else if (aarch64_cmodel == AARCH64_CMODEL_SMALL
> -              || aarch64_cmodel == AARCH64_CMODEL_SMALL_PIC)
> -       {
> -         /* ADRP, followed by ADD.  */
> -         *cost += COSTS_N_INSNS (1);
> -         if (speed)
> -           *cost += 2 * extra_cost->alu.arith;
> -       }
> -      else if (aarch64_cmodel == AARCH64_CMODEL_TINY
> -              || aarch64_cmodel == AARCH64_CMODEL_TINY_PIC)
> -       {
> -         /* ADR.  */
> -         if (speed)
> -           *cost += extra_cost->alu.arith;
> -       }
> +      /* Use a separate rematerialization cost for GOT accesses.  */
> +      if (aarch64_cmodel == AARCH64_CMODEL_SMALL_PIC
> +         && aarch64_classify_symbol (x, 0) == SYMBOL_SMALL_GOT_4G)
> +       *cost = COSTS_N_INSNS (1) / 2;
>
> -      if (flag_pic)
> -       {
> -         /* One extra load instruction, after accessing the GOT.  */
> -         *cost += COSTS_N_INSNS (1);
> -         if (speed)
> -           *cost += extra_cost->ldst.load;
> -       }
>        return true;
>
>      case HIGH:
> +      *cost = 0;
> +      return true;
> +
>      case LO_SUM:
> -      /* ADRP/ADD (immediate).  */
> -      if (speed)
> -       *cost += extra_cost->alu.arith;
> +      *cost = COSTS_N_INSNS (3) / 4;
>        return true;
>
>      case ZERO_EXTRACT:
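
For reference on the magnitudes involved: with the usual rtl.h definition
COSTS_N_INSNS (N) == (N) * 4, the values chosen above work out to 2 for the
GOT case and 3 for LO_SUM, i.e. below the cost of a single instruction.  A
standalone check (the macro is copied here only so the snippet compiles on
its own):

    #include <cstdio>

    /* Copied from gcc/rtl.h so this compiles standalone.  */
    #define COSTS_N_INSNS(N) ((N) * 4)

    int
    main ()
    {
      /* The rematerialization costs the patch assigns, in rtx-cost units.  */
      printf ("GOT access: %d\n", COSTS_N_INSNS (1) / 2);   /* 2 */
      printf ("LO_SUM:     %d\n", COSTS_N_INSNS (3) / 4);   /* 3 */
    }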
