Wilco Dijkstra <wilco.dijks...@arm.com> writes:
> Improve rematerialization costs of addresses.  The current costs are set
> too high, which results in extra register pressure and spilling.  Using
> lower costs means addresses will be rematerialized more often rather than
> being spilled or causing spills.  This results in significant codesize
> reductions and performance gains.  SPECINT2017 improves by 0.27% with LTO
> and 0.16% without LTO.  Codesize is 0.12% smaller.
I'm not questioning the results, but I think we need to look in more
detail at why rematerialisation requires such low costs.  The point of
comparison should be against a spill and reload, so any constant that is
as cheap as a load should be rematerialised.  If that isn't happening
then it sounds like changes are needed elsewhere.

Thanks,
Richard

> Passes bootstrap and regress. OK for commit?
>
> ChangeLog:
> 2021-06-01  Wilco Dijkstra  <wdijk...@arm.com>
>
>         * config/aarch64/aarch64.cc (aarch64_rtx_costs): Use better
>         rematerialization costs for HIGH, LO_SUM and SYMREF.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 43d87d1b9c4ef1a85094e51f81745f98f1ef27fb..7341849121ffd6b3b0b77c9730e74e751742e852 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -14529,45 +14529,28 @@ cost_plus:
>        return false;  /* All arguments need to be in registers.  */
>      }
>  
> +  /* The following costs are used for rematerialization of addresses.
> +     Set a low cost for all global accesses - this ensures they are
> +     preferred for rematerialization, blocks them from being spilled
> +     and reduces register pressure.  The result is significant codesize
> +     reductions and performance gains.  */
> +
>      case SYMBOL_REF:
> +      *cost = 0;
>  
> -      if (aarch64_cmodel == AARCH64_CMODEL_LARGE
> -          || aarch64_cmodel == AARCH64_CMODEL_SMALL_SPIC)
> -        {
> -          /* LDR.  */
> -          if (speed)
> -            *cost += extra_cost->ldst.load;
> -        }
> -      else if (aarch64_cmodel == AARCH64_CMODEL_SMALL
> -               || aarch64_cmodel == AARCH64_CMODEL_SMALL_PIC)
> -        {
> -          /* ADRP, followed by ADD.  */
> -          *cost += COSTS_N_INSNS (1);
> -          if (speed)
> -            *cost += 2 * extra_cost->alu.arith;
> -        }
> -      else if (aarch64_cmodel == AARCH64_CMODEL_TINY
> -               || aarch64_cmodel == AARCH64_CMODEL_TINY_PIC)
> -        {
> -          /* ADR.  */
> -          if (speed)
> -            *cost += extra_cost->alu.arith;
> -        }
> +      /* Use a separate rematerialization cost for GOT accesses.  */
> +      if (aarch64_cmodel == AARCH64_CMODEL_SMALL_PIC
> +          && aarch64_classify_symbol (x, 0) == SYMBOL_SMALL_GOT_4G)
> +        *cost = COSTS_N_INSNS (1) / 2;
>  
> -      if (flag_pic)
> -        {
> -          /* One extra load instruction, after accessing the GOT.  */
> -          *cost += COSTS_N_INSNS (1);
> -          if (speed)
> -            *cost += extra_cost->ldst.load;
> -        }
>        return true;
>  
>      case HIGH:
> +      *cost = 0;
> +      return true;
> +
>      case LO_SUM:
> -      /* ADRP/ADD (immediate).  */
> -      if (speed)
> -        *cost += extra_cost->alu.arith;
> +      *cost = COSTS_N_INSNS (3) / 4;
>        return true;
>  
>      case ZERO_EXTRACT:
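
For reference, here is a standalone illustration (not part of the patch) of
what the new cost values evaluate to, assuming the usual COSTS_N_INSNS
definition from gcc/rtl.h, i.e. (N) * 4:

  /* Sketch of the new rematerialization cost values, assuming
     COSTS_N_INSNS (N) == (N) * 4 as defined in gcc/rtl.h.  */
  #include <stdio.h>

  #define COSTS_N_INSNS(N) ((N) * 4)

  int
  main (void)
  {
    printf ("SYMBOL_REF, HIGH:  %d\n", 0);                      /* forced to 0 */
    printf ("GOT (SMALL_PIC):   %d\n", COSTS_N_INSNS (1) / 2);  /* 2 */
    printf ("LO_SUM:            %d\n", COSTS_N_INSNS (3) / 4);  /* 3 */
    printf ("one instruction:   %d\n", COSTS_N_INSNS (1));      /* 4 */
    return 0;
  }

All of these end up below COSTS_N_INSNS (1), i.e. priced as cheaper than a
single load, not merely no more expensive than a spill-and-reload pair.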