Wilco Dijkstra <wilco.dijks...@arm.com> writes:
> ping

Can you fold in the rtx costs part of the original GOT relaxation patch?

I don't think there's enough information here for me to be able to
review the patch though. I'll need to find testcases, look in detail
at what the rtl passes are doing, and try to work out whether (and why)
this is a good way of fixing things.

I don't mind doing that, but I don't think I'll have time before stage 3.

Thanks,
Richard

>
>
> From: Wilco Dijkstra
> Sent: 02 June 2021 11:21
> To: GCC Patches <gcc-patches@gcc.gnu.org>
> Cc: Kyrylo Tkachov <kyrylo.tkac...@arm.com>; Richard Sandiford <richard.sandif...@arm.com>
> Subject: [PATCH] AArch64: Improve address rematerialization costs
>
> Hi,
>
> Given the large improvements from better register allocation of GOT accesses,
> I decided to generalize it to get large gains for normal addressing too:
>
> Improve rematerialization costs of addresses. The current costs are set too
> high which results in extra register pressure and spilling. Using lower costs
> means addresses will be rematerialized more often rather than being spilled or
> causing spills. This results in significant codesize reductions and performance
> gains. SPECINT2017 improves by 0.27% with LTO and 0.16% without LTO. Codesize
> is 0.12% smaller.
>
> Passes bootstrap and regress. OK for commit?
>
> ChangeLog:
> 2021-06-01 Wilco Dijkstra <wdijk...@arm.com>
>
>         * config/aarch64/aarch64.c (aarch64_rtx_costs): Use better
>         rematerialization costs for HIGH, LO_SUM and SYMBOL_REF.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 641c83b479e76cbcc75b299eb7ae5f634d9db7cd..08245827daa3f8199b29031e754244c078f0f500 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -13444,45 +13444,22 @@ cost_plus:
>           return false; /* All arguments need to be in registers. */
>         }
>
> -    case SYMBOL_REF:
> +    /* The following costs are used for rematerialization of addresses.
> +       Set a low cost for all global accesses - this ensures they are
> +       preferred for rematerialization, blocks them from being spilled
> +       and reduces register pressure. The result is significant codesize
> +       reductions and performance gains. */
>
> -      if (aarch64_cmodel == AARCH64_CMODEL_LARGE
> -          || aarch64_cmodel == AARCH64_CMODEL_SMALL_SPIC)
> -        {
> -          /* LDR. */
> -          if (speed)
> -            *cost += extra_cost->ldst.load;
> -        }
> -      else if (aarch64_cmodel == AARCH64_CMODEL_SMALL
> -               || aarch64_cmodel == AARCH64_CMODEL_SMALL_PIC)
> -        {
> -          /* ADRP, followed by ADD. */
> -          *cost += COSTS_N_INSNS (1);
> -          if (speed)
> -            *cost += 2 * extra_cost->alu.arith;
> -        }
> -      else if (aarch64_cmodel == AARCH64_CMODEL_TINY
> -               || aarch64_cmodel == AARCH64_CMODEL_TINY_PIC)
> -        {
> -          /* ADR. */
> -          if (speed)
> -            *cost += extra_cost->alu.arith;
> -        }
> -
> -      if (flag_pic)
> -        {
> -          /* One extra load instruction, after accessing the GOT. */
> -          *cost += COSTS_N_INSNS (1);
> -          if (speed)
> -            *cost += extra_cost->ldst.load;
> -        }
> +    case SYMBOL_REF:
> +      *cost = 0;
>        return true;
>
>      case HIGH:
> +      *cost = 0;
> +      return true;
> +
>      case LO_SUM:
> -      /* ADRP/ADD (immediate). */
> -      if (speed)
> -        *cost += extra_cost->alu.arith;
> +      *cost = COSTS_N_INSNS (3) / 4;
>        return true;
>
>      case ZERO_EXTRACT:
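
For anyone trying to follow the numbers in the quoted hunk, here is a
rough standalone sketch of the arithmetic, not code from the patch. It
assumes COSTS_N_INSNS (N) expands to (N) * 4 as in gcc/rtl.h, and that
*cost is COSTS_N_INSNS (1) on entry to the old SYMBOL_REF case. The enum,
the struct and all function names are hypothetical stand-ins; the three
enum values only model the three branches of the removed code, with
flag_pic passed separately.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to match gcc/rtl.h: costs are counted in quarter-insn units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical stand-ins for the three branches of the removed code.  */
enum code_model { CM_TINY, CM_SMALL, CM_LARGE };

/* Hypothetical stand-in for the tuning-specific extra_cost table.  */
struct extra_costs { int alu_arith; int ldst_load; };

/* Old SYMBOL_REF cost, following the removed lines of the hunk.
   BASE is the value of *cost on entry (an assumption, see above).  */
int old_symbol_ref_cost (int base, enum code_model cm, bool pic,
                         bool speed, const struct extra_costs *extra)
{
  int cost = base;
  switch (cm)
    {
    case CM_LARGE:              /* LDR.  */
      if (speed)
        cost += extra->ldst_load;
      break;
    case CM_SMALL:              /* ADRP, followed by ADD.  */
      cost += COSTS_N_INSNS (1);
      if (speed)
        cost += 2 * extra->alu_arith;
      break;
    case CM_TINY:               /* ADR.  */
      if (speed)
        cost += extra->alu_arith;
      break;
    default:
      assert (!"unknown code model");
    }
  if (pic)                      /* One extra load after accessing the GOT.  */
    {
      cost += COSTS_N_INSNS (1);
      if (speed)
        cost += extra->ldst_load;
    }
  return cost;
}

/* New costs, following the added lines of the hunk.  */
int new_symbol_ref_cost (void) { return 0; }
int new_high_cost (void) { return 0; }
int new_lo_sum_cost (void) { return COSTS_N_INSNS (3) / 4; }
```

Under those assumptions LO_SUM now costs 3 units, i.e. three quarters of
one instruction, while SYMBOL_REF and HIGH cost nothing. With the old
code, a small-model SYMBOL_REF started at two instructions (8 units)
before any extra_cost adjustments, and at three (12 units) with flag_pic
set.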