On Fri, Aug 25, 2017 at 10:43 AM, Charles Baylis <charles.bay...@linaro.org> wrote: > On 9 June 2017 at 15:13, Richard Earnshaw (lists) > <richard.earns...@arm.com> wrote: >> On 21/02/17 16:54, charles.bay...@linaro.org wrote: >>> From: Charles Baylis <charles.bay...@linaro.org> >>> >>> This patch adds support for modelling the varying costs of >>> different addressing modes. The generic cost table treats >>> all addressing modes as having equal cost. The cost table >>> for Cortex-A57 is derived from >>> http://infocenter.arm.com/help/topic/com.arm.doc.uan0015b/Cortex_A57_Software_Optimization_Guide_external.pdf >>> and treats addressing modes with write-back as having a >>> cost equal to one additional instruction. >>> >>> gcc/ChangeLog: >>> >>> <date> Charles Baylis <charles.bay...@linaro.org> >>> >>> * config/arm/aarch-common-protos.h (enum arm_addr_mode_op): New. >>> (struct addr_mode_cost_table): New. >>> (struct cpu_cost_table): Add pointer to an addr_mode_cost_table. >>> * config/arm/aarch-cost-tables.h: (generic_addr_mode_costs): New. >>> (generic_extra_costs) Initialise aarch32_addr_mode. >>> (cortexa53_extra_costs) Likewise. >>> (addr_mode_costs_cortexa57) New. >>> (cortexa57_extra_costs) Initialise aarch32_addr_mode. >>> (exynosm1_extra_costs) Likewise. >>> (xgene1_extra_costs) Likewise. >>> (qdf24xx_extra_costs) Likewise. >>> * config/arm/arm.c (cortexa9_extra_costs) Initialise >>> aarch32_addr_mode. >>> (cortexa9_extra_costs) Likewise. >>> (cortexa8_extra_costs) Likewise. >>> (cortexa5_extra_costs) Likewise. >>> (cortexa7_extra_costs) Likewise. >>> (cortexa12_extra_costs) Likewise. >>> (cortexv7m_extra_costs) Likewise. >>> (arm_mem_costs): Use table lookup to calculate cost of addressing >>> mode. >>> >>> Change-Id: If71bd7c4f4bb876c5ed82dc28791130efb8bf89e >>> --- >>> gcc/config/arm/aarch-common-protos.h | 16 +++++++++++ >>> gcc/config/arm/aarch-cost-tables.h | 54 >>> ++++++++++++++++++++++++++++++---- >>> gcc/config/arm/arm.c | 56 >>> ++++++++++++++++++++++++++++++------ >>> 3 files changed, 113 insertions(+), 13 deletions(-) >>> >>> diff --git a/gcc/config/arm/aarch-common-protos.h >>> b/gcc/config/arm/aarch-common-protos.h >>> index 7c2bb4c..f6fcc94 100644 >>> --- a/gcc/config/arm/aarch-common-protos.h >>> +++ b/gcc/config/arm/aarch-common-protos.h >>> @@ -130,6 +130,21 @@ struct vector_cost_table >>> const int alu; >>> }; >>> >>> +enum arm_addr_mode_op >>> +{ >>> + AMO_DEFAULT, >>> + AMO_NO_WB, >>> + AMO_WB, >>> + AMO_MAX /* for array size */ >> >> Comment style: Capital letter at start, full stop and two spaces at the end. > > Done. > >> The enum and structure below should have some comments explaining what >> they are for. > > Done. > >>> +const struct addr_mode_cost_table generic_addr_mode_costs = >>> +{ >>> + /* int */ >>> + { 0, 0, 0 }, >> >> Elements need to be commented, otherwise minor changes to the contents >> of the arrays is hard to update and maintain. > > Done. > > >>> + /* Addressing mode */ >> >> Comment style. > > Done. > >>> - /* TODO: Add table-driven costs for addressing modes. */ >>> + arm_addr_mode_op op_type; >>> + switch (GET_CODE(XEXP (x, 0))) >>> + { >>> + case REG: >>> + default: >>> + op_type = AMO_DEFAULT; >> >> Why would REG and default share the same cost? Presumably default is >> too complex to recognize, but then it's unlikely to be cheap. > > Default covers literals in various forms of RTL, for which the cost is > the same as regular, and PIC, which is handled in the original code > above this section. > >>> + break; >>> + case PLUS: >>> + case MINUS: >>> + op_type = AMO_NO_WB; >> >> GCC doesn't support MINUS in addresses, even though the architecture >> could in theory handle this. > > I've noted that in a comment, but kept the "case MINUS:" in place. > >> Also, I think you need a different cost for scaled offset addressing, >> and possibly even for different scaling factors and types of scaling. > > ... see below: > >>> + break; >>> + case PRE_INC: >>> + case PRE_DEC: >>> + case POST_INC: >>> + case POST_DEC: >>> + case PRE_MODIFY: >>> + case POST_MODIFY: >>> + op_type = AMO_WB; >> >> Pre and post might also need separate entries (plus further entries for >> scaling). A post operation might happen in parallel with the data >> fetch, while a pre operation must happen before the address can be sent >> to the load/store pipe. > > The {DEFAULT, NO_WB, WB} range is also Ramana's requested design. I > think this is OK because it is sufficient to describe the currently > publicly documented microarchitectures, and the structure is readily > extensible if future cores require more detailed modelling. > > Expanding the range of AMO_* entries in the array makes the tables > more unwieldy for no immediate benefit. > >>> + break; >>> + } >> >> blank line between the switch block and the subsequent statements. >> Also, the following needs to laid out in GNU style. > > Done. > > This version of the patch also moves the Cortex-A57 tuning (which was > present in the earlier version) into a separate patch. Discussion of > that in a follow-up post.
This will break aarch64 and the cores which are aarch64 only. Also your changelog is incorrect here: (qdf24xx_extra_costs) Likewise. Because qdf24xx_extra_costs has been moved to aarch64/aarch64-cost-tables.h with the other 2 aarch64 only processors (thunderx and thunderxt99). Are you planing on moving cpu_addrcost_table in aarch64 to the same model as you are using arm? I would say you are losing information if you do that as cpu_addrcost_table in aarch64 describes more than what is described in your new table on the arm side. Thanks, Andrew > > Updated patch attached