Ping. Original thread: https://gcc.gnu.org/ml/gcc-patches/2017-02/msg01314.html
(I will fix the typos which Bernhard found before submitting)

On 21 February 2017 at 16:54, <charles.bay...@linaro.org> wrote:
> From: Charles Baylis <charles.bay...@linaro.org>
>
> Hi Ramana,
>
> This patch set continues previous work on fixing the cost calculations for
> MEMs which use different addressing modes. It implements the approach we
> discussed at Linaro Connect BKK16.
>
> I have included some notes on the patch set as follows:
>
>
> Background:
>
> The motivating problem is that this function:
>   char *f(char *p, int8x8x4_t v, int r) { vst4_s8(p, v); p+=32; return p; }
> compiles to:
>         mov     r3, r0
>         adds    r0, r0, #32
>         vst4.8  {d0-d3}, [r3]
>         bx      lr
> but we would like to get:
>         vst4.8  {d0-d3}, [r0]!
>         bx      lr
>
> Although the ARM back end contains patterns for the write-back forms of these
> instructions, they are not currently generated. The reason for this is that
> the auto-inc-dec phase does not perform this optimisation because
> arm_rtx_costs incorrectly calculates the cost of "vst4.8 {d0-d3}, [r0]!" as
> much higher than "vst4.8 {d0-d3}, [r3]". For that reason, it considers the
> POST_INC form to be worse than the initial sequence of vst4/add and does not
> perform the transformation.
>
> In fact, GCC6 has regressions compared to GCC5 in this area, and no longer
> does post-indexed addressing for int64_t or 64 bit vector types.
>
>
> Solution:
>
> Change cost calculation for MEMs so that the cost of the memory access
> is computed separately from the cost of the addressing mode. A new
> table-driven mechanism is introduced for the costs of the addressing modes.
>
> The first patch in the series implements the calculation of the cost of
> the memory access.
>
> The second patch adds the table-driven model of the extra cost of the
> selected addressing mode. I don't have access to a lot of CPU pipeline
> information, so most CPUs use the generic cost table, with the exception of
> Cortex-A57.
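
To make the split described above concrete, here is a minimal standalone
sketch. It is not the code from the patches: the names addr_mode,
addr_mode_costs, mem_access_cost, mem_cost and prefer_post_inc are
hypothetical, and the cost values are invented for illustration only. It
shows the access cost depending only on the size of the access, a per-CPU
table supplying the extra cost of the addressing mode, and the kind of
comparison that lets the POST_INC form win over a separate store and add.

```c
#include <stddef.h>

/* Hypothetical addressing modes; not the enumeration used in the patch. */
enum addr_mode { AMODE_REG, AMODE_REG_OFFSET, AMODE_POST_INC, AMODE_COUNT };

/* Hypothetical per-CPU table: extra cost of each addressing mode. */
struct addr_mode_costs {
    int extra[AMODE_COUNT];
};

/* Invented values: a "generic" table with no penalties, and one CPU
   that charges one extra unit for register+offset addressing. */
const struct addr_mode_costs generic_costs     = { { 0, 0, 0 } };
const struct addr_mode_costs example_cpu_costs = { { 0, 1, 0 } };

/* Cost of the access itself, independent of the addressing mode.
   Assumption: one unit per 8 bytes transferred, rounded up. */
int mem_access_cost(size_t bytes)
{
    return (int)((bytes + 7) / 8);
}

/* Total MEM cost = access cost + table-driven addressing-mode cost. */
int mem_cost(const struct addr_mode_costs *table, enum addr_mode mode,
             size_t bytes)
{
    return mem_access_cost(bytes) + table->extra[mode];
}

/* Returns 1 if a post-increment store is no more expensive than a
   plain store followed by a separate add costing add_cost. */
int prefer_post_inc(const struct addr_mode_costs *table, size_t bytes,
                    int add_cost)
{
    int separate = mem_cost(table, AMODE_REG, bytes) + add_cost;
    int combined = mem_cost(table, AMODE_POST_INC, bytes);
    return combined <= separate;
}
```

With these made-up numbers, a 32-byte store costs 4 units in either form
under the generic table, so the write-back form is preferred as soon as the
separate add has a non-zero cost.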
>
>
> Testing:
>
> I did "make check" on arm-linux-gnueabihf with qemu. This patch fixes one test
> failure in lp1243022.c.
>
>
> Benchmarking:
>
> On Cortex-A15, SPEC2006 and a popular suite of embedded benchmarks perform the
> same as before this patch is applied. This is expected, since the gain is
> in code quality for hand-written NEON intrinsics code.
>
>
>
> Charles Baylis (2):
>   [ARM] Refactor costs calculation for MEM.
>   [ARM] Add table of costs for AArch32 addressing modes.
>
>  gcc/config/arm/aarch-common-protos.h |  16 +++++
>  gcc/config/arm/aarch-cost-tables.h   |  54 ++++++++++++++--
>  gcc/config/arm/arm.c                 | 120 ++++++++++++++++++++++++++---------
>  3 files changed, 154 insertions(+), 36 deletions(-)
>
> --
> 2.7.4
>