Ping.

Original thread: https://gcc.gnu.org/ml/gcc-patches/2017-02/msg01314.html

(I will fix the typos which Bernhard found before submitting)

On 21 February 2017 at 16:54,  <charles.bay...@linaro.org> wrote:
> From: Charles Baylis <charles.bay...@linaro.org>
>
> Hi Ramana,
>
> This patch set continues previous work on fixing the cost calculations for
> MEMs which use different addressing modes. It implements the approach we
> discussed at Linaro Connect BKK16.
>
> I have included some notes on the patch set as follows:
>
>
> Background:
>
> The motivating problem is that this function:
>   char *f(char *p, int8x8x4_t v, int r) { vst4_s8(p, v); p+=32; return p; }
> compiles to:
>         mov     r3, r0
>         adds    r0, r0, #32
>         vst4.8  {d0-d3}, [r3]
>         bx      lr
> but we would like to get:
>         vst4.8  {d0-d3}, [r0]!
>         bx      lr
>
> Although the ARM back end contains patterns for the write-back forms of these
> instructions, they are not currently generated. The reason for this is that
> the auto-inc-dec phase does not perform this optimisation because
> arm_rtx_costs incorrectly calculates the cost of "vst4.8  {d0-d3}, [r0]!" as
> much higher than "vst4.8  {d0-d3}, [r3]". For that reason, it considers the
> POST_INC form to be worse than the initial sequence of vst4/add and does not
> perform the transformation.
>
> In fact, GCC 6 has regressions compared to GCC 5 in this area, and no longer
> uses post-indexed addressing for int64_t or 64-bit vector types.
>
>
> Solution:
>
> Change cost calculation for MEMs so that the cost of the memory access
> is computed separately from the cost of the addressing mode. A new
> table-driven mechanism is introduced for the costs of the addressing modes.
>
> The first patch in the series implements the calculation of the cost of
> the memory access.
>
> The second patch adds the table-driven model of the extra cost of the
> selected addressing mode. I don't have access to a lot of CPU pipeline
> information, so most CPUs use the generic cost table, with the exception of
> Cortex-A57.
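
To make the split concrete for anyone new to the thread, here is a minimal
sketch in C. It is not the code from the patches: the enum, the struct, the
table values and the function names are all hypothetical, and the numbers are
made up for illustration. It only shows the shape of the idea, namely that the
total cost of a MEM is the cost of the access plus a table-driven extra cost
for the addressing mode.

```c
#include <stddef.h>

/* Hypothetical addressing-mode kinds; these are NOT GCC's real names.  */
enum addr_kind { ADDR_REG, ADDR_REG_OFFSET, ADDR_POST_INC, ADDR_KIND_COUNT };

/* Hypothetical per-CPU table: extra cost charged for each addressing mode.  */
struct addr_cost_table
{
  int extra[ADDR_KIND_COUNT];
};

/* Made-up values: a "generic" CPU where no addressing mode costs extra.  */
const struct addr_cost_table generic_addr_costs = { { 0, 0, 0 } };

/* Made-up values: a CPU where post-increment costs two extra units.  */
const struct addr_cost_table slow_writeback_addr_costs = { { 0, 0, 2 } };

/* Cost of the access alone, independent of the addressing mode.
   Assumption for this sketch: one unit per 8 bytes, rounded up.  */
int
mem_access_cost (size_t bytes)
{
  return (int) ((bytes + 7) / 8);
}

/* Total cost of a MEM: access cost plus the table-driven addressing cost.  */
int
mem_total_cost (const struct addr_cost_table *table, size_t bytes,
                enum addr_kind kind)
{
  return mem_access_cost (bytes) + table->extra[kind];
}

/* Nonzero if a post-increment access costs no more than a plain register
   access followed by a separate add (assumed here to cost one unit).  */
int
post_inc_is_profitable (const struct addr_cost_table *table, size_t bytes)
{
  const int add_cost = 1;
  return mem_total_cost (table, bytes, ADDR_POST_INC)
         <= mem_total_cost (table, bytes, ADDR_REG) + add_cost;
}
```

With these made-up numbers, a 32-byte store costs 4 units under the generic
table whether it uses [r3] or [r0]!, so folding the add into the store wins by
the cost of the add. Under the second table the write-back form costs 6 units
against 5 for the store plus add, so the transformation would be rejected.
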
>
>
> Testing:
>
> I did "make check" on arm-linux-gnueabihf with qemu. This patch fixes one test
> failure in lp1243022.c.
>
>
> Benchmarking:
>
> On Cortex-A15, SPEC2006 and a popular suite of embedded benchmarks perform the
> same as before this patch is applied.  This is expected; the gain should be
> in code quality for hand-written NEON intrinsics code.
>
>
>
> Charles Baylis (2):
>   [ARM] Refactor costs calculation for MEM.
>   [ARM] Add table of costs for AArch32 addressing modes.
>
>  gcc/config/arm/aarch-common-protos.h |  16 +++++
>  gcc/config/arm/aarch-cost-tables.h   |  54 ++++++++++++++--
>  gcc/config/arm/arm.c                 | 120 ++++++++++++++++++++++++++---------
>  3 files changed, 154 insertions(+), 36 deletions(-)
>
> --
> 2.7.4
>
