Re: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64

Richard Biener Fri, 09 Nov 2018 04:19:19 -0800

On Fri, Nov 9, 2018 at 11:47 AM Kyrill Tkachov
<kyrylo.tkac...@foss.arm.com> wrote:
>
> Hi all,
>
> In this testcase the codegen for VLA SVE is worse than it could be due to 
> unrolling:
>
> fully_peel_me:
>          mov     x1, 5
>          ptrue   p1.d, all
>          whilelo p0.d, xzr, x1
>          ld1d    z0.d, p0/z, [x0]
>          fadd    z0.d, z0.d, z0.d
>          st1d    z0.d, p0, [x0]
>          cntd    x2
>          addvl   x3, x0, #1
>          whilelo p0.d, x2, x1
>          beq     .L1
>          ld1d    z0.d, p0/z, [x0, #1, mul vl]
>          fadd    z0.d, z0.d, z0.d
>          st1d    z0.d, p0, [x3]
>          cntw    x2
>          incb    x0, all, mul #2
>          whilelo p0.d, x2, x1
>          beq     .L1
>          ld1d    z0.d, p0/z, [x0]
>          fadd    z0.d, z0.d, z0.d
>          st1d    z0.d, p0, [x0]
> .L1:
>          ret
>
> In this case, due to the vector-length-agnostic nature of SVE the compiler 
> doesn't know the loop iteration count.
> For such loops we don't want to unroll if we don't end up eliminating 
> branches as this just bloats code size
> and hurts icache performance.
>
> This patch introduces a new unroll-known-loop-iterations-only param that 
> disables cunroll when the loop iteration
> count is unknown (SCEV_NOT_KNOWN). This case occurs much more often for SVE 
> VLA code, but it does help some
> Advanced SIMD cases as well where loops with an unknown iteration count are 
> not unrolled when it doesn't eliminate
> the branches.
>
> So for the above testcase we generate now:
> fully_peel_me:
>          mov     x2, 5
>          mov     x3, x2
>          mov     x1, 0
>          whilelo p0.d, xzr, x2
>          ptrue   p1.d, all
> .L2:
>          ld1d    z0.d, p0/z, [x0, x1, lsl 3]
>          fadd    z0.d, z0.d, z0.d
>          st1d    z0.d, p0, [x0, x1, lsl 3]
>          incd    x1
>          whilelo p0.d, x1, x3
>          bne     .L2
>          ret
>
> Not perfect still, but it's preferable to the original code.
> The new param is enabled by default on aarch64 but disabled for other 
> targets, leaving their behaviour unchanged
> (until other target people experiment with it and set it, if appropriate).
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Benchmarked on SPEC2017 on a Cortex-A57 and there are no differences in 
> performance.
>
> Ok for trunk?


Hum.  Why introduce a new --param and not simply key on
flag_peel_loops instead?  That is
enabled by default at -O3 and with FDO but you of course can control
that in your targets
post-option-processing hook.

It might also make sense to have more fine-grained control for this
and allow a target
to say whether it wants to peel a specific loop or not when the
middle-end thinks that
would be profitable.

Richard.

> Thanks,
> Kyrill
>
>
> 2018-11-09  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>
>         * params.def (PARAM_UNROLL_KNOWN_LOOP_ITERATIONS_ONLY): Define.
>         * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Use above to
>         disable unrolling on unknown iteration count.
>         * config/aarch64/aarch64.c (aarch64_override_options_internal): Set
>         PARAM_UNROLL_KNOWN_LOOP_ITERATIONS_ONLY to 1.
>         * doc/invoke.texi (--param unroll-known-loop-iterations-only):
>         Document.
>
> 2018-11-09  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>
>         * gcc.target/aarch64/sve/unroll-1.c: New test.
>

Re: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64

Reply via email to