On Fri, Nov 9, 2018 at 11:47 AM Kyrill Tkachov
<kyrylo.tkac...@foss.arm.com> wrote:
>
> Hi all,
>
> In this testcase the codegen for VLA SVE is worse than it could be due to
> unrolling:
>
> fully_peel_me:
>         mov     x1, 5
>         ptrue   p1.d, all
>         whilelo p0.d, xzr, x1
>         ld1d    z0.d, p0/z, [x0]
>         fadd    z0.d, z0.d, z0.d
>         st1d    z0.d, p0, [x0]
>         cntd    x2
>         addvl   x3, x0, #1
>         whilelo p0.d, x2, x1
>         beq     .L1
>         ld1d    z0.d, p0/z, [x0, #1, mul vl]
>         fadd    z0.d, z0.d, z0.d
>         st1d    z0.d, p0, [x3]
>         cntw    x2
>         incb    x0, all, mul #2
>         whilelo p0.d, x2, x1
>         beq     .L1
>         ld1d    z0.d, p0/z, [x0]
>         fadd    z0.d, z0.d, z0.d
>         st1d    z0.d, p0, [x0]
> .L1:
>         ret
>
> In this case, due to the vector-length-agnostic nature of SVE the compiler
> doesn't know the loop iteration count.
> For such loops we don't want to unroll if we don't end up eliminating
> branches as this just bloats code size and hurts icache performance.
>
> This patch introduces a new unroll-known-loop-iterations-only param that
> disables cunroll when the loop iteration count is unknown (SCEV_NOT_KNOWN).
> This case occurs much more often for SVE VLA code, but it does help some
> Advanced SIMD cases as well where loops with an unknown iteration count are
> not unrolled when it doesn't eliminate the branches.
>
> So for the above testcase we generate now:
> fully_peel_me:
>         mov     x2, 5
>         mov     x3, x2
>         mov     x1, 0
>         whilelo p0.d, xzr, x2
>         ptrue   p1.d, all
> .L2:
>         ld1d    z0.d, p0/z, [x0, x1, lsl 3]
>         fadd    z0.d, z0.d, z0.d
>         st1d    z0.d, p0, [x0, x1, lsl 3]
>         incd    x1
>         whilelo p0.d, x1, x3
>         bne     .L2
>         ret
>
> Not perfect still, but it's preferable to the original code.
> The new param is enabled by default on aarch64 but disabled for other
> targets, leaving their behaviour unchanged (until other target people
> experiment with it and set it, if appropriate).
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Benchmarked on SPEC2017 on a Cortex-A57 and there are no differences in
> performance.
>
> Ok for trunk?
Hum.  Why introduce a new --param and not simply key on flag_peel_loops
instead?  That is enabled by default at -O3 and with FDO, but you can of
course control that in your target's post-option-processing hook.

It might also make sense to have more fine-grained control for this and
allow a target to say whether it wants to peel a specific loop or not when
the middle-end thinks that would be profitable.

Richard.

> Thanks,
> Kyrill
>
>
> 2018-11-09  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>
>         * params.def (PARAM_UNROLL_KNOWN_LOOP_ITERATIONS_ONLY): Define.
>         * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Use above to
>         disable unrolling on unknown iteration count.
>         * config/aarch64/aarch64.c (aarch64_override_options_internal): Set
>         PARAM_UNROLL_KNOWN_LOOP_ITERATIONS_ONLY to 1.
>         * doc/invoke.texi (--param unroll-known-loop-iterations-only):
>         Document.
>
> 2018-11-09  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>
>         * gcc.target/aarch64/sve/unroll-1.c: New test.
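
For readers following along without the patch: the body of unroll-1.c is
not quoted in this thread, so the following is only a guess at the shape of
the loop, reconstructed from the assembly above (a trip count of 5 in
"mov x1, 5" and a self-add of double-precision lanes in
"fadd z0.d, z0.d, z0.d"). The scalar trip count is a compile-time constant,
but with vector-length-agnostic SVE the number of vector iterations depends
on the runtime vector length, which is why the vectorized loop's iteration
count is unknown to the unroller.

```c
/* Hypothetical reconstruction; the real gcc.target/aarch64/sve/unroll-1.c
   is not shown in this thread and may differ.  */
void
fully_peel_me (double *x)
{
  /* Five scalar iterations.  Once vectorized for VLA SVE, each vector
     iteration handles as many doubles as the hardware vector holds, so
     the vector loop's trip count is not known at compile time.  */
  for (int i = 0; i < 5; i++)
    x[i] = x[i] + x[i];
}
```

Building something like this with SVE enabled (for example -O3 with
-march=armv8.2-a+sve; the exact options used by the test are an assumption)
should be enough to compare the peeled and non-peeled forms discussed above.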