> On 06.07.2023 at 19:50, Richard Sandiford <richard.sandif...@arm.com> wrote:
>
> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>> On Wed, Jul 5, 2023 at 8:44 AM Hao Liu OS via Gcc-patches
>>> <gcc-patches@gcc.gnu.org> wrote:
>>>
>>> Hi,
>>>
>>> If a loop is unrolled by n times during vectorization, two steps are used to
>>> calculate the induction variable:
>>> - The small step for the unrolled ith-copy: vec_1 = vec_iv + (VF/n * Step)
>>> - The large step for the whole loop: vec_loop = vec_iv + (VF * Step)
>>>
>>> This patch calculates an extra vec_n to replace vec_loop:
>>> vec_n = vec_prev + (VF/n * S) = vec_iv + (VF/n * S) * n = vec_loop.
>>>
>>> So that we can save the large step register and related operations.
>>
>> OK. It would be nice to avoid the dead stmts created earlier though.
>
> FWIW, I still don't think we should do this. Part of the point of
> unrolling is to shorten loop-carried dependencies, whereas this patch
> is going in the opposite direction.

Note ncopies can be >1 without additional unrolling. With non-VLA vectors, all of the updates will be constant-folded, btw.

Richard

> Richard
>
>>
>> Thanks,
>> Richard.
>>
>>> gcc/ChangeLog:
>>>
>>>         PR tree-optimization/110449
>>>         * tree-vect-loop.cc (vectorizable_induction): Use vec_n to replace
>>>         vec_loop for the unrolled loop.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>         * gcc.target/aarch64/pr110449.c: New testcase.
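The identity in the commit message follows by induction on the copy index k. Here is a worked derivation, assuming exact arithmetic, with S the scalar step, n the number of copies and VF the vectorization factor (the symbols used in the patch):

```latex
\begin{aligned}
\mathit{vec}_0 &= \mathit{vec}_{iv} \\
\mathit{vec}_k &= \mathit{vec}_{k-1} + \tfrac{VF}{n} S
  = \mathit{vec}_{iv} + k \cdot \tfrac{VF}{n} S \qquad (1 \le k \le n) \\
\mathit{vec}_n &= \mathit{vec}_{iv} + n \cdot \tfrac{VF}{n} S
  = \mathit{vec}_{iv} + VF \cdot S = \mathit{vec}_{loop}
\end{aligned}
```

For floating-point inductions the n chained additions need not round the same way as the single addition of VF * S, so vec_n and vec_loop are only guaranteed to be equal when reassociation is acceptable. The testcase below is compiled with -Ofast.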
>>> ---
>>>  gcc/testsuite/gcc.target/aarch64/pr110449.c | 40 +++++++++++++++++++++
>>>  gcc/tree-vect-loop.cc                       | 21 +++++++++--
>>>  2 files changed, 58 insertions(+), 3 deletions(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110449.c
>>>
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/pr110449.c b/gcc/testsuite/gcc.target/aarch64/pr110449.c
>>> new file mode 100644
>>> index 00000000000..bb3b6dcfe08
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/pr110449.c
>>> @@ -0,0 +1,40 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-Ofast -mcpu=neoverse-n2 --param aarch64-vect-unroll-limit=2" } */
>>> +/* { dg-final { scan-assembler-not "8.0e\\+0" } } */
>>> +
>>> +/* Calculate the vectorized induction with smaller step for an unrolled loop.
>>> +
>>> +   before (suggested_unroll_factor=2):
>>> +         fmov    s30, 8.0e+0
>>> +         fmov    s31, 4.0e+0
>>> +         dup     v27.4s, v30.s[0]
>>> +         dup     v28.4s, v31.s[0]
>>> +     .L6:
>>> +         mov     v30.16b, v31.16b
>>> +         fadd    v31.4s, v31.4s, v27.4s
>>> +         fadd    v29.4s, v30.4s, v28.4s
>>> +         stp     q30, q29, [x0]
>>> +         add     x0, x0, 32
>>> +         cmp     x1, x0
>>> +         bne     .L6
>>> +
>>> +   after:
>>> +         fmov    s31, 4.0e+0
>>> +         dup     v29.4s, v31.s[0]
>>> +     .L6:
>>> +         fadd    v30.4s, v31.4s, v29.4s
>>> +         stp     q31, q30, [x0]
>>> +         add     x0, x0, 32
>>> +         fadd    v31.4s, v29.4s, v30.4s
>>> +         cmp     x0, x1
>>> +         bne     .L6  */
>>> +
>>> +void
>>> +foo2 (float *arr, float freq, float step)
>>> +{
>>> +  for (int i = 0; i < 1024; i++)
>>> +    {
>>> +      arr[i] = freq;
>>> +      freq += step;
>>> +    }
>>> +}
>>> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
>>> index 3b46c58a8d8..706ecbffd0c 100644
>>> --- a/gcc/tree-vect-loop.cc
>>> +++ b/gcc/tree-vect-loop.cc
>>> @@ -10114,7 +10114,7 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>>>                                    new_vec, step_vectype, NULL);
>>>
>>>           vec_def = induc_def;
>>> -         for (i = 1; i < ncopies; i++)
>>> +         for (i = 1; i < ncopies + 1; i++)
>>>             {
>>>               /* vec_i = vec_prev + vec_step */
>>>               gimple_seq stmts = NULL;
>>> @@ -10124,8 +10124,23 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>>>               vec_def = gimple_convert (&stmts, vectype, vec_def);
>>>
>>>               gsi_insert_seq_before (&si, stmts, GSI_SAME_STMT);
>>> -             new_stmt = SSA_NAME_DEF_STMT (vec_def);
>>> -             STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
>>> +             if (i < ncopies)
>>> +               {
>>> +                 new_stmt = SSA_NAME_DEF_STMT (vec_def);
>>> +                 STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
>>> +               }
>>> +             else
>>> +               {
>>> +                 /* vec_1 = vec_iv + (VF/n * S)
>>> +                    vec_2 = vec_1 + (VF/n * S)
>>> +                    ...
>>> +                    vec_n = vec_prev + (VF/n * S) = vec_iv + VF * S = vec_loop
>>> +
>>> +                    vec_n is used as vec_loop to save the large step register and
>>> +                    related operations.  */
>>> +                 add_phi_arg (induction_phi, vec_def, loop_latch_edge (iv_loop),
>>> +                              UNKNOWN_LOCATION);
>>> +               }
>>>             }
>>>         }
>>>
>>> --
>>> 2.34.1
Re: [PATCH] Vect: use a small step to calculate induction for the unrolled loop (PR tree-optimization/110449)
Richard Biener via Gcc-patches Thu, 06 Jul 2023 22:32:46 -0700