Hi,

This patch eliminates redundant reverse permutations in vectorized reverse
loops by detecting the pattern during store vectorization.
In such a loop, the reverse load (b[i]) generates a PERM, the element-wise
operations are applied, and the reverse store then adds another PERM.
Because the operations are element-wise, the two permutes cancel out, so
this creates a redundant permute pair that we now detect and eliminate.
With the patch, the code generated for the example loop

for (int i = N - 1; i >= 0; i--)
  {
    a[i] = b[i] + 1.0f;
  }

changes as follows:
- ldr q29, [x0, x2]
- tbl v29.16b, {v29.16b}, v31.16b
- fadd v29.4s, v29.4s, v30.4s
- tbl v29.16b, {v29.16b}, v31.16b
- str q29, [x3, x2]
+ ldr q30, [x0, x2]
+ fadd v30.4s, v30.4s, v31.4s
+ str q30, [x3, x2]
PR tree-optimization/61338
gcc/ChangeLog:
(get_vector_perm_operand): New.
(vect_find_reverse_permute_operand): New helper function
to find reverse permutations through element-wise operation chains.
Returns true only if ALL operands have reverse permutations.
(vectorizable_store): Use recursive helper to eliminate redundant
reverse permutations with configurable search depth.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/slp-permute-reverse-1.c: New test for basic
reverse permute optimization (simple copy).
* gcc.dg/vect/slp-permute-reverse-2.c: New runtime test for
basic pattern.
Signed-off-by: Kugan Vivekanandarajah <[email protected]>
Bootstrapped and regression tested on aarch64-linux-gnu. Is this OK?
Thanks,
Kugan
0001-PATCH-tree-optimization-61338-Optimize-redundant-rev.patch
