Hi,

This patch eliminates redundant pairs of reverse permutations in vectorized
reverse loops. The pattern is detected and removed during store vectorization.

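In such a loop, the reverse load of b[i] emits a permute (VEC_PERM_EXPR) that
reverses the vector lanes, the element-wise operations are applied to the
result, and the reverse store to a[i] emits a second permute that reverses the
lanes again. The two permutes cancel each other out, so the patch detects such
redundant pairs and eliminates them.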

For the example loop
  for (int i = N - 1; i >= 0; i--)
    {
      a[i] = b[i] + 1.0f;
    }
the AArch64 code for the loop body changes as follows (each tbl is one of the
reverse permutations):
-       ldr     q29, [x0, x2]
-       tbl     v29.16b, {v29.16b}, v31.16b
-       fadd    v29.4s, v29.4s, v30.4s
-       tbl     v29.16b, {v29.16b}, v31.16b
-       str     q29, [x3, x2]
+       ldr     q30, [x0, x2]
+       fadd    v30.4s, v30.4s, v31.4s
+       str     q30, [x3, x2]

        PR tree-optimization/61338

gcc/ChangeLog:

        (get_vector_perm_operand): New.
        (vect_find_reverse_permute_operand): New helper function
        to find reverse permutations through chains of element-wise
        operations.  Returns true only if all operands have reverse
        permutations.
        (vectorizable_store): Use the recursive helper to eliminate
        redundant reverse permutations, with a configurable search depth.

gcc/testsuite/ChangeLog:

        * gcc.dg/vect/slp-permute-reverse-1.c: New test for basic
        reverse permute optimization (simple copy).
        * gcc.dg/vect/slp-permute-reverse-2.c: New runtime test for
        the basic pattern.

Signed-off-by: Kugan Vivekanandarajah <[email protected]>

Bootstrapped and regression-tested on aarch64-linux-gnu. Is this OK?

Thanks,
Kugan


Attachment: 0001-PATCH-tree-optimization-61338-Optimize-redundant-rev.patch