https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115531

--- Comment #5 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <tnfch...@gcc.gnu.org>:

https://gcc.gnu.org/g:af792f0226e479b165a49de5e8f9e1d16a4b26c0

commit r15-2191-gaf792f0226e479b165a49de5e8f9e1d16a4b26c0
Author: Tamar Christina <tamar.christ...@arm.com>
Date:   Mon Jul 22 10:26:14 2024 +0100

    middle-end: Implement conditional store vectorizer pattern [PR115531]

    This adds a conditional store optimization for the vectorizer as a pattern.
    The vectorizer already supports modifying memory accesses because of the
    pattern-based gather/scatter recognition.

    Doing it in the vectorizer allows us to still keep the ability to vectorize
    such loops for architectures that don't have MASK_STORE support, whereas
    doing this in ifcvt makes us commit to MASK_STORE.

    Concretely for this loop:

    void foo1 (char *restrict a, int *restrict b, int *restrict c,
               int n, int stride)
    {
      if (stride <= 1)
        return;

      for (int i = 0; i < n; i++)
        {
          int res = c[i];
          int t = b[i+stride];
          if (a[i] != 0)
            res = t;
          c[i] = res;
        }
    }
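
    As a sanity check (not part of the commit), the loop's semantics can be
    exercised with a small harness; the array contents below are arbitrary
    illustration values:

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* The loop from the commit message, unchanged.  */
    void foo1 (char *restrict a, int *restrict b, int *restrict c,
               int n, int stride)
    {
      if (stride <= 1)
        return;

      for (int i = 0; i < n; i++)
        {
          int res = c[i];
          int t = b[i+stride];
          if (a[i] != 0)
            res = t;
          c[i] = res;
        }
    }

    int main (void)
    {
      char a[4] = { 1, 0, 1, 0 };
      int b[8]  = { 0, 1, 2, 3, 4, 5, 6, 7 };
      int c[4]  = { 10, 11, 12, 13 };

      foo1 (a, b, c, 4, 2);

      /* Lanes with a[i] != 0 take b[i+stride]; the rest keep c[i].  */
      assert (c[0] == 2);   /* a[0] != 0 -> b[2]      */
      assert (c[1] == 11);  /* a[1] == 0 -> unchanged */
      assert (c[2] == 4);   /* a[2] != 0 -> b[4]      */
      assert (c[3] == 13);  /* a[3] == 0 -> unchanged */

      puts ("ok");
      return 0;
    }
    ```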

    today we generate:

    .L3:
            ld1b    z29.s, p7/z, [x0, x5]
            ld1w    z31.s, p7/z, [x2, x5, lsl 2]
            ld1w    z30.s, p7/z, [x1, x5, lsl 2]
            cmpne   p15.b, p6/z, z29.b, #0
            sel     z30.s, p15, z30.s, z31.s
            st1w    z30.s, p7, [x2, x5, lsl 2]
            add     x5, x5, x4
            whilelo p7.s, w5, w3
            b.any   .L3

    which in gimple is:

      vect_res_18.9_68 = .MASK_LOAD (vectp_c.7_65, 32B, loop_mask_67);
      vect_t_20.12_74 = .MASK_LOAD (vectp.10_72, 32B, loop_mask_67);
      vect__9.15_77 = .MASK_LOAD (vectp_a.13_75, 8B, loop_mask_67);
      mask__34.16_79 = vect__9.15_77 != { 0, ... };
      vect_res_11.17_80 = VEC_COND_EXPR <mask__34.16_79, vect_t_20.12_74,
                                         vect_res_18.9_68>;
      .MASK_STORE (vectp_c.18_81, 32B, loop_mask_67, vect_res_11.17_80);

    A MASK_STORE is already conditional, so there's no need to perform the load
    of the old values and the VEC_COND_EXPR.  This patch makes it so we
    generate:

      vect_res_18.9_68 = .MASK_LOAD (vectp_c.7_65, 32B, loop_mask_67);
      vect__9.15_77 = .MASK_LOAD (vectp_a.13_75, 8B, loop_mask_67);
      mask__34.16_79 = vect__9.15_77 != { 0, ... };
      .MASK_STORE (vectp_c.18_81, 32B, mask__34.16_79, vect_res_18.9_68);
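
    A scalar sketch of why the two gimple sequences are equivalent (my own
    model, not code from the patch): a MASK_STORE only writes lanes whose
    mask bit is set, so loading the old value and selecting it back is
    redundant once the select condition becomes the store mask. Here the
    effective store mask is modeled as loop_mask && cond, since lanes
    masked off the loop's MASK_LOAD compare as false:

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Before: masked load of the old value, VEC_COND_EXPR, masked store.  */
    static void store_before (bool loop_mask, bool cond, int t, int *c)
    {
      if (loop_mask)
        {
          int old = *c;          /* .MASK_LOAD of c[i]                  */
          *c = cond ? t : old;   /* VEC_COND_EXPR, then .MASK_STORE     */
        }
    }

    /* After: store t directly under the combined mask.  */
    static void store_after (bool loop_mask, bool cond, int t, int *c)
    {
      if (loop_mask && cond)     /* condition folded into store mask    */
        *c = t;
    }

    int main (void)
    {
      /* Exhaustively check all mask/condition combinations per lane.  */
      for (int m = 0; m < 2; m++)
        for (int k = 0; k < 2; k++)
          {
            int c1 = 7, c2 = 7;
            store_before (m, k, 42, &c1);
            store_after (m, k, 42, &c2);
            assert (c1 == c2);   /* identical observable memory effect  */
          }
      return 0;
    }
    ```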

    which generates:

    .L3:
            ld1b    z30.s, p7/z, [x0, x5]
            ld1w    z31.s, p7/z, [x1, x5, lsl 2]
            cmpne   p7.b, p7/z, z30.b, #0
            st1w    z31.s, p7, [x2, x5, lsl 2]
            add     x5, x5, x4
            whilelo p7.s, w5, w3
            b.any   .L3

    gcc/ChangeLog:

            PR tree-optimization/115531
            * tree-vect-patterns.cc (vect_cond_store_pattern_same_ref): New.
            (vect_recog_cond_store_pattern): New.
            (vect_vect_recog_func_ptrs): Use it.
            * target.def (conditional_operation_is_expensive): New.
            * doc/tm.texi: Regenerate.
            * doc/tm.texi.in: Document it.
            * targhooks.cc (default_conditional_operation_is_expensive): New.
            * targhooks.h (default_conditional_operation_is_expensive): New.
