Sorry for the very late comment (was out last week)…

Richard Biener <rguent...@suse.de> writes:
> This enables SLP store group splitting also for loop vectorization.
> For the existing testcase gcc.dg/vect/vect-complex-5.c this then
> generates much better code, likewise for the PR97428 testcase.
>
> Both of those have a splitting opportunity splitting the group
> into two equal (vector-sized) halves, still the patch enables
> quite arbitrary splitting since generally the interleaving scheme
> results in quite awkward code for even small groups.  If any
> problems surface with this it's easy to restrict the splitting
> to known-good cases.  Is there any additional constraints for
> non-constant sized vectors?  Note this interacts with vector
> size iteration (but comparing interleaving cost with SLP cost
> of a smaller vector size doesn't reliably pick the smaller
> vector size).

Not sure about the variable-sized vector aspect.  For SVE it
isn't really natural to split the store itself up: I think we'd
instead want to keep a unified store and blend in the stored
values where necessary.  E.g. rather than split:

  a a a a b b c c

into:

  a a a a
  b b
  c c

we'd be better off having predicated groups of the form:

  a a a a _ _ _ _
  _ _ _ _ b b _ _
  _ _ _ _ _ _ c c

This is one thing on the very long todo list :-/

> @@ -2323,6 +2323,36 @@ vect_analyze_slp_instance (vec_info *vinfo,
>                                             rest, max_tree_size);
>         return res;
>       }
> +
> +      /* For loop vectorization split into arbitrary pieces of size > 1.  */
> +      if (is_a <loop_vec_info> (vinfo)
> +       && (i > 1 && i < group_size))
> +     {
> +       gcc_assert ((const_nunits & (const_nunits - 1)) == 0);

FWIW, we have pow2p_hwi for this.

> +       unsigned group1_size = i;
> +
> +       if (dump_enabled_p ())
> +         dump_printf_loc (MSG_NOTE, vect_location,
> +                          "Splitting SLP group at stmt %u\n", i);
> +
> +       stmt_vec_info rest = vect_split_slp_store_group (stmt_info,
> +                                                        group1_size);
> +       /* Loop vectorization cannot handle gaps in stores, make sure
> +          the split group appears as strided.  */
> +       STMT_VINFO_STRIDED_P (rest) = 1;
> +       DR_GROUP_GAP (rest) = 0;
> +       STMT_VINFO_STRIDED_P (stmt_info) = 1;
> +       DR_GROUP_GAP (stmt_info) = 0;

Does something undo the STMT_VINFO_STRIDED_P assignments if SLP
vectorisation fails?  If not, mightn't this pessimise things in
that case?  Realise that that won't be a concern once everything
is SLP, just wondering.

(Sorry if this has already been dealt with by later patches or
is otherwise just noise.)

Thanks,
Richard

> +
> +       bool res = vect_analyze_slp_instance (vinfo, bst_map, stmt_info,
> +                                             max_tree_size);
> +       if (i + 1 < group_size)
> +         res |= vect_analyze_slp_instance (vinfo, bst_map,
> +                                           rest, max_tree_size);
> +
> +       return res;
> +     }
> +
>        /* Even though the first vector did not all match, we might be able to 
> SLP
>        (some) of the remainder.  FORNOW ignore this possibility.  */
>      }

Reply via email to