On Thu, 29 Feb 2024, Richard Biener wrote:

> The following amends the PR114070 fix to optimistically allow
> the folding when we cannot expand the current vec_cond using
> vcond_mask and we're still before vector lowering.  This leaves
> a small window between vectorization and lowering where we could
> break vec_conds that can be expanded via vcond{,u,eq}; most
> susceptible is the loop unrolling pass, which applies VN, and thus
> possibly folding, to the unrolled body of a vectorized loop.
> 
> This gets back the folding for targets that cannot do vectorization.
> It doesn't get back the folding for x86 with AVX512, for example,
> since that can handle the original IL but not the folded IL, because
> it misses some vcond_mask expanders.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> As said, for stage1 I want to move vector lowering before vectorization.
> While I'm not entirely happy with this patch, it forces us into the
> correct direction, getting vcond_mask and vec_cmp{,u,eq} patterns
> implemented.  We could use canonicalize_math_p () to close the
> vectorizer -> vector lowering gap but this only works when that
> pass is run (not with -Og or when disabled).  We could add a new
> PROP_vectorizer_il and disable the folding if the vectorizer ran.
> 
> Or we could simply live with the regression.
> 
> Any preferences?

I've tried moving vector lowering; my first attempt placed it after the
first forwprop after IPA.  That exposes (at least) invariant motion
creating unsupported COND_EXPRs: we hoist a vector PHI as
_2 ? _3 : _6, and that might lead to unsupported BLKmode moves.

I think there are some latent issues to be fixed in passes.

A more conservative move is to duplicate vector lowering into
the loop/non-loop sections and put it right before vectorization
(but there's invariant motion after it, so the above issue will
persist).  Since the vectorizer currently cannot handle existing
vector code, "re-vectorization" is best done on lowered code (after
that got some cleanup).  Also, SLP can interface with existing
vectors, but currently only non-BLKmode ones, so that would benefit
as well.

That said, the quick experiment shows this isn't anything for stage4.
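
For readers following along, here is a minimal element-wise sketch of
the folding the quoted patch is about, (c ? a : b) op d  -->
c ? (a op d) : (b op d), with op taken to be "<".  This is not GCC
code: the lane count, the type aliases and the function names are all
made up for illustration.  The point it tries to show is that both
forms compute the same mask, but in the folded form the select
operates on comparison results rather than on the data vectors.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 4-lane model; kLanes, Vec and Mask are illustrative only.
constexpr std::size_t kLanes = 4;
using Vec = std::array<int, kLanes>;
using Mask = std::array<bool, kLanes>;

// Original form: (c ? a : b) < d.  The select works on data lanes.
Mask compare_of_select (const Mask &c, const Vec &a, const Vec &b,
                        const Vec &d)
{
  Mask r{};
  for (std::size_t i = 0; i < kLanes; ++i)
    r[i] = (c[i] ? a[i] : b[i]) < d[i];
  return r;
}

// Folded form: c ? (a < d) : (b < d).  The select now works on the
// two comparison results, i.e. on mask lanes.
Mask select_of_compares (const Mask &c, const Vec &a, const Vec &b,
                         const Vec &d)
{
  Mask r{};
  for (std::size_t i = 0; i < kLanes; ++i)
    {
      bool lhs = a[i] < d[i];
      bool rhs = b[i] < d[i];
      r[i] = c[i] ? lhs : rhs;
    }
  return r;
}
```

The two functions agree lane by lane; what differs is the type the
select has to handle, which is what the expand_vec_cond_expr_p checks
in the patch are concerned with.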

Richard.

> Thanks,
> Richard.
> 
>       PR middle-end/114070
>       * match.pd ((c ? a : b) op d  -->  c ? (a op d) : (b op d)):
>       Allow the folding if before lowering and the current IL
>       isn't supported with vcond_mask.
> ---
>  gcc/match.pd | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index f3fffd8dec2..4edba7c84fb 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5153,7 +5153,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    (op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4))
>    (if (TREE_CODE_CLASS (op) != tcc_comparison
>         || types_match (type, TREE_TYPE (@1))
> -       || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK))
> +       || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK)
> +       || (optimize_vectors_before_lowering_p ()
> +        /* The following is optimistic on the side of non-support, we are
> +           missing the legacy vcond{,u,eq} cases.  Do this only when
> +           lowering will be able to fix up.  */
> +        && !expand_vec_cond_expr_p (TREE_TYPE (@1),
> +                                    TREE_TYPE (@0), ERROR_MARK)))
>     (vec_cond @0 (op! @1 @3) (op! @2 @4))))
>  
>  /* (c ? a : b) op d  -->  c ? (a op d) : (b op d) */
> @@ -5161,13 +5167,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    (op (vec_cond:s @0 @1 @2) @3)
>    (if (TREE_CODE_CLASS (op) != tcc_comparison
>         || types_match (type, TREE_TYPE (@1))
> -       || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK))
> +       || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK)
> +       || (optimize_vectors_before_lowering_p ()
> +        && !expand_vec_cond_expr_p (TREE_TYPE (@1),
> +                                    TREE_TYPE (@0), ERROR_MARK)))
>     (vec_cond @0 (op! @1 @3) (op! @2 @3))))
>   (simplify
>    (op @3 (vec_cond:s @0 @1 @2))
>    (if (TREE_CODE_CLASS (op) != tcc_comparison
>         || types_match (type, TREE_TYPE (@1))
> -       || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK))
> +       || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK)
> +       || (optimize_vectors_before_lowering_p ()
> +        && !expand_vec_cond_expr_p (TREE_TYPE (@1),
> +                                    TREE_TYPE (@0), ERROR_MARK)))
>     (vec_cond @0 (op! @3 @1) (op! @3 @2)))))
>  
>  #if GIMPLE
> 
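
The guard added by the patch above can be read as a plain boolean
condition.  The following is a rough standalone model of it, not GCC
code: FoldQuery, its fields and allow_fold are hypothetical names, and
each field stands in for the match.pd expression named in its comment.

```cpp
#include <cassert>

// Hypothetical model of the match.pd guard; not a GCC data structure.
struct FoldQuery
{
  bool op_is_comparison;    // TREE_CODE_CLASS (op) == tcc_comparison
  bool types_match;         // types_match (type, TREE_TYPE (@1))
  bool folded_expandable;   // expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK)
  bool before_lowering;     // optimize_vectors_before_lowering_p ()
  bool original_expandable; // expand_vec_cond_expr_p (TREE_TYPE (@1), TREE_TYPE (@0), ERROR_MARK)
};

// Mirrors the condition in the patched (if ...) clauses.
constexpr bool allow_fold (const FoldQuery &q)
{
  return !q.op_is_comparison
         || q.types_match
         || q.folded_expandable
         || (q.before_lowering && !q.original_expandable);
}

void check_examples ()
{
  // Neither form is expandable and lowering is still to come:
  // the new clause lets the fold through.
  assert (allow_fold (FoldQuery{true, false, false, true, false}));
  // The AVX512-like case from the description: the original IL is
  // expandable, the folded one is not, so the fold stays blocked.
  assert (!allow_fold (FoldQuery{true, false, false, true, true}));
  // After lowering the new clause never applies.
  assert (!allow_fold (FoldQuery{true, false, false, false, false}));
}
```

Non-comparison ops and the matching-type case are unaffected; only the
last disjunct is new.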

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
