x86_64

Richard Biener Wed, 28 Feb 2024 02:06:01 -0800

On Tue, 27 Feb 2024, Richard Biener wrote:

> On Tue, 27 Feb 2024, Jeff Law wrote:
> 
> > 
> > 
> > On 2/27/24 06:53, Richard Biener wrote:
> > > On Tue, 27 Feb 2024, Jeff Law wrote:
> > > 
> > >>
> > >>
> > >> On 2/27/24 00:43, Richard Biener wrote:
> > >>> On Tue, 27 Feb 2024, haochen.jiang wrote:
> > >>>
> > >>>> On Linux/x86_64,
> > >>>>
> > >>>> af66ad89e8169f44db723813662917cf4cbb78fc is the first bad commit
> > >>>> commit af66ad89e8169f44db723813662917cf4cbb78fc
> > >>>> Author: Richard Biener <rguent...@suse.de>
> > >>>> Date:   Fri Feb 23 16:06:05 2024 +0100
> > >>>>
> > >>>>       middle-end/114070 - folding breaking VEC_COND expansion
> > >>>>
> > >>>> caused
> > >>>>
> > >>>> FAIL: gcc.dg/tree-ssa/andnot-2.c scan-tree-dump-not forwprop3 "_expr"
> > >>>
> > >>> This shows that the x86 backend is missing vcond_mask_qiqi and friends
> > >>> (for AVX512 mask modes).  Either that or both expand_vec_cond_expr_p
> > >>> and all the machinery behind it (ISEL pass, lowering) should handle
> > >>> pure integer mode VEC_COND_EXPR via bit operations.  I think quite some
> > >>> targets now implement patterns for these variants, whatever their
> > >>> boolean vector modes are.
> > >> There may be more going on than just that.  The andnot-2 test started
> > >> regressing on most targets overnight, including on targets without vector
> > >> capabilities.
> > > 
> > > Yes, we fail this generic vector simplification test now (not sure
> > > why the test didn't test forwprop1).  As said the problem is that
> > > we can't test whether the existing VEC_COND_EXPR is handled
> > > (we just can test if it's handled by vcond_mask).
> > ACK.  I'll force those targets to regenerate their baselines.
> > 
> > I hadn't read things too closely and just wanted to raise the issue that 
> > this
> > impacts additional targets w/o vector support.  It sounds like it's largely
> > expected fallout.
> 
> No, I didn't expect it (well, kind-of but my own testing came up 
> clean...).  I'm going to try the extra patterns.


So the following fixes it for AVR for example but it doesn't fix it
with -march=znver4 since we can expand the original IL but not the
one we attempt to fold to.  The testcase at hand would be also
fixed if we force simplification of the outer vec_cond.  But we
currently can't use '!' at the outermost operation, or at least
it wouldn't do the expected thing.

The testcase which is

typedef long vec __attribute__((vector_size(16)));
vec f(vec x){
  vec y = x < 10;
  return y & (y == 0);
}

really expects y & (y == 0) to simplify, the x < 10 is not really
important besides making 'y' a boolean.

typedef long vec __attribute__((vector_size(16)));
vec f(vec y){
  return (y != 0) & (y == 0);
}

is the gist of what we're simplifying and we're doing this by
a quite complicated dance.

The patch below comes at the cost of a bit of pattern explosion and
it still shows that some targets need to be adjusted to get the
constant folding that's desired here.  Unfortunately there's no
good way to express logical_inverted_value to make

/* X & !X -> 0.  */
(simplify 
 (bit_and:c @0 (logical_inverted_value @0))
 { build_zero_cst (type); })

catch it.  The (match ..) syntax isn't powerful enough to
return an alternate tree to match like maybe a syntax like
the following could express:

(match (logical_inverted_value (vec_cond@@0 (ne @1 @2) @3 @4))
 (vec_cond (eq @1 @2) @3 @4))

which I think can be only made "working" when actually "inlining"
the (match ..) stuff into the (simplify ..) uses, which was my
original plan but which of course would act as another source of pattern
explosion.

Untested fix for targets that cannot handle the original IL below.
I'm not convinced that's the way to go here, is it?  Or scrap
the testcase?  Or have a cheap way to say "this target doesn't
support _any_ vec_cond"?

Another way, but for stage1 I think, would be to delay all the
vector checking to the "final" maybe_push_res_to_seq, but that
requires aggressively cleaning out intermediate folding results
no longer needed and also still requires patterns to do the
"is the incoming IL supported on the target" as later we do not
know what parts we matched (this could in theory be automated
as well, of course).  There's also no way to say "don't apply
this unless the final simplification result is a constant".
While we might be able to add a new '*' modifier saying
"if the result is supported by the target" there's again no
way of conditionalizing this on the original operation being
supported.  OK, maybe on the match allow '*1' and when '*1'
is used in the result we fail if any of the '*1' annotated
parts in the match were supported but at least one of the '*1'
in the result is not.  Supported only for select operations.
But it still doesn't help as we'd have to somehow check the
re-simplified result and we can't really track where a
vec_cond we put in goes.

Thanks,
Richard.

diff --git a/gcc/match.pd b/gcc/match.pd
index f3fffd8dec2..4c4b9484a7a 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5168,7 +5168,30 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (TREE_CODE_CLASS (op) != tcc_comparison
        || types_match (type, TREE_TYPE (@1))
        || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK))
-   (vec_cond @0 (op! @3 @1) (op! @3 @2)))))
+   (vec_cond @0 (op! @3 @1) (op! @3 @2))))
+
+ (for op2 (tcc_comparison)
+  (simplify
+   (op (vec_cond:s (op2@0 @00 @01) @1 @2) (vec_cond:s @0 @3 @4))
+   (if (TREE_CODE_CLASS (op) != tcc_comparison
+        || types_match (type, TREE_TYPE (@1))
+        || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK)
+       || !expand_vec_cond_expr_p (TREE_TYPE (@1), TREE_TYPE (@00), op2))
+    (vec_cond @0 (op! @1 @3) (op! @2 @4))))
+  (simplify
+   (op (vec_cond:s (op2@0 @00 @01) @1 @2) @3)
+   (if (TREE_CODE_CLASS (op) != tcc_comparison
+        || types_match (type, TREE_TYPE (@1))
+        || expand_vec_cond_expr_p (type, TREE_TYPE (@00), op2)
+       || !expand_vec_cond_expr_p (TREE_TYPE (@1), TREE_TYPE (@00), op2))
+    (vec_cond @0 (op! @1 @3) (op! @2 @3))))
+  (simplify
+   (op @3 (vec_cond:s (op2@0 @00 @01) @1 @2))
+   (if (TREE_CODE_CLASS (op) != tcc_comparison
+        || types_match (type, TREE_TYPE (@1))
+        || expand_vec_cond_expr_p (type, TREE_TYPE (@00), op2)
+       || !expand_vec_cond_expr_p (TREE_TYPE (@1), TREE_TYPE (@00), op2))
+    (vec_cond @0 (op! @3 @1) (op! @3 @2))))))
 
 #if GIMPLE
 (match (nop_atomic_bit_test_and_p @0 @1 @4)

Re: [r14-9173 Regression] FAIL: gcc.dg/tree-ssa/andnot-2.c scan-tree-dump-not forwprop3 "_expr" on Linux/x86_64

Reply via email to