On Tue, 27 Feb 2024, Richard Biener wrote: > On Tue, 27 Feb 2024, Jeff Law wrote: > > > > > > > On 2/27/24 06:53, Richard Biener wrote: > > > On Tue, 27 Feb 2024, Jeff Law wrote: > > > > > >> > > >> > > >> On 2/27/24 00:43, Richard Biener wrote: > > >>> On Tue, 27 Feb 2024, haochen.jiang wrote: > > >>> > > >>>> On Linux/x86_64, > > >>>> > > >>>> af66ad89e8169f44db723813662917cf4cbb78fc is the first bad commit > > >>>> commit af66ad89e8169f44db723813662917cf4cbb78fc > > >>>> Author: Richard Biener <rguent...@suse.de> > > >>>> Date: Fri Feb 23 16:06:05 2024 +0100 > > >>>> > > >>>> middle-end/114070 - folding breaking VEC_COND expansion > > >>>> > > >>>> caused > > >>>> > > >>>> FAIL: gcc.dg/tree-ssa/andnot-2.c scan-tree-dump-not forwprop3 "_expr" > > >>> > > >>> This shows that the x86 backend is missing vcond_mask_qiqi and friends > > >>> (for AVX512 mask modes). Either that or both expand_vec_cond_expr_p > > >>> and all the machinery behind it (ISEL pass, lowering) should handle > > >>> pure integer mode VEC_COND_EXPR via bit operations. I think quite some > > >>> targets now implement patterns for these variants, whatever their > > >>> boolean vector modes are. > > >> There may be more going on than just that. The andnot-2 test started > > >> regressing on most targets overnight, including on targets without vector > > >> capabilities. > > > > > > Yes, we fail this generic vector simplification test now (not sure > > > why the test didn't test forwprop1). As said the problem is that > > > we can't test whether the existing VEC_COND_EXPR is handled > > > (we just can test if it's handled by vcond_mask). > > ACK. I'll force those targets to regenerate their baselines. > > > > I hadn't read things too closely and just wanted to raise the issue that > > this > > impacts additional targets w/o vector support. It sounds like it's largely > > expected fallout. > > No, I didn't expect it (well, kind-of but my own testing came up > clean...). I'm going to try the extra patterns.
So the following fixes it for AVR for example but it doesn't fix it with -march=znver4 since we can expand the original IL but not the one we attempt to fold to. The testcase at hand would be also fixed if we force simplification of the outer vec_cond. But we currently can't use '!' at the outermost operation, or at least it wouldn't do the expected thing. The testcase which is typedef long vec __attribute__((vector_size(16))); vec f(vec x){ vec y = x < 10; return y & (y == 0); } really expects y & (y == 0) to simplify, the x < 10 is not really important besides making 'y' a boolean. typedef long vec __attribute__((vector_size(16))); vec f(vec y){ return (y != 0) & (y == 0); } is the gist of what we're simplifying and we're doing this by a quite complicated dance. The patch below comes at the cost of a bit of pattern explosion and it still shows that some targets need to be adjusted to get the constant folding that's desired here. Unfortunately there's no good way to express logical_inverted_value to make /* X & !X -> 0. */ (simplify (bit_and:c @0 (logical_inverted_value @0)) { build_zero_cst (type); }) catch it. The (match ..) syntax isn't powerful enough to return an alternate tree to match like maybe a syntax like the following could express: (match (logical_inverted_value (vec_cond@@0 (ne @1 @2) @3 @4)) (vec_cond (eq @1 @2) @3 @4)) which I think can be only made "working" when actually "inlining" the (match ..) stuff into the (simplify ..) uses, which was my original plan but which of course would act as another source of pattern explosion. Untested fix for targets that cannot handle the original IL below. I'm not convinced that's the way to go here, is it? Or scrap the testcase? Or have a cheap way to say "this target doesn't support _any_ vec_cond"? Another way, but for stage1 I think, would be to delay all the vector checking to the "final" maybe_push_res_to_seq, but that requires aggressively cleaning out intermediate folding results no longer needed and also still requires patterns to do the "is the incoming IL supported on the target" as later we do not know what parts we matched (this could in theory be automated as well, of course). There's also no way to say "don't apply this unless the final simplification result is a constant". While we might be able to add a new '*' modifier saying "if the result is supported by the target" there's again no way of conditionalizing this on the original operation being supported. OK, maybe on the match allow '*1' and when '*1' is used in the result we fail if any of the '*1' annotated parts in the match were supported but at least one of the '*1' in the result is not. Supported only for select operations. But it still doesn't help as we'd have to somehow check the re-simplified result and we can't really track where a vec_cond we put in goes. Thanks, Richard. diff --git a/gcc/match.pd b/gcc/match.pd index f3fffd8dec2..4c4b9484a7a 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -5168,7 +5168,30 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (TREE_CODE_CLASS (op) != tcc_comparison || types_match (type, TREE_TYPE (@1)) || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK)) - (vec_cond @0 (op! @3 @1) (op! @3 @2))))) + (vec_cond @0 (op! @3 @1) (op! @3 @2)))) + + (for op2 (tcc_comparison) + (simplify + (op (vec_cond:s (op2@0 @00 @01) @1 @2) (vec_cond:s @0 @3 @4)) + (if (TREE_CODE_CLASS (op) != tcc_comparison + || types_match (type, TREE_TYPE (@1)) + || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK) + || !expand_vec_cond_expr_p (TREE_TYPE (@1), TREE_TYPE (@00), op2)) + (vec_cond @0 (op! @1 @3) (op! @2 @4)))) + (simplify + (op (vec_cond:s (op2@0 @00 @01) @1 @2) @3) + (if (TREE_CODE_CLASS (op) != tcc_comparison + || types_match (type, TREE_TYPE (@1)) + || expand_vec_cond_expr_p (type, TREE_TYPE (@00), op2) + || !expand_vec_cond_expr_p (TREE_TYPE (@1), TREE_TYPE (@00), op2)) + (vec_cond @0 (op! @1 @3) (op! @2 @3)))) + (simplify + (op @3 (vec_cond:s (op2@0 @00 @01) @1 @2)) + (if (TREE_CODE_CLASS (op) != tcc_comparison + || types_match (type, TREE_TYPE (@1)) + || expand_vec_cond_expr_p (type, TREE_TYPE (@00), op2) + || !expand_vec_cond_expr_p (TREE_TYPE (@1), TREE_TYPE (@00), op2)) + (vec_cond @0 (op! @3 @1) (op! @3 @2)))))) #if GIMPLE (match (nop_atomic_bit_test_and_p @0 @1 @4)