On Fri, 7 Aug 2020, Richard Biener wrote:

On Fri, Aug 7, 2020 at 10:33 AM Marc Glisse <marc.gli...@inria.fr> wrote:

On Fri, 7 Aug 2020, Richard Biener wrote:

On Thu, Aug 6, 2020 at 8:07 PM Marc Glisse <marc.gli...@inria.fr> wrote:

On Thu, 6 Aug 2020, Christophe Lyon wrote:

Was I on the right track configuring with
--target=arm-none-linux-gnueabihf --with-cpu=cortex-a9
--with-fpu=neon-fp16
then compiling without any special option?

Maybe you also need --with-float=hard; I don't remember if it's
implied by the 'hf' target suffix.

Thanks! That's what I was missing to reproduce the issue. Now I can
reproduce it with just

typedef unsigned int vec __attribute__((vector_size(16)));
typedef int vi __attribute__((vector_size(16)));
vi f(vec a,vec b){
     return a==5 | b==7;
}

with -fdisable-tree-forwprop1 -fdisable-tree-forwprop2
-fdisable-tree-forwprop3 -O1, which gives:

   _1 = a_5(D) == { 5, 5, 5, 5 };
   _3 = b_6(D) == { 7, 7, 7, 7 };
   _9 = _1 | _3;
   _7 = .VCOND (_9, { 0, 0, 0, 0 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }, 107);

we fail to expand the equality comparison (expand_vec_cmp_expr_p returns
false), while with -fdisable-tree-forwprop4 we do manage to expand

   _2 = .VCONDU (a_5(D), { 5, 5, 5, 5 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }, 112);

It doesn't make much sense to me that we can expand the more complicated
form but not the simpler form of the same operation (both compare a to 5
and produce a vector of -1 or 0 of the same size), especially when the
target has an instruction (vceq) that does just what we want.
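
For comparison, spelling the same operation with NEON intrinsics works,
since each call maps directly to one instruction (a quick sketch; the
function name g is made up):

#include <arm_neon.h>

/* Same computation as f above with intrinsics: two lane-wise
   compares (vceq.i32) and an OR (vorr).  */
uint32x4_t g (uint32x4_t a, uint32x4_t b)
{
  return vorrq_u32 (vceqq_u32 (a, vdupq_n_u32 (5)),
                    vceqq_u32 (b, vdupq_n_u32 (7)));
}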

Introducing boolean vectors was fine, but I think they should be real
types that we can operate on, not types forced to appear only as the
first argument of a vcond.

I can think of two natural ways to improve things: either implement
vector comparisons in the ARM backend (possibly by forwarding to its
existing code for vcond), or have the generic expansion code try vcond
when the direct comparison opcode is not provided.
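
To illustrate the second option, here is a rough, untested sketch using
helpers that already exist in optabs.c (expand_vec_cmp_expr_p,
expand_vec_cmp_expr, expand_vec_cond_expr); the function name and the
exact place to hook it in are made up:

/* Expand the vector comparison OP0 CODE OP1 into TARGET, falling back
   to the target's vcond pattern when no direct vec_cmp pattern
   exists.  Hypothetical sketch, not actual GCC code.  */

static rtx
expand_vec_cmp_or_vcond (enum tree_code code, tree mask_type,
                         tree op0, tree op1, rtx target)
{
  tree cmp = build2 (code, mask_type, op0, op1);

  /* Direct comparison supported (this is what aarch64 provides)?  */
  if (expand_vec_cmp_expr_p (TREE_TYPE (op0), mask_type, code))
    return expand_vec_cmp_expr (mask_type, cmp, target);

  /* Otherwise rebuild VEC_COND_EXPR <cmp, {-1,...}, {0,...}> so the
     vcond pattern (vceq and friends on ARM) can handle it.  */
  return expand_vec_cond_expr (mask_type, cmp,
                               build_minus_one_cst (mask_type),
                               build_zero_cst (mask_type), target);
}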

We can revert my patch for now, but I would like that to be temporary.
Since aarch64 seems to handle the same code just fine, maybe someone who
knows arm could copy the relevant code over?

Does my message make sense, do people have comments?

So what complicates things now (and to some extent pre-existed when you
used AVX512, which _could_ operate on boolean vectors) is that we
have split out the condition from VEC_COND_EXPR to separate stmts,
but we do not expect backends to be able to code-generate the separate
form - instead we rely on the ISEL pass to transform VEC_COND_EXPRs
to .VCOND[U], "merging" the compares again.  Now that process breaks
down once we have things like _9 = _1 | _3;  at some point I argued
that we should handle vector compares [and operations on boolean vectors]
in ISEL as well, but when it came up again I disregarded that for some
reason.
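
(Concretely, on the variant that works, ISEL merges the two statements

   _1 = a_5(D) == { 5, 5, 5, 5 };
   _2 = VEC_COND_EXPR <_1, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;

into the single call

   _2 = .VCONDU (a_5(D), { 5, 5, 5, 5 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }, 112);

so the compare on its own never has to be expanded.)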

Thus - we don't want to go back to fixing up the generic expansion code
(which looks at one instruction at a time and is restricted by TER single-use
restrictions).

Here it would only be handling comparisons left over by ISEL because
they do not feed a VEC_COND_EXPR, so maybe that's not so bad. Handling it
in ISEL would be more consistent, but then we may have to teach the
vector lowering pass about this.

Instead we want to deal with this in ISEL which should
behave more intelligently.  In the above case it might involve turning
the _1 and _3 defs into .VCOND [with different result type], doing
_9 in that type and then somehow dealing with _7 ... but this eventually
means undoing the match simplification that introduced the code?

For targets that do not have compact boolean vectors,
VEC_COND_EXPR<*,-1,0> does nothing; it is just there as a technicality.
Removing it to allow more simplifications makes sense, and reintroducing
it for expansion can make sense as well. I think it is ok if the second
step reverses the first, but very locally, without having to propagate a
change of type to the uses. So on ARM we could turn _1 and _3 into .VCOND
calls producing bool:32 vectors, and replace _7 with a
VIEW_CONVERT_EXPR. Or does using bool vectors like that seem bad? (Maybe
they aren't guaranteed to be equivalent to signed integer vectors with
values 0 and -1, and we need to query the target to know whether that is
the case, or introduce an extra .VCOND.)
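
Concretely, for the testcase above that would give something like

   _1 = .VCONDU (a_5(D), { 5, 5, 5, 5 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }, 112);
   _3 = .VCONDU (b_6(D), { 7, 7, 7, 7 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }, 112);
   _9 = _1 | _3;
   _7 = VIEW_CONVERT_EXPR<vi>(_9);

with _1, _3 and _9 having the bool:32 vector type (my sketch of the
intended output, not an actual dump).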

Also, we only want to replace comparisons with .VCOND if the target
doesn't handle the comparison directly. For AVX512, we do want to produce
SImode bool vectors (for k* registers) and operate on them directly;
that's the whole point of introducing the bool vectors. (If bool vectors
were only used to feed VEC_COND_EXPR and were all turned into .VCOND
before expansion, I don't see the point of specifying different types
for bool vectors for AVX512 vs non-AVX512, as it would make no difference
to what is passed to the backend.)
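
(For the testcase above with -mavx512vl, for instance, the point is to
keep

   _1 = a_5(D) == { 5, 5, 5, 5 };
   _3 = b_6(D) == { 7, 7, 7, 7 };
   _9 = _1 | _3;
   _7 = VEC_COND_EXPR <_9, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;

as is, with _1, _3 and _9 in bool vector types that live in k registers,
so the compares and the IOR are single mask instructions and only the
last statement materializes the -1/0 vector. That is my understanding of
the intended codegen, not a dump.)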

I think targets should provide some number of operations, including for
instance bit_and and bit_ior on bool vectors, and be a bit consistent
about what they provide; it becomes unmanageable in the middle-end
otherwise...

It's currently somewhat of a mess I guess.  It might help to
restrict the simplification patterns to passes before vector lowering
so maybe add something like (or re-use)
canonicalize_math[_after_vectorization]_p ()?

That's indeed a nice way to gain time to sort out the mess :-)

This patch solves the issue on ARM for at least one testcase. I have a
bootstrap+regtest running on x86_64-linux-gnu; at least the tests added
in the previous patch still pass.

2020-08-05  Marc Glisse  <marc.gli...@inria.fr>

        * generic-match-head.c (optimize_vectors_before_lowering_p): New
        function.
        * gimple-match-head.c (optimize_vectors_before_lowering_p):
        Likewise.
        * match.pd ((v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): Use it.

--
Marc Glisse
diff --git a/gcc/generic-match-head.c b/gcc/generic-match-head.c
index 2454baac9d4..fdb528d9686 100644
--- a/gcc/generic-match-head.c
+++ b/gcc/generic-match-head.c
@@ -80,6 +80,16 @@ canonicalize_math_after_vectorization_p ()
   return false;
 }
 
+/* Return true if we can still perform transformations that may introduce
+   vector operations that are not supported by the target.  Vector lowering
+   normally handles those, but after that pass, it becomes unsafe.  */
+
+static inline bool
+optimize_vectors_before_lowering_p ()
+{
+  return true;
+}
+
 /* Return true if successive divisions can be optimized.
    Defer to GIMPLE opts.  */
 
diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
index 9b3e7298d87..4a65be703b9 100644
--- a/gcc/gimple-match-head.c
+++ b/gcc/gimple-match-head.c
@@ -1158,6 +1158,16 @@ canonicalize_math_after_vectorization_p ()
   return !cfun || (cfun->curr_properties & PROP_gimple_lvec) != 0;
 }
 
+/* Return true if we can still perform transformations that may introduce
+   vector operations that are not supported by the target.  Vector lowering
+   normally handles those, but after that pass, it becomes unsafe.  */
+
+static inline bool
+optimize_vectors_before_lowering_p ()
+{
+  return !cfun || (cfun->curr_properties & PROP_gimple_lvec) == 0;
+}
+
 /* Return true if pow(cst, x) should be optimized into exp(log(cst) * x).
    As a workaround for SPEC CPU2017 628.pop2_s, don't do it if arg0
    is an exact integer, arg1 = phi_res +/- cst1 and phi_res = PHI <cst2, ...>
diff --git a/gcc/match.pd b/gcc/match.pd
index d8e3927d3c7..7e5c5a6eae6 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3461,40 +3461,42 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (vec_cond @0 (op! @3 @1) (op! @3 @2))))
 #endif
 
-/* (v ? w : 0) ? a : b is just (v & w) ? a : b  */
+/* (v ? w : 0) ? a : b is just (v & w) ? a : b
+   Currently disabled after pass lvec because ARM understands
+   VEC_COND_EXPR<v==w,-1,0> but not a plain v==w fed to BIT_IOR_EXPR.  */
 (simplify
  (vec_cond (vec_cond:s @0 @3 integer_zerop) @1 @2)
- (if (types_match (@0, @3))
+ (if (optimize_vectors_before_lowering_p () && types_match (@0, @3))
   (vec_cond (bit_and @0 @3) @1 @2)))
 (simplify
  (vec_cond (vec_cond:s @0 integer_all_onesp @3) @1 @2)
- (if (types_match (@0, @3))
+ (if (optimize_vectors_before_lowering_p () && types_match (@0, @3))
   (vec_cond (bit_ior @0 @3) @1 @2)))
 (simplify
  (vec_cond (vec_cond:s @0 integer_zerop @3) @1 @2)
- (if (types_match (@0, @3))
+ (if (optimize_vectors_before_lowering_p () && types_match (@0, @3))
   (vec_cond (bit_ior @0 (bit_not @3)) @2 @1)))
 (simplify
  (vec_cond (vec_cond:s @0 @3 integer_all_onesp) @1 @2)
- (if (types_match (@0, @3))
+ (if (optimize_vectors_before_lowering_p () && types_match (@0, @3))
   (vec_cond (bit_and @0 (bit_not @3)) @2 @1)))
 
 /* c1 ? c2 ? a : b : b  -->  (c1 & c2) ? a : b  */
 (simplify
  (vec_cond @0 (vec_cond:s @1 @2 @3) @3)
- (if (types_match (@0, @1))
+ (if (optimize_vectors_before_lowering_p () && types_match (@0, @1))
   (vec_cond (bit_and @0 @1) @2 @3)))
 (simplify
  (vec_cond @0 @2 (vec_cond:s @1 @2 @3))
- (if (types_match (@0, @1))
+ (if (optimize_vectors_before_lowering_p () && types_match (@0, @1))
   (vec_cond (bit_ior @0 @1) @2 @3)))
 (simplify
  (vec_cond @0 (vec_cond:s @1 @2 @3) @2)
- (if (types_match (@0, @1))
+ (if (optimize_vectors_before_lowering_p () && types_match (@0, @1))
   (vec_cond (bit_ior (bit_not @0) @1) @2 @3)))
 (simplify
  (vec_cond @0 @3 (vec_cond:s @1 @2 @3))
- (if (types_match (@0, @1))
+ (if (optimize_vectors_before_lowering_p () && types_match (@0, @1))
   (vec_cond (bit_and (bit_not @0) @1) @2 @3)))
 
 /* Simplification moved from fold_cond_expr_with_comparison.  It may also
