Richard Biener <rguent...@suse.de> writes: > On December 12, 2019 12:56:01 AM GMT+01:00, Jakub Jelinek <ja...@redhat.com> > wrote: >>Hi! >> >>The AVX512{F,VL} vector comparisons that set %kN registers are >>represented >>in RTL as comparisons with vector mode operands and scalar integral >>result, >>where at runtime the scalar integer is filled with a bitmask. >>Unfortunately, simplify_relational_operation would fold e.g. >>(eq:SI (reg:V32HI x) (reg:V32HI x)) >>into (const_int 1) rather than (const_int -1) that is expected (all >>elements >>equal). simplify_const_relational_operation is documented to always >>return >>just const0_rtx or const_true_rtx and simplify_relational_operation is >>expected to fix this up, for vector comparisons with vector result it >>duplicates the 0 or -1 into all elements, etc., so this patch adds >>handling >>for this case there too. >> >>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > So there's no whole vector equality RTX but we have to pun to integer modes > for that? The eq:SImode would suggest that. Guess we should have used a > BImode vector representation... > > Can you check whether we have any target with whole vector compare patterns > that would break here?
We also have vector cbranch<mode>4 for vectors, which does a genuine scalar comparison of vectors. It happens that the representation of them on x86 (using CCmodes only) means that there's currently no ambiguity with the elementwise vector-to-scalar comparisons above. But I think further encoding this assumption is going to cause problems eventually. E.g. the scalar comparison { 0, 0 } == { 0, 1 } should be false when used with cbranch<mode>4, but should be 2 when used with an elementwise interpretation. If the cbranch<mode>4 interpretation is turned into a cstore for (cbranch<mode>4 ? 1 : 0), we could easily end up wanting (eq:SI ...) of those vectors to return false rather than 2. Even for { 0, 0 } == { 0, 0 }, a cstore (eq:SI ...) would need to be 1 rather than 3. I wonder how easy it would be to make the mask registers use MODE_VECTOR_BOOL instead of scalar integers... :-) For now I think the safest thing would be to punt, rather than assume one thing or the other. simplify_const_relational_operation doesn't handle many vector cases anyway, and most interesting optimisations like this should happen on gimple, where we know whether the condition result is a vector or a scalar. Thanks, Richard > > Richard. > >>2019-12-11 Jakub Jelinek <ja...@redhat.com> >> >> PR target/92908 >> * simplify-rtx.c (simplify_relational_operation): For vector cmp_mode >> and scalar mode, if simplify_relational_operation returned >> const_true_rtx, return a scalar bitmask of all ones. >> (simplify_const_relational_operation): Change VOID_mode in function >> comment to VOIDmode. >> >> * gcc.target/i386/avx512bw-pr92908.c: New test. >> >>--- gcc/simplify-rtx.c.jj 2019-11-19 22:27:02.000058742 +0100 >>+++ gcc/simplify-rtx.c 2019-12-11 13:31:57.197809704 +0100 >>@@ -5037,6 +5037,23 @@ simplify_relational_operation (enum rtx_ >> return NULL_RTX; >> #endif >> } >>+ if (VECTOR_MODE_P (cmp_mode) >>+ && SCALAR_INT_MODE_P (mode) >>+ && tem == const_true_rtx) >>+ { >>+ /* Vector comparisons that expect a scalar integral >>+ bitmask. For const0_rtx the result is already correct, >>+ for const_true_rtx we need all bits set. */ >>+ int n_elts; >>+ scalar_int_mode smode = as_a <scalar_int_mode> (mode); >>+ gcc_assert (GET_MODE_NUNITS (cmp_mode).is_constant (&n_elts) >>+ && GET_MODE_PRECISION (smode) <= n_elts); >>+ if (GET_MODE_PRECISION (smode) == n_elts) >>+ return constm1_rtx; >>+ if (n_elts < HOST_BITS_PER_WIDE_INT) >>+ return GEN_INT ((HOST_WIDE_INT_1U << n_elts) - 1); >>+ return NULL_RTX; >>+ } >> >> return tem; >> } >>@@ -5383,7 +5400,7 @@ comparison_result (enum rtx_code code, i >> } >> >> /* Check if the given comparison (done in the given MODE) is actually >>- a tautology or a contradiction. If the mode is VOID_mode, the >>+ a tautology or a contradiction. If the mode is VOIDmode, the >> comparison is done in "infinite precision". If no simplification >> is possible, this function returns zero. Otherwise, it returns >> either const_true_rtx or const0_rtx. */ >>--- gcc/testsuite/gcc.target/i386/avx512bw-pr92908.c.jj 2019-12-11 >>14:24:12.083418762 +0100 >>+++ gcc/testsuite/gcc.target/i386/avx512bw-pr92908.c 2019-12-11 >>14:23:56.071665326 +0100 >>@@ -0,0 +1,21 @@ >>+/* PR target/92908 */ >>+/* { dg-do run } */ >>+/* { dg-options "-Og -fno-tree-fre -mavx512bw" } */ >>+/* { dg-require-effective-target avx512bw } */ >>+ >>+#define AVX512BW >>+#include "avx512f-helper.h" >>+ >>+typedef unsigned short V __attribute__ ((__vector_size__ (64))); >>+ >>+V v; >>+ >>+void >>+TEST (void) >>+{ >>+ int i; >>+ v = (V) v == v; >>+ for (i = 0; i < 32; i++) >>+ if (v[i] != 0xffff) >>+ abort (); >>+} >> >> Jakub