On December 31, 2020 5:06:36 PM GMT+01:00, Richard Sandiford 
<richard.sandif...@arm.com> wrote:
>In this testcase we end up with:
>
>  unsigned long long x = ...;
>  char y = (char) (x << 37);
>
>The overwidening pattern realised that only the low 8 bits
>of x << 37 are needed, but then tried to turn that into:
>
>  unsigned long long x = ...;
>  char y = (char) x << 37;
>
>which gives an out-of-range shift.  In this case y can simply
>be replaced by zero, but as the comment in the patch says,
>it's kind-of awkward to do that in the middle of vectorisation.
>
>Most of the overwidening stuff is about keeping operations
>as narrow as possible, which is important for vectorisation
>but could be counter-productive for scalars (especially on
>RISC targets).  In contrast, optimising y to zero in the above
>feels like an independent optimisation that would benefit scalar
>code and that should happen before vectorisation.
>
>Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
>OK to install?

Ok. 

Richard. 

>Richard
>
>
>gcc/
>       PR tree-vectoriation/98302
>       * tree-vect-patterns.c (vect_determine_precisions_from_users): Make
>       sure that the precision remains greater than the shift count.
>
>gcc/testsuite/
>       * gcc.dg/vect/pr98302.c: New test.
>---
> gcc/testsuite/gcc.dg/vect/pr98302.c | 22 ++++++++++++++++++++++
> gcc/tree-vect-patterns.c            | 13 +++++++++++--
> 2 files changed, 33 insertions(+), 2 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/vect/pr98302.c
>
>diff --git a/gcc/testsuite/gcc.dg/vect/pr98302.c
>b/gcc/testsuite/gcc.dg/vect/pr98302.c
>new file mode 100644
>index 00000000000..dec60167629
>--- /dev/null
>+++ b/gcc/testsuite/gcc.dg/vect/pr98302.c
>@@ -0,0 +1,22 @@
>+#include "tree-vect.h"
>+
>+int c = 1705;
>+char a;
>+long f = 50887638;
>+unsigned long long *h(unsigned long long *k, unsigned long long *l) {
>+  return *k ? k : l;
>+}
>+void aa() {}
>+int main() {
>+  check_vect ();
>+
>+  long d = f;
>+  for (char g = 0; g < (char)c - 10; g += 2) {
>+    unsigned long long i = d, j = 4;
>+    a = *h(&i, &j) << ((d ? 169392992 : 0) - 169392955LL);
>+  }
>+  if (a)
>+    __builtin_abort();
>+
>+  return 0;
>+}
>diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
>index ff1358ae79a..081ae80d430 100644
>--- a/gcc/tree-vect-patterns.c
>+++ b/gcc/tree-vect-patterns.c
>@@ -4961,10 +4961,19 @@ vect_determine_precisions_from_users
>(stmt_vec_info stmt_info, gassign *stmt)
>       unsigned int const_shift = TREE_INT_CST_LOW (shift);
>       if (code == LSHIFT_EXPR)
>         {
>+          /* Avoid creating an undefined shift.
>+
>+             ??? We could instead use min_output_precision as-is and
>+             optimize out-of-range shifts to zero.  However, only
>+             degenerate testcases shift away all their useful input data,
>+             and it isn't natural to drop input operations in the middle
>+             of vectorization.  This sort of thing should really be
>+             handled before vectorization.  */
>+          operation_precision = MAX (stmt_info->min_output_precision,
>+                                     const_shift + 1);
>           /* We need CONST_SHIFT fewer bits of the input.  */
>-          operation_precision = stmt_info->min_output_precision;
>           min_input_precision = (MAX (operation_precision, const_shift)
>-                                  - const_shift);
>+                                 - const_shift);
>         }
>       else
>         {

Reply via email to