On December 31, 2020 5:06:36 PM GMT+01:00, Richard Sandiford <richard.sandif...@arm.com> wrote: >In this testcase we end up with: > > unsigned long long x = ...; > char y = (char) (x << 37); > >The overwidening pattern realised that only the low 8 bits >of x << 37 are needed, but then tried to turn that into: > > unsigned long long x = ...; > char y = (char) x << 37; > >which gives an out-of-range shift. In this case y can simply >be replaced by zero, but as the comment in the patch says, >it's kind-of awkward to do that in the middle of vectorisation. > >Most of the overwidening stuff is about keeping operations >as narrow as possible, which is important for vectorisation >but could be counter-productive for scalars (especially on >RISC targets). In contrast, optimising y to zero in the above >feels like an independent optimisation that would benefit scalar >code and that should happen before vectorisation. > >Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu. >OK to install?
Ok. Richard. >Richard > > >gcc/ > PR tree-vectoriation/98302 > * tree-vect-patterns.c (vect_determine_precisions_from_users): Make > sure that the precision remains greater than the shift count. > >gcc/testsuite/ > * gcc.dg/vect/pr98302.c: New test. >--- > gcc/testsuite/gcc.dg/vect/pr98302.c | 22 ++++++++++++++++++++++ > gcc/tree-vect-patterns.c | 13 +++++++++++-- > 2 files changed, 33 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/vect/pr98302.c > >diff --git a/gcc/testsuite/gcc.dg/vect/pr98302.c >b/gcc/testsuite/gcc.dg/vect/pr98302.c >new file mode 100644 >index 00000000000..dec60167629 >--- /dev/null >+++ b/gcc/testsuite/gcc.dg/vect/pr98302.c >@@ -0,0 +1,22 @@ >+#include "tree-vect.h" >+ >+int c = 1705; >+char a; >+long f = 50887638; >+unsigned long long *h(unsigned long long *k, unsigned long long *l) { >+ return *k ? k : l; >+} >+void aa() {} >+int main() { >+ check_vect (); >+ >+ long d = f; >+ for (char g = 0; g < (char)c - 10; g += 2) { >+ unsigned long long i = d, j = 4; >+ a = *h(&i, &j) << ((d ? 169392992 : 0) - 169392955LL); >+ } >+ if (a) >+ __builtin_abort(); >+ >+ return 0; >+} >diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c >index ff1358ae79a..081ae80d430 100644 >--- a/gcc/tree-vect-patterns.c >+++ b/gcc/tree-vect-patterns.c >@@ -4961,10 +4961,19 @@ vect_determine_precisions_from_users >(stmt_vec_info stmt_info, gassign *stmt) > unsigned int const_shift = TREE_INT_CST_LOW (shift); > if (code == LSHIFT_EXPR) > { >+ /* Avoid creating an undefined shift. >+ >+ ??? We could instead use min_output_precision as-is and >+ optimize out-of-range shifts to zero. However, only >+ degenerate testcases shift away all their useful input data, >+ and it isn't natural to drop input operations in the middle >+ of vectorization. This sort of thing should really be >+ handled before vectorization. */ >+ operation_precision = MAX (stmt_info->min_output_precision, >+ const_shift + 1); > /* We need CONST_SHIFT fewer bits of the input. */ >- operation_precision = stmt_info->min_output_precision; > min_input_precision = (MAX (operation_precision, const_shift) >- - const_shift); >+ - const_shift); > } > else > {