Hi!

The following testcase is miscompiled because the variable shift left
operand, { -1, -1, -1, -1 }, is represented as a VECTOR_CST with
VECTOR_CST_NPATTERNS 1 and VECTOR_CST_NELTS_PER_PATTERN 1.  When we call
builder.new_unary_operation, builder.encoded_nelts () is therefore just 1,
and we encode the resulting vector as if all the elements were the same.

For non-masked is_vshift, we could perhaps call
builder.new_binary_operation (TREE_TYPE (args[0]), args[0], args[1], false),
but then there are masked shifts.  For non-is_vshift we could perhaps call
it too, with args[2] instead of args[1], but there is no
builder.new_ternary_operation.

All this stuff is primarily for aarch64 anyway; on x86 we don't have any
variable length vectors, and it is not a big deal to compute all elements
and just let builder.finalize () find the most efficient VECTOR_CST
representation of the vector.

So, instead of doing too much, this patch keeps using new_unary_operation
only if a single VECTOR_CST is involved (i.e. non-masked shift by constant)
and for the rest just computes all elts.

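
To make the failure mode concrete, here is a small standalone sketch in plain C.  It does not use any GCC internals: struct toy_vec, toy_elt, fold_like_unary and fold_all_lanes are hypothetical names made up for this illustration, and the toy encoding only models the NELTS_PER_PATTERN == 1 case (the stored elements simply repeat).  sllv_lane models one 32-bit lane of the variable shift, with counts above 31 giving 0, which is what the testcase expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of one 32-bit lane of a variable left shift:
   counts above 31 yield 0, matching the values the testcase checks.  */
static uint32_t
sllv_lane (uint32_t value, uint32_t count)
{
  return count > 31 ? 0 : value << count;
}

/* Hypothetical toy stand-in for a pattern-encoded constant vector.
   Only the first ENCODED elements are stored; element I of the full
   vector is stored[I % ENCODED].  This is an assumption of the sketch
   and covers only the "elements repeat" case.  */
struct toy_vec
{
  uint32_t stored[4];
  size_t encoded;
};

static uint32_t
toy_elt (const struct toy_vec *v, size_t i)
{
  assert (v->encoded > 0 && v->encoded <= 4);
  return v->stored[i % v->encoded];
}

/* Problematic approach: size the result like the encoding of the
   shifted operand A alone, ignoring how COUNTS is encoded.  */
static struct toy_vec
fold_like_unary (const struct toy_vec *a, const struct toy_vec *counts)
{
  struct toy_vec r = { { 0 }, a->encoded };
  for (size_t i = 0; i < r.encoded; ++i)
    r.stored[i] = sllv_lane (toy_elt (a, i), toy_elt (counts, i));
  return r;
}

/* Safe approach: compute all NELTS lanes explicitly.  */
static struct toy_vec
fold_all_lanes (const struct toy_vec *a, const struct toy_vec *counts,
		size_t nelts)
{
  struct toy_vec r = { { 0 }, nelts };
  for (size_t i = 0; i < nelts; ++i)
    r.stored[i] = sllv_lane (toy_elt (a, i), toy_elt (counts, i));
  return r;
}
```

With a = { { 0xffffffffU }, 1 } and counts = { { 16, 31, (uint32_t) -34, 3 }, 4 }, fold_like_unary stores a single element and so reports the lane 0 result for every lane, while fold_all_lanes (&a, &counts, 4) produces four distinct lanes.
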
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-01-28  Jakub Jelinek  <ja...@redhat.com>

	PR target/93418
	* config/i386/i386.c (ix86_fold_builtin) <do_shift>: If mask is not
	-1 or is_vshift is true, use new_vector with number of elts npatterns
	rather than new_unary_operation.

	* gcc.target/i386/avx2-pr93418.c: New test.

--- gcc/config/i386/i386.c.jj	2020-01-22 09:49:27.375413362 +0100
+++ gcc/config/i386/i386.c	2020-01-27 18:22:34.986577375 +0100
@@ -17278,8 +17278,13 @@ ix86_fold_builtin (tree fndecl, int n_ar
 	      countt = build_int_cst (integer_type_node, count);
 	    }
 	  tree_vector_builder builder;
-	  builder.new_unary_operation (TREE_TYPE (args[0]), args[0],
-				       false);
+	  if (mask != HOST_WIDE_INT_M1U || is_vshift)
+	    builder.new_vector (TREE_TYPE (args[0]),
+				TYPE_VECTOR_SUBPARTS (TREE_TYPE (args[0])),
+				1);
+	  else
+	    builder.new_unary_operation (TREE_TYPE (args[0]), args[0],
+					 false);
 	  unsigned int cnt = builder.encoded_nelts ();
 	  for (unsigned int i = 0; i < cnt; ++i)
 	    {
--- gcc/testsuite/gcc.target/i386/avx2-pr93418.c.jj	2020-01-27 18:27:53.461799372 +0100
+++ gcc/testsuite/gcc.target/i386/avx2-pr93418.c	2020-01-27 18:26:11.592327685 +0100
@@ -0,0 +1,20 @@
+/* PR target/93418 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-not "link_error" "optimized" } } */
+
+#include <x86intrin.h>
+
+void link_error (void);
+
+void
+foo (void)
+{
+  __m128i a = _mm_set1_epi32 (0xffffffffU);
+  __m128i b = _mm_setr_epi32 (16, 31, -34, 3);
+  __m128i c = _mm_sllv_epi32 (a, b);
+  __v4su d = (__v4su) c;
+  if (d[0] != 0xffff0000U || d[1] != 0x80000000U
+      || d[2] != 0 || d[3] != 0xfffffff8U)
+    link_error ();
+}

	Jakub