https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116142
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Xi Ruoyao from comment #4)
> (In reply to Richard Biener from comment #3)
> > (In reply to Xi Ruoyao from comment #2)
> > > (In reply to Richard Biener from comment #1)
> > > > To make it used by the reduction you'd need to have a dot_product
> > > > covering the accumulation as well.
> > >
> > > I can add that, but what if we slightly alter it to something like
> > >
> > > short x[8], y[8];
> > >
> > > int dot() {
> > >   int ret = 0;
> > >   for (int i = 0; i < 8; i++)
> > >     ret ^= x[i] * y[i];
> > >   return ret;
> > > }
> > >
> > > ? It's no longer a dot product but shouldn't
> > > vec_widen_smult_{even,odd}_v8hi be used anyway?
> >
> > Sure, you should see
> >
> > t.c:5:20: note: Analyze phi: ret_13 = PHI <ret_9(5), 0(2)>
> > t.c:5:20: note: reduction path: ret_9 ret_13
> > t.c:5:20: note: reduction: detected reduction
> > t.c:5:20: note: Detected reduction.
> > ...
> > t.c:5:20: note: vect_recog_widen_mult_pattern: detected: _5 = _2 * _4;
> > t.c:5:20: note: widen_mult pattern recognized: patt_24 = _1 w* _3;
> >
> > and then
> >
> >   # vect_ret_13.11_12 = PHI <vect_ret_9.12_7(5), { 0, 0, 0, 0 }(2)>
> >   # ivtmp_29 = PHI <ivtmp_30(5), 0(2)>
> >   vect__1.6_20 = MEM <vector(8) short int> [(short int *)vectp_x.4_22];
> >   _1 = x[i_15];
> >   _2 = (int) _1;
> >   vect__3.9_17 = MEM <vector(8) short int> [(short int *)vectp_y.7_19];
> >   vect_patt_23.10_16 = WIDEN_MULT_LO_EXPR <vect__1.6_20, vect__3.9_17>;
> >   vect_patt_23.10_14 = WIDEN_MULT_HI_EXPR <vect__1.6_20, vect__3.9_17>;
> >   vect_ret_9.12_11 = vect_patt_23.10_16 ^ vect_ret_13.11_12;
> >   vect_ret_9.12_7 = vect_patt_23.10_14 ^ vect_ret_9.12_11;
> >
> > at least that's what happens on x86.  It should also work with
> > _EVEN/_ODD.
>
> The condition of _EVEN/_ODD is more strict than _HI/_LO.  It requires
> STMT_VINFO_RELEVANT (stmt_info) == vect_used_by_reduction but this
> condition seems not true for my test cases.

Ah, that's because EVEN/ODD de-interleaves the vector, and to make use of
the result we'd need to re-interleave it to build a hi/lo pair.  Consider

  ret[i] = x[i] * y[i];

where the lane order has to be preserved; using the de-interleaved result
directly is only safe for reductions, where the summation can be
re-ordered.

I'll note the check in question is redundant since we have

      /* Elements in a vector with vect_used_by_reduction property cannot
         be reordered if the use chain with this property does not have the
         same operation.  One such an example is s += a * b, where elements
         in a and b cannot be reordered.  Here we check if the vector defined
         by STMT is only directly used in the reduction statement.  */
      tree lhs = gimple_assign_lhs (stmt_info->stmt);
      stmt_vec_info use_stmt_info = loop_info->lookup_single_use (lhs);
      if (use_stmt_info
          && STMT_VINFO_DEF_TYPE (use_stmt_info) == vect_reduction_def)
        return true;

but it also seems that vect_used_by_reduction is not set very
optimistically.  Its definition

  /* defs that feed computations that end up (only) in a reduction.
     These defs may be used by non-reduction stmts, but eventually, any
     computations/values that are affected by these defs are used to
     compute a reduction (i.e. don't get stored to memory, for example).
     We use this to identify computations that we can change the order
     in which they are computed.  */
  vect_used_by_reduction,

is also not quite correct, since it would mean

  a = b[i];
  d = x w* y;
  c = a * d;
  res += c;

would be OK, but that clearly mixes even/odd permuted lanes in 'd' with
unpermuted lanes of 'a', computing a wrong value.
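To make that lane-mixing failure mode concrete, here is a hypothetical
source-level variant (my own sketch, not a test case from this PR; the
function name and the array 'b' are made up) that would produce the
a * (x w* y) shape above:

  short x[8], y[8];
  int b[8];

  int f (void)
  {
    int res = 0;
    for (int i = 0; i < 8; i++)
      /* The widen-mult x[i] * y[i] would come back even/odd permuted if
         vectorized via VEC_WIDEN_MULT_EVEN/ODD_EXPR, while b[i] is loaded
         in lane order, so the multiply pairs up mismatched lanes.  */
      res += b[i] * (x[i] * y[i]);
    return res;
  }

The reduction itself is still re-orderable here; the problem is the
non-reduction multiply in between, which needs matching lane order in
both operands.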
So the STMT_VINFO_RELEVANT (stmt_info) == vect_used_by_reduction check
isn't enough to guarantee correctness here.  I think the latter test is
OK on its own though.

I think we want to remove
vect_used_in_outer_by_reduction/vect_used_by_reduction and instead
propagate a flag that lanes might be arbitrarily permuted.

Can you check if the following makes things work for you?

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 67f6e5df255..7496e31164c 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -14200,7 +14200,6 @@ supportable_widening_operation (vec_info *vinfo,
      are properly set up for the caller.  If we fail, we'll continue with
      a VEC_WIDEN_MULT_LO/HI_EXPR check.  */
   if (vect_loop
-      && STMT_VINFO_RELEVANT (stmt_info) == vect_used_by_reduction
       && !nested_in_vect_loop_p (vect_loop, stmt_info)
       && supportable_widening_operation (vinfo, VEC_WIDEN_MULT_EVEN_EXPR,
                                          stmt_info, vectype_out,
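For contrast, a standalone sanity check (illustrative only, not part of
the patch) of why the permuted lanes are harmless for the XOR reduction
itself: since ^ is associative and commutative, accumulating the products
in even-then-odd lane order gives the same result as in-order
accumulation.

  #include <assert.h>

  short x[8] = { 1, -2, 3, 4, 5, -6, 7, 8 };
  short y[8] = { 9, 10, -11, 12, 13, 14, -15, 16 };

  int main (void)
  {
    int in_order = 0, deinterleaved = 0;

    for (int i = 0; i < 8; i++)
      in_order ^= x[i] * y[i];

    /* Even lanes first, then odd lanes, mimicking the order a
       VEC_WIDEN_MULT_EVEN/ODD_EXPR pair would produce.  */
    for (int i = 0; i < 8; i += 2)
      deinterleaved ^= x[i] * y[i];
    for (int i = 1; i < 8; i += 2)
      deinterleaved ^= x[i] * y[i];

    assert (in_order == deinterleaved);
    return 0;
  }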