https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113091
--- Comment #2 from Feng Xue <fxue at os dot amperecomputing.com> --- (In reply to Richard Biener from comment #1) > It's the logic > > FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info) > { > if (svisited.contains (stmt_info)) > continue; > stmt_vec_info orig_stmt_info = vect_orig_stmt (stmt_info); > if (STMT_VINFO_IN_PATTERN_P (orig_stmt_info) > && STMT_VINFO_RELATED_STMT (orig_stmt_info) != stmt_info) > /* Only the pattern root stmt computes the original scalar value. */ > continue; > bool mark_visited = true; > gimple *orig_stmt = orig_stmt_info->stmt; > ssa_op_iter op_iter; > def_operand_p def_p; > FOR_EACH_PHI_OR_STMT_DEF (def_p, orig_stmt, op_iter, SSA_OP_DEF) > { > imm_use_iterator use_iter; > gimple *use_stmt; > stmt_vec_info use_stmt_info; > FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p)) > if (!is_gimple_debug (use_stmt)) > { > use_stmt_info = bb_vinfo->lookup_stmt (use_stmt); > if (!use_stmt_info > || !PURE_SLP_STMT (vect_stmt_to_vectorize > (use_stmt_info))) > { > STMT_VINFO_LIVE_P (stmt_info) = true; > > specifically the last check. That's supposed to pick up the "main" pattern > that's now covering the scalar stmt. > > But somehow the "main" pattern, > > patt_67 = .VEC_WIDEN_MINUS (_1, _3); // _5 = _2 - _4; > patt_68 = (signed short) patt_67; // _5 = _2 - _4; > patt_69 = (int) patt_68; // _5 = _2 - _4; > > doesn't get picked up here. I wonder what's the orig_stmt and the def > picked and what original scalar use we end up in where the > vect_stmt_to_vectorize isn't the "last" pattern. Maybe we really want This problem happens at slp node: note: node 0x425bc38 (max_nunits=8, refcnt=1) vector(8) char note: op template: _1 = *a_50(D); note: stmt 0 _1 = *a_50(D); note: stmt 1 _7 = MEM[(char *)a_50(D) + 1B]; note: stmt 2 _13 = MEM[(char *)a_50(D) + 2B]; note: stmt 3 _19 = MEM[(char *)a_50(D) + 3B]; note: stmt 4 _25 = MEM[(char *)a_50(D) + 4B]; note: stmt 5 _31 = MEM[(char *)a_50(D) + 5B]; note: stmt 6 _37 = MEM[(char *)a_50(D) + 6B]; note: stmt 7 _43 = MEM[(char *)a_50(D) + 7B]; The orig_stmt is "_1 = *a_50(D)" The use stmt is "_2 = (int) _1", whose pattern statement is "patt_64 = (int) patt_63", which is not referenced by any original or other pattern statements. Or in other word, the orig_stmt could be absorbed into a vector operation, without any outlier scalar use. The fore-mentioned "last check" in vect_bb_slp_mark_live_stmts would make the orig_stmt to be STMT_VINFO_LIVE_P, which actually implies it has scalar use (though it should not have), the difference is re-generating the def somewhere, rather than retaining the original scalar statement. And the following "vectorizable_live_operation" would account the new operations into vectorization cost of the SLP instance. The function vect_bb_vectorization_profitable_p resorts to a recursive way to identify scalar use, for this case, setting STMT_VINFO_LIVE_P or not would change scalar cost computation. If we can avoid such fake-liveness adjustment on the statements we are interested in, vectorization cost could beat scalar cost, and make the former succeed. Unvectorized: mov x2, x0 stp x29, x30, [sp, -48]! mov x29, sp ldrb w3, [x1] ldrb w4, [x1, 1] add x0, sp, 16 ldrb w9, [x2] ldrb w8, [x2, 1] sub w9, w9, w3 ldrb w7, [x2, 2] ldrb w3, [x1, 2] sub w8, w8, w4 ldrb w6, [x2, 3] ldrb w4, [x1, 3] sub w7, w7, w3 ldrb w10, [x1, 5] ldrb w3, [x1, 4] sub w6, w6, w4 ldrb w5, [x2, 4] ldrb w4, [x2, 5] sub w5, w5, w3 ldrb w3, [x2, 6] sub w4, w4, w10 ldrb w2, [x2, 7] ldrb w10, [x1, 6] ldrb w1, [x1, 7] sub w3, w3, w10 stp w9, w8, [sp, 16] sub w1, w2, w1 stp w7, w6, [sp, 24] stp w5, w4, [sp, 32] stp w3, w1, [sp, 40] bl test ldp x29, x30, [sp], 48 ret Vectorized: mov x2, x0 stp x29, x30, [sp, -48]! mov x29, sp ldr d1, [x1] add x0, sp, 16 ldr d0, [x2] usubl v0.8h, v0.8b, v1.8b sxtl v1.4s, v0.4h sxtl2 v0.4s, v0.8h stp q1, q0, [sp, 16] bl test ldp x29, x30, [sp], 48 ret > these "overlapping" patterns, but IMHO having "two entries" into > a chain of scalar stmts is bad and we should link up the whole matched > sequence to the final "root" instead? > > That said, the current code doesn't see that wherever we end up isn't > dead code (aka fully covered by the vectorization). > > IMO vect_stmt_to_vectorize for each of those stmts should end up at > > patt_69 = (int) patt_68;