https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114027
--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
We're detecting a COND_REDUCTION with a chain.  It seems to work (and
vectorize) with -march=znver4 using AVX sized vectors (but AVX512 style
masking).

I think what goes wrong is treating the COND_REDUCTION as MAX reduction
by only checking the last COND which looks like

  c_lsm.9_18 = _76 ? prephitmp_26 : 0;

but the previous one is

  prephitmp_26 = _69 ? c_lsm.9_30 : -3;

I'm not too familiar with the condition reduction code, the reduction is
classified as cond_reduc_dt == vect_constant_def and so we run into

  else if (cond_reduc_dt == vect_constant_def)
    {
      enum vect_def_type cond_initial_dt;
      tree cond_initial_val = vect_phi_initial_value (reduc_def_phi);
      vect_is_simple_use (cond_initial_val, loop_vinfo, &cond_initial_dt);
      if (cond_initial_dt == vect_constant_def
          && types_compatible_p (TREE_TYPE (cond_initial_val),
                                 TREE_TYPE (cond_reduc_val)))
        {
          tree e = fold_binary (LE_EXPR, boolean_type_node,
                                cond_initial_val, cond_reduc_val);
          if (e && (integer_onep (e) || integer_zerop (e)))
            {
              if (dump_enabled_p ())
                dump_printf_loc (MSG_NOTE, vect_location,
                                 "condition expression based on "
                                 "compile time constant.\n");
              /* Record reduction code at analysis stage.  */
              STMT_VINFO_REDUC_CODE (reduc_info)
                = integer_onep (e) ? MAX_EXPR : MIN_EXPR;
              STMT_VINFO_REDUC_TYPE (reduc_info) = CONST_COND_REDUCTION;
            }

and the loop classifying and computing cond_reduc_val just looks at the
first chain element ...

This should possibly be merged with the loop going over all chain stmts
but a more conservative fix for the latent(?) issue might be the following
(but that also cuts out conversions in the chain):

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 5a5865c42fc..e19896eef79 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7762,14 +7762,16 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
          if (op.code == COND_EXPR)
            {
              /* Record how the non-reduction-def value of COND_EXPR is defined.  */
-             if (dt == vect_constant_def)
+             if (reduc_chain_length != 1)
+               ;
+             else if (dt == vect_constant_def)
                {
                  cond_reduc_dt = dt;
                  cond_reduc_val = op.ops[i];
                }
-             if (dt == vect_induction_def
-                 && def_stmt_info
-                 && is_nonwrapping_integer_induction (def_stmt_info, loop))
+             else if (dt == vect_induction_def
+                      && def_stmt_info
+                      && is_nonwrapping_integer_induction (def_stmt_info, loop))
                {
                  cond_reduc_dt = dt;
                  cond_stmt_vinfo = def_stmt_info;

I think it's latent even before the bisected rev.
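
To illustrate why a MAX view built from a single constant cannot describe the two-element chain, here is a minimal sketch. It is a simplified model and not the vectorizer's actual transform: the function names, the boolean arrays standing in for _69 and _76, and the initial value -5 used below are all made up for illustration. Only the constants -3 and 0 and the shape of the two COND_EXPRs come from the dump quoted above.

```cpp
#include <algorithm>
#include <cstddef>

// Scalar semantics of the two chained COND_EXPRs from the dump:
//   prephitmp = a[i] ? c : -3;
//   c         = b[i] ? prephitmp : 0;
// a[] and b[] are hypothetical stand-ins for the conditions _69 and _76.
int chained_cond_scalar (const bool *a, const bool *b, std::size_t n, int init)
{
  int c = init;
  for (std::size_t i = 0; i < n; ++i)
    {
      int tmp = a[i] ? c : -3;
      c = b[i] ? tmp : 0;
    }
  return c;
}

// Simplified model (an assumption, not GCC code) of a constant-condition
// reduction that only knows about the constant 0: with init <= 0 each
// element contributes either init or 0 and the results are combined
// with MAX.  The -3 from the other chain element is never seen.
int single_const_max_model (const bool *b, std::size_t n, int init)
{
  int c = init;
  for (std::size_t i = 0; i < n; ++i)
    c = std::max (c, b[i] ? init : 0);
  return c;
}
```

With a = {true, false}, b = {false, true} and init = -5, the scalar loop ends with -3 (the last iteration selects the -3 arm), while the single-constant MAX model yields 0. The result of the chain depends on which constant was stored last, which a MAX over one constant cannot express.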