https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114027

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
We're detecting a COND_REDUCTION with a chain.  It seems to work (and
vectorize) with -march=znver4 using AVX sized vectors (but AVX512 style
masking).

I think what goes wrong is treating the COND_REDUCTION as MAX reduction
by only checking the last COND which looks like

  c_lsm.9_18 = _76 ? prephitmp_26 : 0;

but the previous one is

  prephitmp_26 = _69 ? c_lsm.9_30 : -3;
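
To illustrate why a single-constant view cannot describe this chain, here
is a minimal C sketch.  It is not the testcase from the PR: the arrays p
and q (standing in for _69 and _76), the initial value -5 and all function
names are hypothetical, and single_const_max is only a simplified model of
a MAX-style reduction that knows about the constant 0 alone, not the code
the vectorizer actually emits.

```c
#include <assert.h>

/* Scalar semantics of the two chained conditionals quoted above.
   p[i] and q[i] stand in for _69 and _76; both arrays are hypothetical.  */
static int
chain_scalar (const int *p, const int *q, int n, int init)
{
  int c = init;
  for (int i = 0; i < n; i++)
    {
      int t = p[i] ? c : -3;  /* prephitmp_26 = _69 ? c_lsm.9_30 : -3  */
      c = q[i] ? t : 0;       /* c_lsm.9_18 = _76 ? prephitmp_26 : 0  */
    }
  return c;
}

/* Simplified model (an assumption, not GCC's generated code) of a MAX
   reduction that only knows the constant 0: each iteration either keeps
   the running value or raises it to 0.  Assumes init <= 0, the case in
   which MAX_EXPR would be selected.  */
static int
single_const_max (const int *q, int n, int init)
{
  int c = init;
  for (int i = 0; i < n; i++)
    if (!q[i] && c < 0)
      c = 0;
  return c;
}

/* One iteration with the first condition false and the second true:
   the scalar chain yields -3, a value the single-constant model can
   never produce because its results are limited to init or 0.  */
static void
check_mismatch (void)
{
  const int p[1] = { 0 };
  const int q[1] = { 1 };
  assert (chain_scalar (p, q, 1, -5) == -3);
  assert (single_const_max (q, 1, -5) == -5);
}
```

Calling check_mismatch () passes both assertions, showing that the chain
can end in the second constant (-3) while a reduction built from the
initial value and one constant cannot.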

I'm not too familiar with the condition reduction code, but the reduction
is classified as cond_reduc_dt == vect_constant_def and so we run into

      else if (cond_reduc_dt == vect_constant_def)
        {
          enum vect_def_type cond_initial_dt;
          tree cond_initial_val = vect_phi_initial_value (reduc_def_phi);
          vect_is_simple_use (cond_initial_val, loop_vinfo, &cond_initial_dt);
          if (cond_initial_dt == vect_constant_def
              && types_compatible_p (TREE_TYPE (cond_initial_val),
                                     TREE_TYPE (cond_reduc_val)))
            {
              tree e = fold_binary (LE_EXPR, boolean_type_node,
                                    cond_initial_val, cond_reduc_val);
              if (e && (integer_onep (e) || integer_zerop (e)))
                {
                  if (dump_enabled_p ())
                    dump_printf_loc (MSG_NOTE, vect_location,
                                     "condition expression based on "
                                     "compile time constant.\n");
                  /* Record reduction code at analysis stage.  */
                  STMT_VINFO_REDUC_CODE (reduc_info)
                    = integer_onep (e) ? MAX_EXPR : MIN_EXPR;
                  STMT_VINFO_REDUC_TYPE (reduc_info) = CONST_COND_REDUCTION;
                }

and the loop classifying and computing cond_reduc_val just looks at the
first chain element ...  This should possibly be merged with the loop
going over all chain stmts but a more conservative fix for the latent(?)
issue might be the following (but that also cuts out conversions in the
chain):

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 5a5865c42fc..e19896eef79 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7762,14 +7762,16 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
       if (op.code == COND_EXPR)
        {
          /* Record how the non-reduction-def value of COND_EXPR is defined.  */
-         if (dt == vect_constant_def)
+         if (reduc_chain_length != 1)
+           ;
+         else if (dt == vect_constant_def)
            {
              cond_reduc_dt = dt;
              cond_reduc_val = op.ops[i];
            }
-         if (dt == vect_induction_def
-             && def_stmt_info
-             && is_nonwrapping_integer_induction (def_stmt_info, loop))
+         else if (dt == vect_induction_def
+                  && def_stmt_info
+                  && is_nonwrapping_integer_induction (def_stmt_info, loop))
            {
              cond_reduc_dt = dt;
              cond_stmt_vinfo = def_stmt_info;


I think it's latent even before the bisected rev.
