On Tue, 15 Apr 2025, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguent...@suse.de>
> > Sent: Tuesday, April 15, 2025 12:50 PM
> > To: Tamar Christina <tamar.christ...@arm.com>
> > Cc: Richard Sandiford <richard.sandif...@arm.com>; gcc-patches@gcc.gnu.org;
> > nd <n...@arm.com>
> > Subject: RE: [PATCH]middle-end: Fix incorrect codegen with PFA and VLS
> > [PR119351]
> > 
> > On Tue, 15 Apr 2025, Tamar Christina wrote:
> > 
> > > > -----Original Message-----
> > > > From: Richard Sandiford <richard.sandif...@arm.com>
> > > > Sent: Tuesday, April 15, 2025 10:52 AM
> > > > To: Tamar Christina <tamar.christ...@arm.com>
> > > > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; rguent...@suse.de
> > > > Subject: Re: [PATCH]middle-end: Fix incorrect codegen with PFA and VLS
> > > > [PR119351]
> > > >
> > > > Tamar Christina <tamar.christ...@arm.com> writes:
> > > > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > > > > index 56a4e9a8b63f3cae0bf596bf5d22893887dc80e8..0722679d6e66e5dd5af4ec1ce591f7c38b76d07f 100644
> > > > > --- a/gcc/tree-vect-loop-manip.cc
> > > > > +++ b/gcc/tree-vect-loop-manip.cc
> > > > > @@ -2195,6 +2195,22 @@ vect_can_peel_nonlinear_iv_p (loop_vec_info loop_vinfo,
> > > > >        return false;
> > > > >      }
> > > > >
> > > > > +  /* With early break vectorization we don't know whether the accesses
> > > > > +     will stay inside the loop or not.  TODO: The early break adjustment
> > > > > +     code can be implemented the same way for
> > > > > +     vectorizable_linear_induction.  However we can't test this today so
> > > > > +     reject it.  */
> > > > > +  if (niters_skip != NULL_TREE
> > > > > +      && vect_use_loop_mask_for_alignment_p (loop_vinfo)
> > > > > +      && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
> > > > > +      && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > +    {
> > > > > +      if (dump_enabled_p ())
> > > > > +        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > > +                         "Peeling for alignment using masking is not supported"
> > > > > +                         " for nonlinear induction when using early breaks.\n");
> > > > > +      return false;
> > > > > +    }
> > > > > +
> > > > >    return true;
> > > > >  }
> > > >
> > > > FTR, I was wondering here whether we should predict this in advance and
> > > > instead drop down to peeling for alignment without masks.  It probably
> > > > isn't worth the effort though.
> > >
> > > We could move the check into vect_use_loop_mask_for_alignment_p, where
> > > rejecting it there would get it to fall back to scalar peeling.  That
> > > seems simple enough if that's preferable.
> > 
> > The above is preferable IMO (short of fixing up that case, but with
> > a testcase).
> > 
> 
> I wasn't able to make a testcase before, as any non-linear induction feeding
> a load becomes a gather load, which we block outright well before getting
> here.  I couldn't think of an example where it wouldn't be; even a gapped
> load, e.g. += 2, became one.

Well, a classic one would be

 int x = 2;
 for (i = 0; i < n; ++i)
   {
     if (y[i])
       break;
     x *= 3;
   }
 return x;

or the negate case

 int x = 2;
 for (i = 0; i < n; ++i)
   { 
     if (y[i]) 
       break;
     x = -x;
   }
 return x;
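
In testcase form both would look something like this (completely untested,
function names made up; in vectorizer terms these should be the
vect_step_op_mul and vect_step_op_neg nonlinear inductions):

 int
 mul_iv (int *y, int n)
 {
   int x = 2;
   for (int i = 0; i < n; ++i)
     {
       if (y[i])
         break;
       x *= 3;   /* nonlinear IV: multiply  */
     }
   return x;
 }

 int
 neg_iv (int *y, int n)
 {
   int x = 2;
   for (int i = 0; i < n; ++i)
     {
       if (y[i])
         break;
       x = -x;   /* nonlinear IV: negate  */
     }
   return x;
 }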

possibly we mark those as unhandled for early exits somewhere already.
There does seem to be code handling PFA with masks for them.

Richard.


> Thanks,
> Tamar
> 
> > Richard.
> > 
> > > Cheers,
> > > Tamar
> > > >
> > > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > > > index 9413dcef702597ab27165e676546b190e2bd36ba..6dcdee19bb250993d8cc6b0057d2fa46245d04d9 100644
> > > > > --- a/gcc/tree-vect-loop.cc
> > > > > +++ b/gcc/tree-vect-loop.cc
> > > > > @@ -10678,6 +10678,104 @@ vectorizable_induction (loop_vec_info loop_vinfo,
> > > > >                                        LOOP_VINFO_MASK_SKIP_NITERS (loop_vinfo));
> > > > >           peel_mul = gimple_build_vector_from_val (&init_stmts,
> > > > >                                                    step_vectype, peel_mul);
> > > > > +
> > > > > +          /* If early break then we have to create a new PHI which we can
> > > > > +             use as an offset to adjust the induction reduction in early
> > > > > +             exits.  */
> > > > > +          if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > +            {
> > > > > +              auto skip_niters = LOOP_VINFO_MASK_SKIP_NITERS (loop_vinfo);
> > > > > +              tree ty_skip_niters = TREE_TYPE (skip_niters);
> > > > > +              tree break_lhs_phi = NULL_TREE;
> > > > > +              break_lhs_phi = vect_get_new_vect_var (ty_skip_niters,
> > > > > +                                                     vect_scalar_var,
> > > > > +                                                     "pfa_iv_offset");
> > > > > +              gphi *nphi = create_phi_node (break_lhs_phi, bb);
> > > > > +              add_phi_arg (nphi, skip_niters, pe, UNKNOWN_LOCATION);
> > > > > +              add_phi_arg (nphi, build_zero_cst (ty_skip_niters),
> > > > > +                           loop_latch_edge (iv_loop), UNKNOWN_LOCATION);
> > > > > +
> > > > > +              /* Rewrite all the early exit usages.  */
> > > > > +              tree phi_lhs = PHI_RESULT (phi);
> > > > > +              imm_use_iterator iter;
> > > > > +              use_operand_p use_p;
> > > > > +              gimple *use_stmt;
> > > > > +
> > > > > +              FOR_EACH_IMM_USE_FAST (use_p, iter, phi_lhs)
> > > > > +                {
> > > > > +                  use_stmt = USE_STMT (use_p);
> > > > > +                  if (!flow_bb_inside_loop_p (iv_loop, gimple_bb (use_stmt))
> > > > > +                      && is_a <gphi *> (use_stmt))
> > > > > +                    {
> > > > > +                      auto gsi = gsi_last_bb (use_stmt->bb);
> > > > > +                      for (auto exit : get_loop_exit_edges (iv_loop))
> > > > > +                        if (exit != LOOP_VINFO_IV_EXIT (loop_vinfo)
> > > > > +                            && bb == exit->src)
> > > > > +                          {
> > > > > +                            /* Now create the PHI for the outside loop usage
> > > > > +                               to retrieve the value for the offset
> > > > > +                               counter.  */
> > > > > +                            tree rphi_lhs = make_ssa_name (ty_skip_niters);
> > > > > +                            gphi *rphi
> > > > > +                              = create_phi_node (rphi_lhs, use_stmt->bb);
> > > > > +                            for (unsigned i = 0;
> > > > > +                                 i < gimple_phi_num_args (rphi); i++)
> > > > > +                              SET_PHI_ARG_DEF (rphi, i, PHI_RESULT (nphi));
> > > > > +
> > > > > +                            tree tmp = make_ssa_name (TREE_TYPE (phi_lhs));
> > > > > +                            tree stmt_lhs = PHI_RESULT (use_stmt);
> > > > > +                            imm_use_iterator iter2;
> > > > > +                            gimple *use_stmt2;
> > > > > +                            use_operand_p use2_p;
> > > > > +
> > > > > +                            /* Now rewrite all the usages first.  */
> > > > > +                            FOR_EACH_IMM_USE_STMT (use_stmt2, iter2, stmt_lhs)
> > > > > +                              FOR_EACH_IMM_USE_ON_STMT (use2_p, iter2)
> > > > > +                                SET_USE (use2_p, tmp);
> > > > > +
> > > > > +                            /* And then generate the adjustment to avoid the
> > > > > +                               update code from updating this new usage.  The
> > > > > +                               multiplicaiton is to get the original IV and the
> > > > > +                               downwards counting IV correct.  */
> > > >
> > > > typo: multiplication
> > > >
> > > > But I don't think it's just upcounting vs downcounting.  An upcounting
> > > > iv with step 2 would also need the multiplication.  That is, we're
> > > > applying PHI_RESULT (rphi) iv updates, and so need to add the iv step
> > > > that many times.
> > > >
> > > > So IMO it would be clearer to drop the reference specifically to
> > > > downcounting here.
> > > >
> > > > The patch LGTM with the comment nit fixed, but Richi should have the
> > > > final say.
> > > >
> > > > Thanks,
> > > > Richard
> > > >
> > > > > +                            gimple_seq iv_stmts = NULL;
> > > > > +                            tree rphi_step
> > > > > +                              = gimple_convert (&iv_stmts, ty_skip_niters,
> > > > > +                                                step_expr);
> > > > > +                            tree tmp2
> > > > > +                              = gimple_build (&iv_stmts, MULT_EXPR,
> > > > > +                                              ty_skip_niters, rphi_step,
> > > > > +                                              PHI_RESULT (rphi));
> > > > > +
> > > > > +                            if (POINTER_TYPE_P (TREE_TYPE (stmt_lhs)))
> > > > > +                              tmp2
> > > > > +                                = gimple_build (&iv_stmts, POINTER_PLUS_EXPR,
> > > > > +                                                TREE_TYPE (stmt_lhs), stmt_lhs,
> > > > > +                                                tmp2);
> > > > > +                            else
> > > > > +                              {
> > > > > +                                tmp2
> > > > > +                                  = gimple_convert (&iv_stmts,
> > > > > +                                                    TREE_TYPE (stmt_lhs),
> > > > > +                                                    tmp2);
> > > > > +                                tmp2
> > > > > +                                  = gimple_build (&iv_stmts, PLUS_EXPR,
> > > > > +                                                  TREE_TYPE (stmt_lhs),
> > > > > +                                                  stmt_lhs, tmp2);
> > > > > +                              }
> > > > > +
> > > > > +                            gsi_insert_seq_before (&gsi, iv_stmts,
> > > > > +                                                   GSI_SAME_STMT);
> > > > > +                            gimple *cvt_stmt
> > > > > +                              = gimple_build_assign (tmp, VIEW_CONVERT_EXPR,
> > > > > +                                                     build1 (VIEW_CONVERT_EXPR,
> > > > > +                                                             TREE_TYPE (phi_lhs),
> > > > > +                                                             tmp2));
> > > > > +                            gsi_insert_before (&gsi, cvt_stmt, GSI_SAME_STMT);
> > > > > +                          }
> > > > > +                      /* All early exits point to the same common block, so
> > > > > +                         we only have to find the first one.  */
> > > > > +                      break;
> > > > > +                    }
> > > > > +                }
> > > > > +            }
> > > > >       }
> > > > >        tree step_mul = NULL_TREE;
> > > > >        unsigned ivn;
> > >
> > 
> > --
> > Richard Biener <rguent...@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
