RE: [PATCH]middle-end: support SLP early break

Tamar Christina Wed, 09 Oct 2024 01:54:01 -0700

> -----Original Message-----
> From: Richard Biener <rguent...@suse.de>
> Sent: Wednesday, October 9, 2024 9:20 AM
> To: Tamar Christina <tamar.christ...@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; j...@ventanamicro.com
> Subject: RE: [PATCH]middle-end: support SLP early break
> 
> On Tue, 8 Oct 2024, Tamar Christina wrote:
> 
> > > -----Original Message-----
> > > From: Richard Biener <rguent...@suse.de>
> > > Sent: Wednesday, October 2, 2024 1:50 PM
> > > To: Tamar Christina <tamar.christ...@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; j...@ventanamicro.com
> > > Subject: Re: [PATCH]middle-end: support SLP early break
> > >
> > > On Tue, 1 Oct 2024, Tamar Christina wrote:
> > >
> > > > Hi all,
> > > >
> > > > This patch introduces feature parity for early break in the SLP-only
> > > > vectorizer.
> > > >
> > > > The approach taken here is to treat the early exits as root statements for an
> > > > SLP tree.  This means that we don't need any changes to build_slp to support
> > > > gconds.
> > > >
> > > > Codegen for the gcond itself now has to be done out of line but the body of the
> > > > SLP blocks itself is simply driven by SLP scheduling.  There is a slight
> > > > awkwardness in having re-used vectorizable_early_exit for both SLP and non-SLP
> > > > but I've documented the differences and when I did try to refactor it it wasn't
> > > > really worth it given that this is a temporary state anyway.
> > > >
> > > > This version is restricted to lane = 1, as such we can re-use the existing
> > > > move_early_break function instead of having to do safety update through
> > > > scheduling.  I have a branch where I'm working on that but lane > 1 is out of
> > > > scope for GCC 15 anyway.   The only reason I will try to get moving through
> > > > scheduling done as a stretch goal is so we get epilogue vectorization back for
> > > > early break.
> > > >
> > > > The example:
> > > >
> > > > unsigned test4(unsigned x)
> > > > {
> > > >  unsigned ret = 0;
> > > >  for (int i = 0; i < N; i++)
> > > >  {
> > > >    vect_b[i] = x + i;
> > > >    if (vect_a[i]*2 != x)
> > > >      break;
> > > >    vect_a[i] = x;
> > > >
> > > >  }
> > > >  return ret;
> > > > }
> > > >
> > > > builds the following SLP instance for early break:
> > > >
> > > > note:   Analyzing vectorizable control flow: if (patt_6 != 0)
> > > > note:   Starting SLP discovery for
> > > > note:     patt_6 = _4 != x_9(D);
> > > > note:   starting SLP discovery for node 0x63abc80
> > > > note:   Build SLP for patt_6 = _4 != x_9(D);
> > > > note:   precomputed vectype: vector(4) <signed-boolean:32>
> > > > note:   nunits = 4
> > > > note:   vect_is_simple_use: operand x_9(D), type of def: external
> > > > note:   vect_is_simple_use: operand # RANGE [irange] unsigned int [0, 0][2, +INF] MASK 0xffff
> > > >         _3 * 2, type of def: internal
> > > > note:   starting SLP discovery for node 0x63abdc0
> > > > note:   Build SLP for _4 = _3 * 2;
> > > > note:   precomputed vectype: vector(4) unsigned int
> > > > note:   nunits = 4
> > > > note:   vect_is_simple_use: operand #
> > > >         vect_aD.4416[i_15], type of def: internal
> > > > note:   vect_is_simple_use: operand 2, type of def: constant
> > > > note:   starting SLP discovery for node 0x63abe60
> > > > note:   Build SLP for _3 = vect_a[i_15];
> > > > note:   precomputed vectype: vector(4) unsigned int
> > > > note:   nunits = 4
> > > > note:   SLP discovery for node 0x63abe60 succeeded
> > > > note:   SLP discovery for node 0x63abdc0 succeeded
> > > > note:   SLP discovery for node 0x63abc80 succeeded
> > > > note:   SLP size 3 vs. limit 10.
> > > > note:   Final SLP tree for instance 0x6474190:
> > > > note:   node 0x63abc80 (max_nunits=4, refcnt=2) vector(4) <signed-boolean:32>
> > > > note:   op template: patt_6 = _4 != x_9(D);
> > > > note:           stmt 0 patt_6 = _4 != x_9(D);
> > > > note:           children 0x63abd20 0x63abdc0
> > > > note:   node (external) 0x63abd20 (max_nunits=1, refcnt=1)
> > > > note:           { x_9(D) }
> > > > note:   node 0x63abdc0 (max_nunits=4, refcnt=2) vector(4) unsigned int
> > > > note:   op template: _4 = _3 * 2;
> > > > note:           stmt 0 _4 = _3 * 2;
> > > > note:           children 0x63abe60 0x63abf00
> > > > note:   node 0x63abe60 (max_nunits=4, refcnt=2) vector(4) unsigned int
> > > > note:   op template: _3 = vect_a[i_15];
> > > > note:           stmt 0 _3 = vect_a[i_15];
> > > > note:           load permutation { 0 }
> > > > note:   node (constant) 0x63abf00 (max_nunits=1, refcnt=1)
> > > > note:           { 2 }
> > > >
> > > > and during codegen:
> > > >
> > > > note:   ------>vectorizing SLP node starting from: patt_6 = _4 != x_9(D);
> > > > note:   vect_is_simple_use: operand # RANGE [irange] unsigned int [0, 0][2, +INF] MASK 0xffff
> > > >         _3 * 2, type of def: internal
> > > > note:   add new stmt: mask_patt_6.18_58 = _53 != vect__4.17_57;
> > > > note:    === vectorizable_early_exit ===
> > > > note:    transform early-exit.
> > > > note:   vectorizing stmts using SLP.
> > > > note:   Vectorizing SLP tree:
> > > > note:   node 0x63abfa0 (max_nunits=4, refcnt=1) vector(4) int
> > > > note:   op template: i_12 = i_15 + 1;
> > > > note:           stmt 0 i_12 = i_15 + 1;
> > > > note:           children 0x63aba00 0x63ac040
> > > > note:   node 0x63aba00 (max_nunits=4, refcnt=2) vector(4) int
> > > > note:   op template: i_15 = PHI <i_12(6), 0(14)>
> > > > note:           [l] stmt 0 i_15 = PHI <i_12(6), 0(14)>
> > > > note:           children (nil) (nil)
> > > > note:   node (constant) 0x63ac040 (max_nunits=1, refcnt=1) vector(4) int
> > > > note:           { 1 }
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
> > > > x86_64-pc-linux-gnu -m32, -m64 and no issues.
> > > >
> > > > Also bootstrapped --with-build-config='bootstrap-O3 bootstrap-lto'
> > > > --enable-checking=release,yes,rtl,extra on aarch64-none-linux-gnu and
> > > > x86_64-pc-linux-gnu -m32, -m64 and no issues.
> > > >
> > > > Ok for master?
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >         * tree-vectorizer.h (enum slp_instance_kind): Add slp_inst_kind_gcond.
> > > >         (LOOP_VINFO_EARLY_BREAKS_LIVE_STMTS): New.
> > > >         (vectorizable_early_exit): Expose.
> > > >         (class _loop_vec_info): Add early_break_live_stmts.
> > > >         * tree-vect-slp.cc (vect_build_slp_instance, vect_analyze_slp_instance):
> > > >         Support gcond instances.
> > > >         (vect_analyze_slp): Analyze gcond roots and early break live statements.
> > > >         (maybe_push_to_hybrid_worklist): Don't sink gconds.
> > > >         (vect_slp_analyze_node_operations): Support gconds.
> > > >         (vect_slp_check_for_roots): Update comments.
> > > >         (vectorize_slp_instance_root_stmt): Support gconds.
> > > >         (vect_schedule_slp): Pass vinfo to vectorize_slp_instance_root_stmt.
> > > >         * tree-vect-stmts.cc (vect_stmt_relevant_p): Record early break live
> > > >         statements.
> > > >         (vectorizable_early_exit): Support SLP.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >         * gcc.dg/vect/vect-early-break_126.c: New test.
> > > >         * gcc.dg/vect/vect-early-break_127.c: New test.
> > > >         * gcc.dg/vect/vect-early-break_128.c: New test.
> > > >
> > > > ---
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_126.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_126.c
> > > > new file mode 100644
> > > > index
> > >
> 0000000000000000000000000000000000000000..4bfc9880f9fc869bf616123
> > > ff509d13be17ffacf
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_126.c
> > > > @@ -0,0 +1,28 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +
> > > > +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } 
> > > > } */
> > > > +/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" } 
> > > > } */
> > > > +
> > > > +#define N 1024
> > > > +unsigned vect_a[N];
> > > > +unsigned vect_b[N];
> > > > +
> > > > +unsigned test4(unsigned x)
> > > > +{
> > > > + unsigned ret = 0;
> > > > + for (int i = 0; i < N; i++)
> > > > + {
> > > > +   vect_b[i] = x + i;
> > > > +   if (vect_a[i] > x)
> > > > +     {
> > > > +       ret *= vect_a[i];
> > > > +       return vect_a[i];
> > > > +     }
> > > > +   vect_a[i] = x;
> > > > +   ret += vect_a[i] + vect_b[i];
> > > > + }
> > > > + return ret;
> > > > +}
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_127.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_127.c
> > > > new file mode 100644
> > > > index
> > >
> 0000000000000000000000000000000000000000..67cb5d34a77192e5d7d72
> > > c35df8e83535ef184ab
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_127.c
> > > > @@ -0,0 +1,27 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +
> > > > +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } 
> > > > } */
> > > > +/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" } 
> > > > } */
> > > > +
> > > > +#ifndef N
> > > > +#define N 800
> > > > +#endif
> > > > +unsigned vect_a[N];
> > > > +unsigned vect_b[N];
> > > > +
> > > > +unsigned test4(unsigned x)
> > > > +{
> > > > + unsigned ret = 0;
> > > > + for (int i = 0; i < N; i++)
> > > > + {
> > > > +   vect_b[i] = x + i;
> > > > +   if (vect_a[i]*2 != x)
> > > > +     break;
> > > > +   vect_a[i] = x;
> > > > +
> > > > + }
> > > > + return ret;
> > > > +}
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c
> > > > new file mode 100644
> > > > index
> > >
> 0000000000000000000000000000000000000000..6d7fb920ec2de529a4aa1d
> > > e2c4a04286989204fd
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c
> > > > @@ -0,0 +1,31 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +
> > > > +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } 
> > > > } */
> > > > +/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" } 
> > > > } */
> > > > +
> > > > +#ifndef N
> > > > +#define N 800
> > > > +#endif
> > > > +unsigned vect_a[N];
> > > > +unsigned vect_b[N];
> > > > +
> > > > +unsigned test4(unsigned x)
> > > > +{
> > > > + unsigned ret = 0;
> > > > + for (int i = 0; i < N; i+=2)
> > > > + {
> > > > +   vect_b[i] = x + i;
> > > > +   vect_b[i+1] = x + i+1;
> > > > +   if (vect_a[i]*2 != x)
> > > > +     break;
> > > > +   if (vect_a[i+1]*2 != x)
> > > > +     break;
> > > > +   vect_a[i] = x;
> > > > +   vect_a[i+1] = x;
> > > > +
> > > > + }
> > > > + return ret;
> > > > +}
> > > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > > > index
> > >
> 600987dd6e5d506aa5fbb02350f9dab77793d382..7e765df466a59249feb999c
> > > 24d8f2dad232948ae 100644
> > > > --- a/gcc/tree-vect-slp.cc
> > > > +++ b/gcc/tree-vect-slp.cc
> > > > @@ -3697,6 +3697,13 @@ vect_build_slp_instance (vec_info *vinfo,
> > > >                          "Analyzing vectorizable constructor: %G\n",
> > > >                          root_stmt_infos[0]->stmt);
> > > >      }
> > > > +  else if (kind == slp_inst_kind_gcond)
> > > > +    {
> > > > +      if (dump_enabled_p ())
> > > > +       dump_printf_loc (MSG_NOTE, vect_location,
> > > > +                        "Analyzing vectorizable control flow: %G",
> > > > +                        root_stmt_infos[0]->stmt);
> > > > +    }
> > > >
> > > >    if (dump_enabled_p ())
> > > >      {
> > > > @@ -4143,6 +4150,12 @@ vect_analyze_slp_instance (vec_info *vinfo,
> > > >        STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info))
> > > >         = STMT_VINFO_REDUC_DEF (vect_orig_stmt (scalar_stmts.last ()));
> > > >      }
> > > > +  else if (kind == slp_inst_kind_gcond)
> > > > +    {
> > > > +      /* Collect the stores and store them in scalar_stmts.  */
> > > > +      scalar_stmts.create (1);
> > > > +      scalar_stmts.quick_push (vect_stmt_to_vectorize (next_info));
> > > > +    }
> > >
> > > We have this "left-over" dual API but since you never call
> > > vect_analyze_slp_instance with slp_inst_kind_gcond but use
> > > vect_build_slp_instance I think you can drop this hunk.
> > >
> > > >    else
> > > >      gcc_unreachable ();
> > > >
> > > > @@ -4742,6 +4755,56 @@ vect_analyze_slp (vec_info *vinfo, unsigned
> > > max_tree_size,
> > > >                                          bst_map, NULL, 
> > > > force_single_lane);
> > > >               }
> > > >           }
> > > > +
> > > > +      /* Find SLP sequences starting from gconds.  */
> > > > +      for (auto cond : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
> > > > +       {
> > > > +         auto cond_info = loop_vinfo->lookup_stmt (cond);
> > > > +         vec<stmt_vec_info> stmts;
> > > > +         vec<stmt_vec_info> roots = vNULL;
> > > > +         vec<tree> remain = vNULL;
> > > > +
> > > > +         cond_info = vect_stmt_to_vectorize (cond_info);
> > > > +         roots.safe_push (cond_info);
> > > > +         stmts.create (2);
> > > > +         tree args0 = gimple_cond_lhs (STMT_VINFO_STMT (cond_info));
> > > > +         tree args1 = gimple_cond_rhs (STMT_VINFO_STMT (cond_info));
> > > > +         /* An argument without a loop def will be codegened from 
> > > > vectorizing
> > > the
> > > > +            root gcond itself.  As such we don't need to try to build 
> > > > an SLP tree
> > > > +            from them.  It's highly likely that the resulting SLP tree 
> > > > here if both
> > > > +            arguments have a def will be incompatible, but we rely on 
> > > > it being split
> > > > +            later on.  */
> > > > +         if (auto varg = loop_vinfo->lookup_def (args0))
> > > > +           stmts.quick_push (vect_stmt_to_vectorize (varg));
> > > > +
> > > > +         if (auto varg = loop_vinfo->lookup_def (args1))
> > > > +           stmts.quick_push (vect_stmt_to_vectorize (varg));
> > >
> > > Hmm - you pattern-replace
> > >
> > >      if (a != b)
> > >
> > > with
> > >
> > >      patt = a != b;
> > >      if (patt != 0)
> > >
> > > correct?  So I don't get why you push two scalar stmts in 'stmts'
> > > and not just the pattern def of args0?  Plus assert the condition
> > > is NE_EXPR and the cond_rhs zero?  (or simply guard discovery with
> > > that - other gconds will simply not be vectorized)
> >
> > Yeah, the reason it can fail is that the gcond can be entirely loop invariant
> > and the condition already a != 0, in which case gcond lowering would have
> > done nothing.
> >
> > e.g. if (a != 0) where a is loop invariant.  For instance test_memcmp_1_1
> > in /gcc.dg/memcmp-1.c is such a loop.  Technically we should be able to
> > vectorize such loops, but while we can represent externals in the SLP tree,
> > we can't start discovery at them, as there is no stmt_info for them.
> >
> > In principle all I need here is an empty SLP tree, since all codegen is driven
> > by the roots for such invariant compares.  However vect_build_slp_tree
> > doesn't accept empty stmts.
> 
> The externals would have SLP nodes of course but the requirement
> currently is that the SLP instance root is an internal def.
> 
> > I believe we are able to vectorize such loops today,  so perhaps instead of
> > failing we should support building an SLP instance with only roots?
> 
> It might be tempting but I don't think this is generally useful.
> 
> > In which case should I try to fit it into vect_build_slp_tree or just special
> > case it for the gcond discovery?
> 
> The issue is that you have two operands you technically would like to
> see code-generated - the 'a' and the '0' vector invariants, but the
> SLP instance only has a single root.  You could (as I suggested)
> simply only build the SLP node for the (invariant) LHS of the gcond,
> not by using vect_build_slp_tree but instead by manually building
> the SLP tree for the invariant - see what vect_build_slp_tree_2 does


Ah!  That's what you meant by asking why I was building the SLP tree for a
scalar.  I didn't quite understand that at the time.  Thanks!

I can do so.  It does mean that I then need a different approach for
invariants and loop defs, but that's simple.  Will give it a try now.
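
For illustration, the split between the two cases could be sketched roughly
as below.  This is a standalone toy model, not vectorizer code: Operand,
DefKind, Seed and seed_for_gcond_lhs are made-up names.  In the real change
the loop-def case would presumably keep going through
vect_build_slp_instance, while the invariant case would create the node by
hand, much like the vect_build_slp_tree_2 snippet quoted below.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; none of these are GCC types.
enum class DefKind { LoopDef, External, Constant };

struct Operand
{
  std::string name;
  DefKind kind;
};

// How the single SLP node below the gcond root would be obtained.
enum class Seed { DiscoverFromLoopDef, BuildInvariantNode };

struct GcondSeed
{
  Seed how;
  std::vector<std::string> ops;  // scalar operands the node is seeded with
};

// Only the LHS of the gcond is looked at, since the instance has one root.
GcondSeed
seed_for_gcond_lhs (const Operand &lhs)
{
  // A def inside the loop has a stmt_info, so normal discovery can start there.
  if (lhs.kind == DefKind::LoopDef)
    return { Seed::DiscoverFromLoopDef, { lhs.name } };
  // Externals and constants have no stmt_info; build the node manually.
  return { Seed::BuildInvariantNode, { lhs.name } };
}
```

With this model, the pattern def patt_6 from the dump above would take the
discovery path, and a loop-invariant 'a' would take the manual path.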

Thanks!
Tamar

> here:
> 
>       if (oprnd_info->first_dt == vect_external_def
>           || oprnd_info->first_dt == vect_constant_def)
>         {
>           if (!GET_MODE_SIZE (vinfo->vector_mode).is_constant ())
>             {
>               tree op0;
>               tree uniform_val = op0 = oprnd_info->ops[0];
>               for (j = 1; j < oprnd_info->ops.length (); ++j)
>                 if (!operand_equal_p (uniform_val, oprnd_info->ops[j]))
>                   {
>                     uniform_val = NULL_TREE;
>                     break;
>                   }
>               if (!uniform_val
>                   && !can_duplicate_and_interleave_p (vinfo,
>                                                       oprnd_info->ops.length (),
>                                                       TREE_TYPE (op0)))
>                 {
>                   matches[j] = false;
>                   if (dump_enabled_p ())
>                     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                                      "Build SLP failed: invalid type of def "
>                                      "for variable-length SLP %T\n", op0);
>                   goto fail;
>                 }
>             }
>           slp_tree invnode = vect_create_new_slp_node (oprnd_info->ops);
>           SLP_TREE_DEF_TYPE (invnode) = oprnd_info->first_dt;
>           oprnd_info->ops = vNULL;
>           children.safe_push (invnode);
>           continue;
> 
> It should probably work fine then?
> 
> Richard.
> 
> >
> > Thanks,
> > Tamar
> > >
> > > That said, when loop_vinfo->lookup_def (args1) is not NULL
> > > what you ask for doesn't really make sense - even if it discovers
> > > OK (you get vectors with the condition lhs and rhs interleaved).
> > >
> > > > +         if (!stmts.is_empty ())
> > > > +           vect_build_slp_instance (vinfo, slp_inst_kind_gcond,
> > > > +                                    stmts, roots, remain,
> > > > +                                    max_tree_size, &limit,
> > > > +                                    bst_map, NULL, force_single_lane);
> > > > +       }
> > > > +
> > > > +       /* Find and create slp instances for inductions that have been 
> > > > forced
> > > > +          live due to early break.  */
> > > > +       edge latch_e = loop_latch_edge (LOOP_VINFO_LOOP (loop_vinfo));
> > > > +       for (auto stmt_info : LOOP_VINFO_EARLY_BREAKS_LIVE_STMTS
> > > (loop_vinfo))
> > > > +         {
> > > > +           vec<stmt_vec_info> stmts;
> > > > +           vec<stmt_vec_info> roots = vNULL;
> > > > +           vec<tree> remain = vNULL;
> > > > +           gphi *lc_phi = as_a<gphi *> (STMT_VINFO_STMT (stmt_info));
> > > > +           tree def = gimple_phi_arg_def_from_edge (lc_phi, latch_e);
> > >
> > > Hmm, so this isn't the LC PHI but the header PHI!  So maybe better named
> > > LOOP_VINFO_EARLY_BREAKS_LIVE_IVS?
> > >
> > > Note we're now walking all actual LC PHIs, but only when marked as
> > > live.  So maybe you can instead of adding to
> > > LOOP_VINFO_EARLY_BREAKS_LIVE_STMTS mark the defs appropriately?
> > > Just a suggestion - I think what you do works as well.
> > >
> > > > +           stmt_vec_info lc_info = loop_vinfo->lookup_def (def);
> > > > +           stmts.create (1);
> > > > +           stmts.quick_push (vect_stmt_to_vectorize (lc_info));
> > > > +           vect_build_slp_instance (vinfo, slp_inst_kind_reduc_group,
> > > > +                                    stmts, roots, remain,
> > > > +                                    max_tree_size, &limit,
> > > > +                                    bst_map, NULL, force_single_lane);
> > > > +         }
> > > >      }
> > > >
> > > >    hash_set<slp_tree> visited_patterns;
> > > > @@ -7157,8 +7220,9 @@ maybe_push_to_hybrid_worklist (vec_info *vinfo,
> > > >             }
> > > >         }
> > > >      }
> > > > -  /* No def means this is a loo_vect sink.  */
> > > > -  if (!any_def)
> > > > +  /* No def means this is a loop_vect sink.  Gimple conditionals also 
> > > > don't
> have a
> > > > +     def but shouldn't be considered sinks.  */
> > > > +  if (!any_def && STMT_VINFO_DEF_TYPE (stmt_info) != 
> > > > vect_condition_def)
> > > >      {
> > > >        if (dump_enabled_p ())
> > > >         dump_printf_loc (MSG_NOTE, vect_location,
> > > > @@ -7542,9 +7606,27 @@ vect_slp_analyze_node_operations (vec_info
> *vinfo,
> > > slp_tree node,
> > > >      return true;
> > > >    visited_vec.safe_push (node);
> > > >
> > > > +  /* If early break also check the root statement as we need to both 
> > > > analyze
> > > > +     and trigger codegen for it.  The analysis will check whether can 
> > > > actually
> > > > +     vectorize it.  At the memoment splitting off the analsysi bit 
> > > > from inside
> > > > +     it duplicates a lot of the setup code so it's not worth while to 
> > > > do so.
> > > > +     However when either the non-SLP loop vect goes away or we split
> > > vectorizable_*
> > > > +     functions then we can call the analysis only part from here 
> > > > instead.  */
> > > >    bool res = true;
> > > > -  unsigned visited_rec_start = visited_vec.length ();
> > > >    unsigned cost_vec_rec_start = cost_vec->length ();
> > > > +  if (SLP_INSTANCE_KIND (node_instance) == slp_inst_kind_gcond)
> > > > +    {
> > > > +      auto root_stmt_info = SLP_INSTANCE_ROOT_STMTS (node_instance)[0];
> > > > +      res = vectorizable_early_exit (vinfo, root_stmt_info, NULL, 
> > > > NULL, NULL,
> > > > +                                    cost_vec);
> > >
> > > Can you do this here instead:
> > >
> > > bool
> > > vect_slp_analyze_operations (vec_info *vinfo)
> > > {
> > > ...
> > >           /* Check we can vectorize the reduction.  */
> > >           || (SLP_INSTANCE_KIND (instance) == slp_inst_kind_bb_reduc
> > >               && !vectorizable_bb_reduc_epilogue (instance, &cost_vec)))
> > >
> > > as we're checking the roots for slp_inst_kind_ctor and
> > > slp_inst_kind_bb_reduc here already.
> > >
> > > > +      if (!res)
> > > > +       {
> > > > +         cost_vec->truncate (cost_vec_rec_start);
> > > > +         return res;
> > > > +       }
> > > > +    }
> > > > +
> > > > +  unsigned visited_rec_start = visited_vec.length ();
> > > >    bool seen_non_constant_child = false;
> > > >    FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
> > > >      {
> > > > @@ -8612,6 +8694,8 @@ vect_slp_check_for_roots (bb_vec_info bb_vinfo)
> > > >          !gsi_end_p (gsi); gsi_next (&gsi))
> > > >      {
> > > >        gassign *assign = dyn_cast<gassign *> (gsi_stmt (gsi));
> > > > +      /* This can be used to start SLP discovery for early breaks for 
> > > > BB early
> breaks
> > > > +        when we get that far.  */
> > > >        if (!assign)
> > > >         continue;
> > > >
> > > > @@ -10758,7 +10842,7 @@ vect_remove_slp_scalar_calls (vec_info *vinfo,
> > > slp_tree node)
> > > >  /* Vectorize the instance root.  */
> > > >
> > > >  void
> > > > -vectorize_slp_instance_root_stmt (slp_tree node, slp_instance instance)
> > > > +vectorize_slp_instance_root_stmt (vec_info *vinfo, slp_tree node,
> slp_instance
> > > instance)
> > > >  {
> > > >    gassign *rstmt = NULL;
> > > >
> > > > @@ -10862,6 +10946,21 @@ vectorize_slp_instance_root_stmt (slp_tree
> node,
> > > slp_instance instance)
> > > >        update_stmt (gsi_stmt (rgsi));
> > > >        return;
> > > >      }
> > > > +  else if (instance->kind == slp_inst_kind_gcond)
> > > > +    {
> > > > +      /* Only support a single root for now as we can't codegen CFG 
> > > > yet and so
> we
> > > > +        can't support lane > 1 at this time.  */
> > > > +      gcc_assert (instance->root_stmts.length () == 1);
> > > > +      auto root_stmt_info = instance->root_stmts[0];
> > > > +      auto last_stmt = vect_find_first_scalar_stmt_in_slp (node)->stmt;
> > > > +      gimple_stmt_iterator rgsi = gsi_for_stmt (last_stmt);
> > >
> > > So last_stmt is actually the gcond (and thus root_stmt_info)?  Why
> > > not use that for the rgsi directly?
> > >
> > > > +      gimple *vec_stmt = NULL;
> > > > +      gcc_assert (SLP_TREE_NUMBER_OF_VEC_STMTS (node) != 0);
> > >
> > > I guess you'd want to assert SLP_TREE_VEC_DEFS is not empty?
> > >
> > > > +      bool res = vectorizable_early_exit (vinfo, root_stmt_info, &rgsi,
> > > > +                                         &vec_stmt, node, NULL);
> > > > +      gcc_assert (res);
> > > > +      return;
> > > > +    }
> > > >    else
> > > >      gcc_unreachable ();
> > > >
> > > > @@ -11080,7 +11179,7 @@ vect_schedule_slp (vec_info *vinfo, const
> > > vec<slp_instance> &slp_instances)
> > > >         vect_schedule_scc (vinfo, node, instance, scc_info, maxdfs, 
> > > > stack);
> > > >
> > > >        if (!SLP_INSTANCE_ROOT_STMTS (instance).is_empty ())
> > > > -       vectorize_slp_instance_root_stmt (node, instance);
> > > > +       vectorize_slp_instance_root_stmt (vinfo, node, instance);
> > > >
> > > >        if (dump_enabled_p ())
> > > >         dump_printf_loc (MSG_NOTE, vect_location,
> > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > index
> > >
> b72b54d666879d8485f8d972b4e8d9dc64bc86b3..8f3f35989879199ffd0eb24
> > > 729cb7ade856a3c4d 100644
> > > > --- a/gcc/tree-vect-stmts.cc
> > > > +++ b/gcc/tree-vect-stmts.cc
> > > > @@ -411,6 +411,7 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info,
> > > loop_vec_info loop_vinfo,
> > > >           dump_printf_loc (MSG_NOTE, vect_location,
> > > >                            "vec_stmt_relevant_p: induction forced for "
> > > >                            "early break.\n");
> > > > +      LOOP_VINFO_EARLY_BREAKS_LIVE_STMTS (loop_vinfo).safe_push
> > > (stmt_info);
> > > >        *live_p = true;
> > >
> > > ah, so it's even live already - what's the relevancy used?  See above
> > > for the SLP discovery question regarding to that we already walk all
> > > actual LC PHI defs.
> > >
> > > Otherwise looks OK - thanks for the work!
> > >
> > > Richard.
> > >
> > > >
> > > >      }
> > > > @@ -12933,7 +12934,7 @@ vectorizable_comparison (vec_info *vinfo,
> > > >  /* Check to see if the current early break given in STMT_INFO is valid 
> > > > for
> > > >     vectorization.  */
> > > >
> > > > -static bool
> > > > +bool
> > > >  vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> > > >                          gimple_stmt_iterator *gsi, gimple **vec_stmt,
> > > >                          slp_tree slp_node, stmt_vector_for_cost 
> > > > *cost_vec)
> > > > @@ -12958,7 +12959,7 @@ vectorizable_early_exit (vec_info *vinfo,
> > > stmt_vec_info stmt_info,
> > > >    tree op0;
> > > >    enum vect_def_type dt0;
> > > >    if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, 
> > > > &slp_op0,
> &dt0,
> > > > -                          &vectype))
> > > > +                             &vectype))
> > > >      {
> > > >        if (dump_enabled_p ())
> > > >           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > @@ -12966,6 +12967,13 @@ vectorizable_early_exit (vec_info *vinfo,
> > > stmt_vec_info stmt_info,
> > > >         return false;
> > > >      }
> > > >
> > > > +  /* For SLP we don't want to use the type of the operands of the SLP 
> > > > node,
> > > when
> > > > +     vectorizing using SLP slp_node will be the children of the gcond 
> > > > and we
> want
> > > to
> > > > +     use the type of the direct children which since the gcond is root 
> > > > will be the
> > > > +     current node, rather than a child node as vect_is_simple_use 
> > > > assumes.  */
> > > > +  if (slp_node)
> > > > +    vectype = SLP_TREE_VECTYPE (slp_node);
> > > > +
> > > >    if (!vectype)
> > > >      return false;
> > > >
> > > > @@ -13060,9 +13068,18 @@ vectorizable_early_exit (vec_info *vinfo,
> > > stmt_vec_info stmt_info,
> > > >    if (dump_enabled_p ())
> > > >      dump_printf_loc (MSG_NOTE, vect_location, "transform 
> > > > early-exit.\n");
> > > >
> > > > -  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> > > > -                                 vec_stmt, slp_node, cost_vec))
> > > > -    gcc_unreachable ();
> > > > +  /* For SLP we don't do codegen of the body starting from the gcond, 
> > > > the
> > > gconds are
> > > > +     roots and so by the time we get to them we have already codegened 
> > > > the
> SLP
> > > tree
> > > > +     and so we shouldn't try to do so again.  The arguments have 
> > > > already been
> > > > +     vectorized.  It's not very clean to do this here, But the masking 
> > > > code below
> is
> > > > +     complex and this keeps it all in one place to ease fixes and 
> > > > backports.
> Once
> > > we
> > > > +     drop the non-SLP loop vect or split vectorizable_* this can be 
> > > > simplified.
> */
> > > > +  if (!slp_node)
> > > > +    {
> > > > +      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, 
> > > > gsi,
> > > > +                                     vec_stmt, slp_node, cost_vec))
> > > > +       gcc_unreachable ();
> > > > +    }
> > > >
> > > >    gimple *stmt = STMT_VINFO_STMT (stmt_info);
> > > >    basic_block cond_bb = gimple_bb (stmt);
> > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > > index
> > >
> 490061aea2f6d465d9589eb97bbd34a920d76b1c..53483303c4ac3482760fe72
> > > 2354f602e0243e5a2 100644
> > > > --- a/gcc/tree-vectorizer.h
> > > > +++ b/gcc/tree-vectorizer.h
> > > > @@ -252,7 +252,8 @@ enum slp_instance_kind {
> > > >      slp_inst_kind_reduc_group,
> > > >      slp_inst_kind_reduc_chain,
> > > >      slp_inst_kind_bb_reduc,
> > > > -    slp_inst_kind_ctor
> > > > +    slp_inst_kind_ctor,
> > > > +    slp_inst_kind_gcond
> > > >  };
> > > >
> > > >  /* SLP instance is a sequence of stmts in a loop that can be packed 
> > > > into
> > > > @@ -977,6 +978,10 @@ public:
> > > >    /* Statements whose VUSES need updating if early break vectorization 
> > > > is to
> > > >       happen.  */
> > > >    auto_vec<gimple*> early_break_vuses;
> > > > +
> > > > +  /* Record statements that are needed to be live for early break
> vectorization
> > > > +     but may not have an LC PHI node materialized yet in the exits.  */
> > > > +  auto_vec<stmt_vec_info> early_break_live_stmts;
> > > >  } *loop_vec_info;
> > > >
> > > >  /* Access Functions.  */
> > > > @@ -1036,6 +1041,8 @@ public:
> > > >  #define LOOP_VINFO_EARLY_BRK_STORES(L)     (L)->early_break_stores
> > > >  #define LOOP_VINFO_EARLY_BREAKS_VECT_PEELED(L)  \
> > > >    (single_pred ((L)->loop->latch) != (L)->vec_loop_iv_exit->src)
> > > > +#define LOOP_VINFO_EARLY_BREAKS_LIVE_STMTS(L)  \
> > > > +  (L)->early_break_live_stmts
> > > >  #define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
> > > >  #define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
> > > >  #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
> > > > @@ -2522,6 +2529,9 @@ extern bool vectorizable_phi (vec_info *,
> > > stmt_vec_info, gimple **, slp_tree,
> > > >                               stmt_vector_for_cost *);
> > > >  extern bool vectorizable_recurr (loop_vec_info, stmt_vec_info,
> > > >                                   gimple **, slp_tree, 
> > > > stmt_vector_for_cost *);
> > > > +extern bool vectorizable_early_exit (vec_info *, stmt_vec_info,
> > > > +                                    gimple_stmt_iterator *, gimple **,
> > > > +                                    slp_tree, stmt_vector_for_cost *);
> > > >  extern bool vect_emulated_vector_p (tree);
> > > >  extern bool vect_can_vectorize_without_simd_p (tree_code);
> > > >  extern bool vect_can_vectorize_without_simd_p (code_helper);
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > > --
> > > Richard Biener <rguent...@suse.de>
> > > SUSE Software Solutions Germany GmbH,
> > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> >
> 
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
