> -----Original Message----- > From: Richard Biener <rguent...@suse.de> > Sent: Wednesday, October 9, 2024 9:20 AM > To: Tamar Christina <tamar.christ...@arm.com> > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; j...@ventanamicro.com > Subject: RE: [PATCH]middle-end: support SLP early break > > On Tue, 8 Oct 2024, Tamar Christina wrote: > > > > -----Original Message----- > > > From: Richard Biener <rguent...@suse.de> > > > Sent: Wednesday, October 2, 2024 1:50 PM > > > To: Tamar Christina <tamar.christ...@arm.com> > > > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; j...@ventanamicro.com > > > Subject: Re: [PATCH]middle-end: support SLP early break > > > > > > On Tue, 1 Oct 2024, Tamar Christina wrote: > > > > > > > Hi all, > > > > > > > > This patch introduces feature parity for early break int the SLP only > > > > vectorizer. > > > > > > > > The approach taken here is to treat the early exits as root statements > > > > for an > > > > SLP tree. This means that we don't need any changes to build_slp to > > > > support > > > > gconds. > > > > > > > > Codegen for the gcond itself now has to be done out of line but the > > > > body of > the > > > > SLP blocks itself is simply driven by SLP scheduling. There is a slight > > > > awkwardness in having re-used vectorizable_early_exit for both SLP and > > > > non- > SLP > > > > but I've documented the differences and when I did try to refactor it > > > > it wasn't > > > > really worth it given that this is a temporary state anyway. > > > > > > > > This version is restricted to lane = 1, as such we can re-use the > > > > existing > > > > move_early_break function instead of having to do safety update through > > > > scheduling. I have a branch where I'm working on that but lane > 1 is > > > > out of > > > > scope for GCC 15 anyway. The only reason I will try to get moving > > > > through > > > > scheduling done as a stretch goal is so we get epilogue vectorization > > > > back for > > > > early break. > > > > > > > > The example: > > > > > > > > unsigned test4(unsigned x) > > > > { > > > > unsigned ret = 0; > > > > for (int i = 0; i < N; i++) > > > > { > > > > vect_b[i] = x + i; > > > > if (vect_a[i]*2 != x) > > > > break; > > > > vect_a[i] = x; > > > > > > > > } > > > > return ret; > > > > } > > > > > > > > builds the following SLP instance for early break: > > > > > > > > note: Analyzing vectorizable control flow: if (patt_6 != 0) > > > > note: Starting SLP discovery for > > > > note: patt_6 = _4 != x_9(D); > > > > note: starting SLP discovery for node 0x63abc80 > > > > note: Build SLP for patt_6 = _4 != x_9(D); > > > > note: precomputed vectype: vector(4) <signed-boolean:32> > > > > note: nunits = 4 > > > > note: vect_is_simple_use: operand x_9(D), type of def: external > > > > note: vect_is_simple_use: operand # RANGE [irange] unsigned int [0, > > > > 0][2, > +INF] > > > MASK 0xffff > > > > _3 * 2, type of def: internal > > > > note: starting SLP discovery for node 0x63abdc0 > > > > note: Build SLP for _4 = _3 * 2; > > > > note: precomputed vectype: vector(4) unsigned int > > > > note: nunits = 4 > > > > note: vect_is_simple_use: operand # > > > > vect_aD.4416[i_15], type of def: internal > > > > note: vect_is_simple_use: operand 2, type of def: constant > > > > note: starting SLP discovery for node 0x63abe60 > > > > note: Build SLP for _3 = vect_a[i_15]; > > > > note: precomputed vectype: vector(4) unsigned int > > > > note: nunits = 4 > > > > note: SLP discovery for node 0x63abe60 succeeded > > > > note: SLP discovery for node 0x63abdc0 succeeded > > > > note: SLP discovery for node 0x63abc80 succeeded > > > > note: SLP size 3 vs. limit 10. > > > > note: Final SLP tree for instance 0x6474190: > > > > note: node 0x63abc80 (max_nunits=4, refcnt=2) vector(4) <signed- > boolean:32> > > > > note: op template: patt_6 = _4 != x_9(D); > > > > note: stmt 0 patt_6 = _4 != x_9(D); > > > > note: children 0x63abd20 0x63abdc0 > > > > note: node (external) 0x63abd20 (max_nunits=1, refcnt=1) > > > > note: { x_9(D) } > > > > note: node 0x63abdc0 (max_nunits=4, refcnt=2) vector(4) unsigned int > > > > note: op template: _4 = _3 * 2; > > > > note: stmt 0 _4 = _3 * 2; > > > > note: children 0x63abe60 0x63abf00 > > > > note: node 0x63abe60 (max_nunits=4, refcnt=2) vector(4) unsigned int > > > > note: op template: _3 = vect_a[i_15]; > > > > note: stmt 0 _3 = vect_a[i_15]; > > > > note: load permutation { 0 } > > > > note: node (constant) 0x63abf00 (max_nunits=1, refcnt=1) > > > > note: { 2 } > > > > > > > > and during codegen: > > > > > > > > note: ------>vectorizing SLP node starting from: patt_6 = _4 != > > > > x_9(D); > > > > note: vect_is_simple_use: operand # RANGE [irange] unsigned int [0, > > > > 0][2, > +INF] > > > MASK 0xffff > > > > _3 * 2, type of def: internal > > > > note: add new stmt: mask_patt_6.18_58 = _53 != vect__4.17_57; > > > > note: === vectorizable_early_exit === > > > > note: transform early-exit. > > > > note: vectorizing stmts using SLP. > > > > note: Vectorizing SLP tree: > > > > note: node 0x63abfa0 (max_nunits=4, refcnt=1) vector(4) int > > > > note: op template: i_12 = i_15 + 1; > > > > note: stmt 0 i_12 = i_15 + 1; > > > > note: children 0x63aba00 0x63ac040 > > > > note: node 0x63aba00 (max_nunits=4, refcnt=2) vector(4) int > > > > note: op template: i_15 = PHI <i_12(6), 0(14)> > > > > note: [l] stmt 0 i_15 = PHI <i_12(6), 0(14)> > > > > note: children (nil) (nil) > > > > note: node (constant) 0x63ac040 (max_nunits=1, refcnt=1) vector(4) int > > > > note: { 1 } > > > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, arm-none-linux- > gnueabihf, > > > > x86_64-pc-linux-gnu -m32, -m64 and no issues. > > > > > > > > Also bootstrapped --with-build-config='bootstrap-O3 bootstrap-lto' > > > > --enable-checking=release,yes,rtl,extra on aarch64-none-linux-gnu and > > > > x86_64-pc-linux-gnu -m32, -m64 and no issues. > > > > > > > > Ok for master? > > > > > > > > gcc/ChangeLog: > > > > > > > > * tree-vectorizer.h (enum slp_instance_kind): Add > > > > slp_inst_kind_gcond. > > > > (LOOP_VINFO_EARLY_BREAKS_LIVE_STMTS): New. > > > > (vectorizable_early_exit): Expose. > > > > (class _loop_vec_info): Add early_break_live_stmts. > > > > * tree-vect-slp.cc (vect_build_slp_instance, > > > > vect_analyze_slp_instance): > > > > Support gcond instances. > > > > (vect_analyze_slp): Analyze gcond roots and early break live > > > > statements. > > > > (maybe_push_to_hybrid_worklist): Don't sink gconds. > > > > (vect_slp_analyze_node_operations): Support gconds. > > > > (vect_slp_check_for_roots): Update comments. > > > > (vectorize_slp_instance_root_stmt): Support gconds. > > > > (vect_schedule_slp): Pass vinfo to > > > > vectorize_slp_instance_root_stmt. > > > > * tree-vect-stmts.cc (vect_stmt_relevant_p): Record early break > > > > live > > > > statements. > > > > (vectorizable_early_exit): Support SLP. > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > * gcc.dg/vect/vect-early-break_126.c: New test. > > > > * gcc.dg/vect/vect-early-break_127.c: New test. > > > > * gcc.dg/vect/vect-early-break_128.c: New test. > > > > > > > > --- > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_126.c > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_126.c > > > > new file mode 100644 > > > > index > > > > 0000000000000000000000000000000000000000..4bfc9880f9fc869bf616123 > > > ff509d13be17ffacf > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_126.c > > > > @@ -0,0 +1,28 @@ > > > > +/* { dg-do compile } */ > > > > +/* { dg-add-options vect_early_break } */ > > > > +/* { dg-require-effective-target vect_early_break } */ > > > > +/* { dg-require-effective-target vect_int } */ > > > > + > > > > +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } > > > > } */ > > > > +/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" } > > > > } */ > > > > + > > > > +#define N 1024 > > > > +unsigned vect_a[N]; > > > > +unsigned vect_b[N]; > > > > + > > > > +unsigned test4(unsigned x) > > > > +{ > > > > + unsigned ret = 0; > > > > + for (int i = 0; i < N; i++) > > > > + { > > > > + vect_b[i] = x + i; > > > > + if (vect_a[i] > x) > > > > + { > > > > + ret *= vect_a[i]; > > > > + return vect_a[i]; > > > > + } > > > > + vect_a[i] = x; > > > > + ret += vect_a[i] + vect_b[i]; > > > > + } > > > > + return ret; > > > > +} > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_127.c > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_127.c > > > > new file mode 100644 > > > > index > > > > 0000000000000000000000000000000000000000..67cb5d34a77192e5d7d72 > > > c35df8e83535ef184ab > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_127.c > > > > @@ -0,0 +1,27 @@ > > > > +/* { dg-do compile } */ > > > > +/* { dg-add-options vect_early_break } */ > > > > +/* { dg-require-effective-target vect_early_break } */ > > > > +/* { dg-require-effective-target vect_int } */ > > > > + > > > > +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } > > > > } */ > > > > +/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" } > > > > } */ > > > > + > > > > +#ifndef N > > > > +#define N 800 > > > > +#endif > > > > +unsigned vect_a[N]; > > > > +unsigned vect_b[N]; > > > > + > > > > +unsigned test4(unsigned x) > > > > +{ > > > > + unsigned ret = 0; > > > > + for (int i = 0; i < N; i++) > > > > + { > > > > + vect_b[i] = x + i; > > > > + if (vect_a[i]*2 != x) > > > > + break; > > > > + vect_a[i] = x; > > > > + > > > > + } > > > > + return ret; > > > > +} > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c > > > > new file mode 100644 > > > > index > > > > 0000000000000000000000000000000000000000..6d7fb920ec2de529a4aa1d > > > e2c4a04286989204fd > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c > > > > @@ -0,0 +1,31 @@ > > > > +/* { dg-do compile } */ > > > > +/* { dg-add-options vect_early_break } */ > > > > +/* { dg-require-effective-target vect_early_break } */ > > > > +/* { dg-require-effective-target vect_int } */ > > > > + > > > > +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } > > > > } */ > > > > +/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" } > > > > } */ > > > > + > > > > +#ifndef N > > > > +#define N 800 > > > > +#endif > > > > +unsigned vect_a[N]; > > > > +unsigned vect_b[N]; > > > > + > > > > +unsigned test4(unsigned x) > > > > +{ > > > > + unsigned ret = 0; > > > > + for (int i = 0; i < N; i+=2) > > > > + { > > > > + vect_b[i] = x + i; > > > > + vect_b[i+1] = x + i+1; > > > > + if (vect_a[i]*2 != x) > > > > + break; > > > > + if (vect_a[i+1]*2 != x) > > > > + break; > > > > + vect_a[i] = x; > > > > + vect_a[i+1] = x; > > > > + > > > > + } > > > > + return ret; > > > > +} > > > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc > > > > index > > > > 600987dd6e5d506aa5fbb02350f9dab77793d382..7e765df466a59249feb999c > > > 24d8f2dad232948ae 100644 > > > > --- a/gcc/tree-vect-slp.cc > > > > +++ b/gcc/tree-vect-slp.cc > > > > @@ -3697,6 +3697,13 @@ vect_build_slp_instance (vec_info *vinfo, > > > > "Analyzing vectorizable constructor: %G\n", > > > > root_stmt_infos[0]->stmt); > > > > } > > > > + else if (kind == slp_inst_kind_gcond) > > > > + { > > > > + if (dump_enabled_p ()) > > > > + dump_printf_loc (MSG_NOTE, vect_location, > > > > + "Analyzing vectorizable control flow: %G", > > > > + root_stmt_infos[0]->stmt); > > > > + } > > > > > > > > if (dump_enabled_p ()) > > > > { > > > > @@ -4143,6 +4150,12 @@ vect_analyze_slp_instance (vec_info *vinfo, > > > > STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info)) > > > > = STMT_VINFO_REDUC_DEF (vect_orig_stmt (scalar_stmts.last ())); > > > > } > > > > + else if (kind == slp_inst_kind_gcond) > > > > + { > > > > + /* Collect the stores and store them in scalar_stmts. */ > > > > + scalar_stmts.create (1); > > > > + scalar_stmts.quick_push (vect_stmt_to_vectorize (next_info)); > > > > + } > > > > > > We have this "left-over" dual API but since you never call > > > vect_analyze_slp_instance with slp_inst_kind_gcond but use > > > vect_build_slp_instance I think you can drop this hunk. > > > > > > > else > > > > gcc_unreachable (); > > > > > > > > @@ -4742,6 +4755,56 @@ vect_analyze_slp (vec_info *vinfo, unsigned > > > max_tree_size, > > > > bst_map, NULL, > > > > force_single_lane); > > > > } > > > > } > > > > + > > > > + /* Find SLP sequences starting from gconds. */ > > > > + for (auto cond : LOOP_VINFO_LOOP_CONDS (loop_vinfo)) > > > > + { > > > > + auto cond_info = loop_vinfo->lookup_stmt (cond); > > > > + vec<stmt_vec_info> stmts; > > > > + vec<stmt_vec_info> roots = vNULL; > > > > + vec<tree> remain = vNULL; > > > > + > > > > + cond_info = vect_stmt_to_vectorize (cond_info); > > > > + roots.safe_push (cond_info); > > > > + stmts.create (2); > > > > + tree args0 = gimple_cond_lhs (STMT_VINFO_STMT (cond_info)); > > > > + tree args1 = gimple_cond_rhs (STMT_VINFO_STMT (cond_info)); > > > > + /* An argument without a loop def will be codegened from > > > > vectorizing > > > the > > > > + root gcond itself. As such we don't need to try to build > > > > an SLP tree > > > > + from them. It's highly likely that the resulting SLP tree > > > > here if both > > > > + arguments have a def will be incompatible, but we rely on > > > > it being split > > > > + later on. */ > > > > + if (auto varg = loop_vinfo->lookup_def (args0)) > > > > + stmts.quick_push (vect_stmt_to_vectorize (varg)); > > > > + > > > > + if (auto varg = loop_vinfo->lookup_def (args1)) > > > > + stmts.quick_push (vect_stmt_to_vectorize (varg)); > > > > > > Hmm - you pattern-replace > > > > > > if (a != b) > > > > > > with > > > > > > patt = a != b; > > > if (patt != 0) > > > > > > correct? So I don't get why you push two scalar stmts in 'stmts' > > > and not just the pattern def of args0? Plus assert the condition > > > is NE_EXPR and the cond_rhs zero? (or simply guard discovery with > > > that - other gconds will simply not be vectorized) > > > > Yeah, the reason it can fail is that the gcond can be entirely loop > > invariant > > and the condition already a != 0, in which case gcond lowering would have > > done nothing. > > > > e.g. if (a != 0) where a is loop invariant. For instance test_memcmp_1_1 > > in /gcc.dg/memcmp-1.c is such loop. Technically we should be able to > > vectorize such loops, but while we can represent externals in the SLP tree, > > we can't start discovery at them, as no stmt_info for them. > > > > In principle all I need here is an empty SLP tree, since all codegen is > > driven > > by the roots for such invariant compares. However vect_build_slp_tree > > doesn't accept empty stmts. > > The externals would have SLP nodes of course but the requirement > currently is that the SLP instance root is an internal def. > > > I believe we are able to vectorize such loops today, so perhaps instead of > > failing we should support building an SLP instance with only roots? > > It might be tempting but I don't think this is generally useful. > > > In which case should I try to fit it into vect_build_slp_tree or just > > special > > case it for the gcond discovery? > > The issue is that you have two operands you technically would like to > see code-genrated - the 'a' and the '0' vector invariants, but the > SLP instance only has a single root. You could (as I suggested) > simply only build the SLP node for the (invariant) LHS of the gcond, > not by using vect_build_slp_tree but instead by manually building > the SLP tree for the invariant - see what vect_build_slp_tree_2 does
Ah! That's what you meant with why am I building the SLP tree for a scalar. I didn't clearly understand what you meant there! Thanks! I can do so, It does mean that I have to then have a different approach for Invariants and loop defs, but that's simple. Will give it a try now. Thanks! Tamar > here: > > if (oprnd_info->first_dt == vect_external_def > || oprnd_info->first_dt == vect_constant_def) > { > if (!GET_MODE_SIZE (vinfo->vector_mode).is_constant ()) > { > tree op0; > tree uniform_val = op0 = oprnd_info->ops[0]; > for (j = 1; j < oprnd_info->ops.length (); ++j) > if (!operand_equal_p (uniform_val, oprnd_info->ops[j])) > { > uniform_val = NULL_TREE; > break; > } > if (!uniform_val > && !can_duplicate_and_interleave_p (vinfo, > > oprnd_info->ops.length (), > TREE_TYPE (op0))) > { > matches[j] = false; > if (dump_enabled_p ()) > dump_printf_loc (MSG_MISSED_OPTIMIZATION, > vect_location, > "Build SLP failed: invalid type of > def " > "for variable-length SLP %T\n", op0); > goto fail; > } > } > slp_tree invnode = vect_create_new_slp_node (oprnd_info->ops); > SLP_TREE_DEF_TYPE (invnode) = oprnd_info->first_dt; > oprnd_info->ops = vNULL; > children.safe_push (invnode); > continue; > > It should probably work fine then? > > Richard. > > > > > Thanks, > > Tamar > > > > > > That said, when loop_vinfo->lookup_def (args1) is not NULL > > > what you ask for doesn't really make sense - even if it discovers > > > OK (you get vectors with the condition lhs and rhs interleaved). > > > > > > > + if (!stmts.is_empty ()) > > > > + vect_build_slp_instance (vinfo, slp_inst_kind_gcond, > > > > + stmts, roots, remain, > > > > + max_tree_size, &limit, > > > > + bst_map, NULL, force_single_lane); > > > > + } > > > > + > > > > + /* Find and create slp instances for inductions that have been > > > > forced > > > > + live due to early break. */ > > > > + edge latch_e = loop_latch_edge (LOOP_VINFO_LOOP (loop_vinfo)); > > > > + for (auto stmt_info : LOOP_VINFO_EARLY_BREAKS_LIVE_STMTS > > > (loop_vinfo)) > > > > + { > > > > + vec<stmt_vec_info> stmts; > > > > + vec<stmt_vec_info> roots = vNULL; > > > > + vec<tree> remain = vNULL; > > > > + gphi *lc_phi = as_a<gphi *> (STMT_VINFO_STMT (stmt_info)); > > > > + tree def = gimple_phi_arg_def_from_edge (lc_phi, latch_e); > > > > > > Hmm, so this isn't the LC PHI but the header PHI! So maybe better named > > > LOOP_VINFO_EARLY_BREAKS_LIVE_IVS? > > > > > > Note we're now walking all actual LC PHIs, but only when marked as > > > live. So maybe you can instead of adding to > > > LOOP_VINFO_EARLY_BREAKS_LIVE_STMTS mark the defs appropriately? > > > Just a suggestion - I think what you do works as well. > > > > > > > + stmt_vec_info lc_info = loop_vinfo->lookup_def (def); > > > > + stmts.create (1); > > > > + stmts.quick_push (vect_stmt_to_vectorize (lc_info)); > > > > + vect_build_slp_instance (vinfo, slp_inst_kind_reduc_group, > > > > + stmts, roots, remain, > > > > + max_tree_size, &limit, > > > > + bst_map, NULL, force_single_lane); > > > > + } > > > > } > > > > > > > > hash_set<slp_tree> visited_patterns; > > > > @@ -7157,8 +7220,9 @@ maybe_push_to_hybrid_worklist (vec_info *vinfo, > > > > } > > > > } > > > > } > > > > - /* No def means this is a loo_vect sink. */ > > > > - if (!any_def) > > > > + /* No def means this is a loop_vect sink. Gimple conditionals also > > > > don't > have a > > > > + def but shouldn't be considered sinks. */ > > > > + if (!any_def && STMT_VINFO_DEF_TYPE (stmt_info) != > > > > vect_condition_def) > > > > { > > > > if (dump_enabled_p ()) > > > > dump_printf_loc (MSG_NOTE, vect_location, > > > > @@ -7542,9 +7606,27 @@ vect_slp_analyze_node_operations (vec_info > *vinfo, > > > slp_tree node, > > > > return true; > > > > visited_vec.safe_push (node); > > > > > > > > + /* If early break also check the root statement as we need to both > > > > analyze > > > > + and trigger codegen for it. The analysis will check whether can > > > > actually > > > > + vectorize it. At the memoment splitting off the analsysi bit > > > > from inside > > > > + it duplicates a lot of the setup code so it's not worth while to > > > > do so. > > > > + However when either the non-SLP loop vect goes away or we split > > > vectorizable_* > > > > + functions then we can call the analysis only part from here > > > > instead. */ > > > > bool res = true; > > > > - unsigned visited_rec_start = visited_vec.length (); > > > > unsigned cost_vec_rec_start = cost_vec->length (); > > > > + if (SLP_INSTANCE_KIND (node_instance) == slp_inst_kind_gcond) > > > > + { > > > > + auto root_stmt_info = SLP_INSTANCE_ROOT_STMTS (node_instance)[0]; > > > > + res = vectorizable_early_exit (vinfo, root_stmt_info, NULL, > > > > NULL, NULL, > > > > + cost_vec); > > > > > > Can you do this here instead: > > > > > > bool > > > vect_slp_analyze_operations (vec_info *vinfo) > > > { > > > ... > > > /* Check we can vectorize the reduction. */ > > > || (SLP_INSTANCE_KIND (instance) == slp_inst_kind_bb_reduc > > > && !vectorizable_bb_reduc_epilogue (instance, &cost_vec))) > > > > > > as we're checking the roots for slp_inst_kind_ctor and > > > slp_inst_kind_bb_reduc here already. > > > > > > > + if (!res) > > > > + { > > > > + cost_vec->truncate (cost_vec_rec_start); > > > > + return res; > > > > + } > > > > + } > > > > + > > > > + unsigned visited_rec_start = visited_vec.length (); > > > > bool seen_non_constant_child = false; > > > > FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) > > > > { > > > > @@ -8612,6 +8694,8 @@ vect_slp_check_for_roots (bb_vec_info bb_vinfo) > > > > !gsi_end_p (gsi); gsi_next (&gsi)) > > > > { > > > > gassign *assign = dyn_cast<gassign *> (gsi_stmt (gsi)); > > > > + /* This can be used to start SLP discovery for early breaks for > > > > BB early > breaks > > > > + when we get that far. */ > > > > if (!assign) > > > > continue; > > > > > > > > @@ -10758,7 +10842,7 @@ vect_remove_slp_scalar_calls (vec_info *vinfo, > > > slp_tree node) > > > > /* Vectorize the instance root. */ > > > > > > > > void > > > > -vectorize_slp_instance_root_stmt (slp_tree node, slp_instance instance) > > > > +vectorize_slp_instance_root_stmt (vec_info *vinfo, slp_tree node, > slp_instance > > > instance) > > > > { > > > > gassign *rstmt = NULL; > > > > > > > > @@ -10862,6 +10946,21 @@ vectorize_slp_instance_root_stmt (slp_tree > node, > > > slp_instance instance) > > > > update_stmt (gsi_stmt (rgsi)); > > > > return; > > > > } > > > > + else if (instance->kind == slp_inst_kind_gcond) > > > > + { > > > > + /* Only support a single root for now as we can't codegen CFG > > > > yet and so > we > > > > + can't support lane > 1 at this time. */ > > > > + gcc_assert (instance->root_stmts.length () == 1); > > > > + auto root_stmt_info = instance->root_stmts[0]; > > > > + auto last_stmt = vect_find_first_scalar_stmt_in_slp (node)->stmt; > > > > + gimple_stmt_iterator rgsi = gsi_for_stmt (last_stmt); > > > > > > So last_stmt is actually the gcond (and thus root_stmt_info)? Why > > > not use that for the rgsi directly? > > > > > > > + gimple *vec_stmt = NULL; > > > > + gcc_assert (SLP_TREE_NUMBER_OF_VEC_STMTS (node) != 0); > > > > > > I guess you'd want to assert SLP_TREE_VEC_DEFS is not empty? > > > > > > > + bool res = vectorizable_early_exit (vinfo, root_stmt_info, &rgsi, > > > > + &vec_stmt, node, NULL); > > > > + gcc_assert (res); > > > > + return; > > > > + } > > > > else > > > > gcc_unreachable (); > > > > > > > > @@ -11080,7 +11179,7 @@ vect_schedule_slp (vec_info *vinfo, const > > > vec<slp_instance> &slp_instances) > > > > vect_schedule_scc (vinfo, node, instance, scc_info, maxdfs, > > > > stack); > > > > > > > > if (!SLP_INSTANCE_ROOT_STMTS (instance).is_empty ()) > > > > - vectorize_slp_instance_root_stmt (node, instance); > > > > + vectorize_slp_instance_root_stmt (vinfo, node, instance); > > > > > > > > if (dump_enabled_p ()) > > > > dump_printf_loc (MSG_NOTE, vect_location, > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > > > > index > > > > b72b54d666879d8485f8d972b4e8d9dc64bc86b3..8f3f35989879199ffd0eb24 > > > 729cb7ade856a3c4d 100644 > > > > --- a/gcc/tree-vect-stmts.cc > > > > +++ b/gcc/tree-vect-stmts.cc > > > > @@ -411,6 +411,7 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, > > > loop_vec_info loop_vinfo, > > > > dump_printf_loc (MSG_NOTE, vect_location, > > > > "vec_stmt_relevant_p: induction forced for " > > > > "early break.\n"); > > > > + LOOP_VINFO_EARLY_BREAKS_LIVE_STMTS (loop_vinfo).safe_push > > > (stmt_info); > > > > *live_p = true; > > > > > > ah, so it's even live already - what's the relevancy used? See above > > > for the SLP discovery question regarding to that we already walk all > > > actual LC PHI defs. > > > > > > Otherwise looks OK - thanks for the work! > > > > > > Richard. > > > > > > > > > > > } > > > > @@ -12933,7 +12934,7 @@ vectorizable_comparison (vec_info *vinfo, > > > > /* Check to see if the current early break given in STMT_INFO is valid > > > > for > > > > vectorization. */ > > > > > > > > -static bool > > > > +bool > > > > vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info, > > > > gimple_stmt_iterator *gsi, gimple **vec_stmt, > > > > slp_tree slp_node, stmt_vector_for_cost > > > > *cost_vec) > > > > @@ -12958,7 +12959,7 @@ vectorizable_early_exit (vec_info *vinfo, > > > stmt_vec_info stmt_info, > > > > tree op0; > > > > enum vect_def_type dt0; > > > > if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, > > > > &slp_op0, > &dt0, > > > > - &vectype)) > > > > + &vectype)) > > > > { > > > > if (dump_enabled_p ()) > > > > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > > > @@ -12966,6 +12967,13 @@ vectorizable_early_exit (vec_info *vinfo, > > > stmt_vec_info stmt_info, > > > > return false; > > > > } > > > > > > > > + /* For SLP we don't want to use the type of the operands of the SLP > > > > node, > > > when > > > > + vectorizing using SLP slp_node will be the children of the gcond > > > > and we > want > > > to > > > > + use the type of the direct children which since the gcond is root > > > > will be the > > > > + current node, rather than a child node as vect_is_simple_use > > > > assumes. */ > > > > + if (slp_node) > > > > + vectype = SLP_TREE_VECTYPE (slp_node); > > > > + > > > > if (!vectype) > > > > return false; > > > > > > > > @@ -13060,9 +13068,18 @@ vectorizable_early_exit (vec_info *vinfo, > > > stmt_vec_info stmt_info, > > > > if (dump_enabled_p ()) > > > > dump_printf_loc (MSG_NOTE, vect_location, "transform > > > > early-exit.\n"); > > > > > > > > - if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi, > > > > - vec_stmt, slp_node, cost_vec)) > > > > - gcc_unreachable (); > > > > + /* For SLP we don't do codegen of the body starting from the gcond, > > > > the > > > gconds are > > > > + roots and so by the time we get to them we have already codegened > > > > the > SLP > > > tree > > > > + and so we shouldn't try to do so again. The arguments have > > > > already been > > > > + vectorized. It's not very clean to do this here, But the masking > > > > code below > is > > > > + complex and this keeps it all in one place to ease fixes and > > > > backports. > Once > > > we > > > > + drop the non-SLP loop vect or split vectorizable_* this can be > > > > simplified. > */ > > > > + if (!slp_node) > > > > + { > > > > + if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, > > > > gsi, > > > > + vec_stmt, slp_node, cost_vec)) > > > > + gcc_unreachable (); > > > > + } > > > > > > > > gimple *stmt = STMT_VINFO_STMT (stmt_info); > > > > basic_block cond_bb = gimple_bb (stmt); > > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h > > > > index > > > > 490061aea2f6d465d9589eb97bbd34a920d76b1c..53483303c4ac3482760fe72 > > > 2354f602e0243e5a2 100644 > > > > --- a/gcc/tree-vectorizer.h > > > > +++ b/gcc/tree-vectorizer.h > > > > @@ -252,7 +252,8 @@ enum slp_instance_kind { > > > > slp_inst_kind_reduc_group, > > > > slp_inst_kind_reduc_chain, > > > > slp_inst_kind_bb_reduc, > > > > - slp_inst_kind_ctor > > > > + slp_inst_kind_ctor, > > > > + slp_inst_kind_gcond > > > > }; > > > > > > > > /* SLP instance is a sequence of stmts in a loop that can be packed > > > > into > > > > @@ -977,6 +978,10 @@ public: > > > > /* Statements whose VUSES need updating if early break vectorization > > > > is to > > > > happen. */ > > > > auto_vec<gimple*> early_break_vuses; > > > > + > > > > + /* Record statements that are needed to be live for early break > vectorization > > > > + but may not have an LC PHI node materialized yet in the exits. */ > > > > + auto_vec<stmt_vec_info> early_break_live_stmts; > > > > } *loop_vec_info; > > > > > > > > /* Access Functions. */ > > > > @@ -1036,6 +1041,8 @@ public: > > > > #define LOOP_VINFO_EARLY_BRK_STORES(L) (L)->early_break_stores > > > > #define LOOP_VINFO_EARLY_BREAKS_VECT_PEELED(L) \ > > > > (single_pred ((L)->loop->latch) != (L)->vec_loop_iv_exit->src) > > > > +#define LOOP_VINFO_EARLY_BREAKS_LIVE_STMTS(L) \ > > > > + (L)->early_break_live_stmts > > > > #define LOOP_VINFO_EARLY_BRK_DEST_BB(L) (L)->early_break_dest_bb > > > > #define LOOP_VINFO_EARLY_BRK_VUSES(L) (L)->early_break_vuses > > > > #define LOOP_VINFO_LOOP_CONDS(L) (L)->conds > > > > @@ -2522,6 +2529,9 @@ extern bool vectorizable_phi (vec_info *, > > > stmt_vec_info, gimple **, slp_tree, > > > > stmt_vector_for_cost *); > > > > extern bool vectorizable_recurr (loop_vec_info, stmt_vec_info, > > > > gimple **, slp_tree, > > > > stmt_vector_for_cost *); > > > > +extern bool vectorizable_early_exit (vec_info *, stmt_vec_info, > > > > + gimple_stmt_iterator *, gimple **, > > > > + slp_tree, stmt_vector_for_cost *); > > > > extern bool vect_emulated_vector_p (tree); > > > > extern bool vect_can_vectorize_without_simd_p (tree_code); > > > > extern bool vect_can_vectorize_without_simd_p (code_helper); > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > Richard Biener <rguent...@suse.de> > > > SUSE Software Solutions Germany GmbH, > > > Frankenstrasse 146, 90461 Nuernberg, Germany; > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG > Nuernberg) > > > > -- > Richard Biener <rguent...@suse.de> > SUSE Software Solutions Germany GmbH, > Frankenstrasse 146, 90461 Nuernberg, Germany; > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)