On Fri, 20 Nov 2020, Tamar Christina wrote: > Hi Richi, > > > -----Original Message----- > > From: rguent...@ryzen.fritz.box <rguent...@ryzen.fritz.box> On Behalf Of > > Richard Biener > > Sent: Friday, November 20, 2020 9:54 AM > > To: gcc-patches@gcc.gnu.org > > Cc: Tamar Christina <tamar.christ...@arm.com> > > Subject: [PATCH] Deal with (pattern) SLP consumed stmts in hybrid discovery > > > > This makes hybrid SLP discovery deal with stmts indirectly consumed by SLP, > > for example via patterns. This means that all uses of a stmt end up in SLP > > vectorized stmts. > > > > This helps my prototype patches for PR97832 where I make SLP discovery re- > > associate chains to make operands match. This ends up building SLP > > computation nodes without 1:1 representatives in the scalar IL and thus no > > scalar lane defs in SLP_TREE_SCALAR_STMTS. Nevertheless all of the original > > scalar stmts are consumed so this represents another kind of SLP pattern for > > the computation chain result. > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > > > Tamar - can you check if this helps you avoiding all the relevancy push/pop > > stuff as well as avoiding any pattern marking for SLP patterns at all? > > > > Looks like it does allow me to avoid the relevancy and SLP_TYPE markings: > > === vect_detect_hybrid_slp === > Processing hybrid candidate : slp_patt_48 = .COMPLEX_MUL (_pa_50, _pa_49); > Marked SLP consumed stmt pure: slp_patt_48 = .COMPLEX_MUL (_pa_50, _pa_49); > Processing hybrid candidate : slp_patt_51 = .COMPLEX_MUL (_pa_53, _pa_52); > Marked SLP consumed stmt pure: slp_patt_51 = .COMPLEX_MUL (_pa_53, _pa_52); > Processing hybrid candidate : _24 = _9 * _19; > Marked SLP consumed stmt pure: _24 = _9 * _19; > Processing hybrid candidate : _23 = _10 * _18; > Marked SLP consumed stmt pure: _23 = _10 * _18; > Processing hybrid candidate : _22 = _9 * _18; > Marked SLP consumed stmt pure: _22 = _9 * _18; > Processing hybrid candidate : _17 = _10 * _19; > Marked SLP consumed stmt pure: _17 = _10 * _19; > > And costing looks to be ignoring the unrelated statements > > 0x50bd800 REALPART_EXPR <*_3> 1 times unaligned_load (misalign -1) costs 1 in > body > 0x50bd800 REALPART_EXPR <*_5> 1 times unaligned_load (misalign -1) costs 1 in > body > 0x50bd800 .COMPLEX_MUL (_pa_53, _pa_52) 1 times vector_stmt costs 1 in body > 0x50bd800 REALPART_EXPR <*_5> 1 times unaligned_load (misalign -1) costs 1 in > body > 0x50bd800 <unknown> 0 times vec_perm costs 0 in body > 0x50bd800 .COMPLEX_MUL (_pa_50, _pa_49) 1 times vector_stmt costs 1 in body > 0x50bd800 <unknown> 1 times vec_perm costs 2 in body > 0x50bd800 _25 1 times unaligned_store (misalign -1) costs 1 in body > > But it doesn't allow me to avoid the pattern markings. > > Without the pattern markings it will crash when analyzing the loads > > note: ==> examining statement: _26 = REALPART_EXPR <*_5>; > note: Vectorizing an unaligned access. > note: vect_model_load_cost: unaligned supported by hardware. > note: vect_model_load_cost: inside_cost = 1, prologue_cost = 0 . > > as the statement it finds is not being used:
Hmm, I see. I'll dig into this a bit when you send out the patch. > cadd.c:22:6: internal compiler error: in vect_slp_analyze_node_operations_1, > at tree-vect-slp.c:3591 > 22 | void f90 (TYPE complex a[restrict N], TYPE complex b[restrict N], > TYPE complex c[restrict N]) > | ^~~ > 0x17200e1 vect_slp_analyze_node_operations_1 > /gnu-work/src/gcc/gcc/tree-vect-slp.c:3591 > 0x17209e8 vect_slp_analyze_node_operations > /gnu-work/src/gcc/gcc/tree-vect-slp.c:3784 > 0x1720995 vect_slp_analyze_node_operations > /gnu-work/src/gcc/gcc/tree-vect-slp.c:3776 > 0x1720995 vect_slp_analyze_node_operations > /gnu-work/src/gcc/gcc/tree-vect-slp.c:3776 > 0x1720995 vect_slp_analyze_node_operations > /gnu-work/src/gcc/gcc/tree-vect-slp.c:3776 > 0x1720995 vect_slp_analyze_node_operations > /gnu-work/src/gcc/gcc/tree-vect-slp.c:3776 > 0x17212f2 vect_slp_analyze_operations(vec_info*) > /gnu-work/src/gcc/gcc/tree-vect-slp.c:3971 > 0x16e2c6e vect_analyze_loop_2 > /gnu-work/src/gcc/gcc/tree-vect-loop.c:2419 > 0x16e4bcd vect_analyze_loop(loop*, vec_info_shared*) > /gnu-work/src/gcc/gcc/tree-vect-loop.c:2956 > 0x173ba4d try_vectorize_loop_1 > /gnu-work/src/gcc/gcc/tree-vectorizer.c:1010 > 0x173c23a try_vectorize_loop > /gnu-work/src/gcc/gcc/tree-vectorizer.c:1163 > 0x173c470 vectorize_loops() > /gnu-work/src/gcc/gcc/tree-vectorizer.c:1244 > 0x1579275 execute > /gnu-work/src/gcc/gcc/tree-ssa-loop.c:414 > > I will assume the patch's existence in tree when I send out my COMPLEX_ADD > patch today then. OK, will push it then - I hoped it might help you! Thanks, Richard. > Thanks!, > Tamar > > > 2020-11-20 Richard Biener <rguent...@suse.de> > > > > * tree-vect-slp.c (maybe_push_to_hybrid_worklist): New function. > > (vect_detect_hybrid_slp): Use it. Perform a backward walk > > over the IL. > > --- > > gcc/tree-vect-slp.c | 79 > > +++++++++++++++++++++++++++++++++++++++++---- > > 1 file changed, 72 insertions(+), 7 deletions(-) > > > > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index > > 486ee95d5d2..f87ac3c049f 100644 > > --- a/gcc/tree-vect-slp.c > > +++ b/gcc/tree-vect-slp.c > > @@ -3439,6 +3439,63 @@ vect_detect_hybrid_slp (tree *tp, int *, void > > *data) > > return NULL_TREE; > > } > > > > +/* Look if STMT_INFO is consumed by SLP indirectly and mark it pure_slp > > + if so, otherwise pushing it to WORKLIST. */ > > + > > +static void > > +maybe_push_to_hybrid_worklist (vec_info *vinfo, > > + vec<stmt_vec_info> &worklist, > > + stmt_vec_info stmt_info) > > +{ > > + if (dump_enabled_p ()) > > + dump_printf_loc (MSG_NOTE, vect_location, > > + "Processing hybrid candidate : %G", stmt_info->stmt); > > + stmt_vec_info orig_info = vect_orig_stmt (stmt_info); > > + imm_use_iterator iter2; > > + ssa_op_iter iter1; > > + use_operand_p use_p; > > + def_operand_p def_p; > > + bool any_def = false; > > + FOR_EACH_PHI_OR_STMT_DEF (def_p, orig_info->stmt, iter1, > > SSA_OP_DEF) > > + { > > + any_def = true; > > + FOR_EACH_IMM_USE_FAST (use_p, iter2, DEF_FROM_PTR (def_p)) > > + { > > + stmt_vec_info use_info = vinfo->lookup_stmt (USE_STMT (use_p)); > > + /* An out-of loop use means this is a loop_vect sink. */ > > + if (!use_info) > > + { > > + if (dump_enabled_p ()) > > + dump_printf_loc (MSG_NOTE, vect_location, > > + "Found loop_vect sink: %G", stmt_info- > > >stmt); > > + worklist.safe_push (stmt_info); > > + return; > > + } > > + else if (!STMT_SLP_TYPE (vect_stmt_to_vectorize (use_info))) > > + { > > + if (dump_enabled_p ()) > > + dump_printf_loc (MSG_NOTE, vect_location, > > + "Found loop_vect use: %G", use_info->stmt); > > + worklist.safe_push (stmt_info); > > + return; > > + } > > + } > > + } > > + /* No def means this is a loo_vect sink. */ > > + if (!any_def) > > + { > > + if (dump_enabled_p ()) > > + dump_printf_loc (MSG_NOTE, vect_location, > > + "Found loop_vect sink: %G", stmt_info->stmt); > > + worklist.safe_push (stmt_info); > > + return; > > + } > > + if (dump_enabled_p ()) > > + dump_printf_loc (MSG_NOTE, vect_location, > > + "Marked SLP consumed stmt pure: %G", stmt_info->stmt); > > + STMT_SLP_TYPE (stmt_info) = pure_slp; } > > + > > /* Find stmts that must be both vectorized and SLPed. */ > > > > void > > @@ -3448,9 +3505,14 @@ vect_detect_hybrid_slp (loop_vec_info > > loop_vinfo) > > > > /* All stmts participating in SLP are marked pure_slp, all other > > stmts are loop_vect. > > - First collect all loop_vect stmts into a worklist. */ > > + First collect all loop_vect stmts into a worklist. > > + SLP patterns cause not all original scalar stmts to appear in > > + SLP_TREE_SCALAR_STMTS and thus not all of them are marked pure_slp. > > + Rectify this here and do a backward walk over the IL only considering > > + stmts as loop_vect when they are used by a loop_vect stmt and > > otherwise > > + mark them as pure_slp. */ > > auto_vec<stmt_vec_info> worklist; > > - for (unsigned i = 0; i < LOOP_VINFO_LOOP (loop_vinfo)->num_nodes; ++i) > > + for (int i = LOOP_VINFO_LOOP (loop_vinfo)->num_nodes - 1; i >= 0; > > + --i) > > { > > basic_block bb = LOOP_VINFO_BBS (loop_vinfo)[i]; > > for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi); @@ > > -3459,10 > > +3521,11 @@ vect_detect_hybrid_slp (loop_vec_info loop_vinfo) > > gphi *phi = gsi.phi (); > > stmt_vec_info stmt_info = loop_vinfo->lookup_stmt (phi); > > if (!STMT_SLP_TYPE (stmt_info) && STMT_VINFO_RELEVANT > > (stmt_info)) > > - worklist.safe_push (stmt_info); > > + maybe_push_to_hybrid_worklist (loop_vinfo, > > + worklist, stmt_info); > > } > > - for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi); > > - gsi_next (&gsi)) > > + for (gimple_stmt_iterator gsi = gsi_last_bb (bb); !gsi_end_p (gsi); > > + gsi_prev (&gsi)) > > { > > gimple *stmt = gsi_stmt (gsi); > > if (is_gimple_debug (stmt)) > > @@ -3478,12 +3541,14 @@ vect_detect_hybrid_slp (loop_vec_info > > loop_vinfo) > > = loop_vinfo->lookup_stmt (gsi_stmt (gsi2)); > > if (!STMT_SLP_TYPE (patt_info) > > && STMT_VINFO_RELEVANT (patt_info)) > > - worklist.safe_push (patt_info); > > + maybe_push_to_hybrid_worklist (loop_vinfo, > > + worklist, patt_info); > > } > > stmt_info = STMT_VINFO_RELATED_STMT (stmt_info); > > } > > if (!STMT_SLP_TYPE (stmt_info) && STMT_VINFO_RELEVANT > > (stmt_info)) > > - worklist.safe_push (stmt_info); > > + maybe_push_to_hybrid_worklist (loop_vinfo, > > + worklist, stmt_info); > > } > > } > > > > -- > > 2.26.2 > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imend