On Mon, 28 Sep 2020, Tamar Christina wrote: > > > > -----Original Message----- > > From: Gcc-patches <gcc-patches-boun...@gcc.gnu.org> On Behalf Of Tamar > > Christina > > Sent: Monday, September 28, 2020 3:56 PM > > To: Richard Biener <rguent...@suse.de> > > Cc: nd <n...@arm.com>; gcc-patches@gcc.gnu.org; o...@ucw.cz > > Subject: RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching > > scaffolding. > > > > Hi Richi, > > > > Thanks for the review! > > > > Just some answers to your questions: > > > > > -----Original Message----- > > > From: rguent...@c653.arch.suse.de <rguent...@c653.arch.suse.de> On > > > Behalf Of Richard Biener > > > Sent: Monday, September 28, 2020 1:37 PM > > > To: Tamar Christina <tamar.christ...@arm.com> > > > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; o...@ucw.cz > > > Subject: Re: [PATCH v2 3/16]middle-end Add basic SLP pattern matching > > > scaffolding. > > > > > > On Fri, 25 Sep 2020, Tamar Christina wrote: > > > > > > > Hi All, > > > > > > > > This patch adds the basic infrastructure for doing pattern matching > > > > on SLP > > > trees. > > > > This is done immediately after the SLP tree creation because it can > > > > change the shape of the tree in radical ways and so we would like to > > > > do it before any analysis is performed on the tree. > > > > > > > > A new file tree-vect-slp-patterns.c is added which contains all the > > > > code for pattern matching on SLP trees. > > > > > > > > This cover letter is short because the changes are heavily commented. > > > > > > > > All pattern matchers need to implement the abstract type > > > VectPatternMatch. > > > > The VectSimplePatternMatch abstract class provides some default > > > > functionality for pattern matchers that need to rebuild nodes. > > > > > > > > The pattern matcher requires if replacing a statement in a node, > > > > that ALL statements be replaced. > > > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > > > > > > > Ok for master? > > > > > > + gcall *build () > > > + { > > > + stmt_vec_info stmt_info; > > > + > > > > > > please define functions out-of-line (apart from the 1-liners) > > > > > > + /* We have to explicitly mark the old statement as unused > > > + because > > > during > > > + statement analysis the original and new pattern statement may > > > require > > > + different level of unrolling. As an example add/sub when > > > vectorized > > > + without a pattern requires 4 copies, whereas with a > > > + COMPLEX_ADD > > > pattern > > > + this only requires 2 copies and the two statement will be > > > + treated > > > as > > > + hand unrolled. That means that the analysis won't happen as > > > it'll find > > > + a mismatch. So we don't analyze the old statement and if we > > > + end > > > up > > > + needing it, e.g. SLP fails then we have to quickly re-analyze it. > > > */ > > > + STMT_VINFO_RELEVANT (stmt_info) = vect_unused_in_scope; > > > + STMT_VINFO_SLP_VECT_ONLY (call_stmt_info) = true; > > > + STMT_VINFO_RELATED_STMT (call_stmt_info) = stmt_info; > > > > > > so this means all uses have to be inside the pattern as otherwise > > > there may be even non-SLP uses. vect_mark_pattern_stmts supports > > > detecting patterns of patterns, I suppose the two-phase analysis for > > > SLP patterns does not support this right now? > > > > > > + SLP_TREE_CODE (this->m_node) = gimple_expr_code (call_stmt);; > > > > > > double ;, just make it CALL_EXPR literally (or leave it ERROR_MARK) > > > > > > You seem to do in-place changing of the SLP node you match off? > > > > Yes since this would allow me to change the root node as well, though > > thinking about it I can probably do it by passing it as a reference which > > then > > would allow me to re-use vect_create_new_slp_node which is probably > > preferable. > > > > > > > > @@ -2192,6 +2378,17 @@ vect_analyze_slp_instance (vec_info *vinfo, > > > &tree_size, bst_map); > > > if (node != NULL) > > > { > > > + /* Temporarily allow add_stmt calls again. */ > > > + vinfo->stmt_vec_info_ro = false; > > > + > > > + /* See if any patterns can be found in the constructed SLP tree > > > + before we do any analysis on it. */ > > > + vect_match_slp_patterns (node, vinfo, group_size, &max_nunits, > > > + matches, &npermutes, &tree_size, > > > + bst_map); > > > + > > > + /* After this no more add_stmt calls are allowed. */ > > > + vinfo->stmt_vec_info_ro = true; > > > + > > > > > > I think this is a bit early to match patterns - I'd defer it to the > > > point where all entries into the same SLP subgraph are analyzed, thus > > > somewhere at the end of vect_analyze_slp loop over all instances and > > > match patterns? That way phases are more clearly separated. > > > > That would probably work, my only worry is that the SLP analysis itself may > > fail and bail out at > > > > /* If the loads and stores can be handled with load/store-lane > > instructions do not generate this SLP instance. */ > > if (is_a <loop_vec_info> (vinfo) > > && loads_permuted > > && dr && vect_store_lanes_supported (vectype, group_size, > > false))
Ah, that piece of code. Yeah, I'm repeatedly running into it as well - it's a bad hack that stands in the way all the time :/ I guess we should try moving this upward like to vect_analyze_loop_2 right before /* Check the SLP opportunities in the loop, analyze and build SLP trees. */ ok = vect_analyze_slp (loop_vinfo, *n_stmts); if (!ok) return ok; and there check whether all grouped loads and stores can be handled with store-/load-lanes (and there are no SLP reduction chains?) in which case do not try to attempt SLP at all. Because the testcases this check was supposed to change were all-load/store-lane or all SLP so the mixed case is probably not worth special casing. Since load-/store-lanes is an arm speciality I tried to only touch this fragile part with a ten-foot pole ;) CCing Richard, if he acks the above I can produce a patch. > > Which in the initial tree may be true, but in the patterned tree may not be. > > In the previous revision of the patch you had suggested I return a boolean > > which can be used to cancel such checks. Would that be the preferred > > approach? > > > > > > Note that fiddling with vinfo->stmt_vec_info_ro is a bit ugly, maybe > > > add a - > > > >add_pattern_stmt (gimple *pattern_stmt, stmt_vec_info > > > orig_stmt) variant that also sets STMT_VINFO_RELATED_STMT but doesn't > > > check !stmt_vec_info_ro. That could be used from tree-vect-patterns.c > > > as well and we could set stmt_vec_info_ro earlier. > > > > > > + VectPattern *pattern = patt_fn (node, vinfo); uint8_t n = > > > + pattern->get_arity (); > > > + > > > + if (group_size % n != 0) > > > + { > > > + delete pattern; > > > > > > seems to require VectPattern allocation even upon failure, I suggest > > > to return NULL then to avoid excessive allocations. > > > > > > + if (!pattern->matches (stmt_infos, i)) > > > + { > > > + /* We can only do replacements for entire groups, we must > > > replace all > > > + statements in a node as the argument list/children may > > > + not > > > have > > > + equal height then. Operations that don't rewrite the > > > arguments > > > + may be safe to do, so perhaps paramatrise it. */ > > > + > > > + found_p = false; > > > > > > I find it a bit ugly to iterate over "unrolls" in the machinery rather > > > than the individual pattern matcher which might have an easier and in > > > particular cheaper job here. Since you require > > > all lanes to match the same pattern anyway. Not sure if your > > > later patches support say, mixing complex add with different rotate in > > > the same SLP node. > > > > It does, as the constraint only applies to one pattern matcher class > > handling > > the entire node. > > > > An example of such case is > > > > node 0x531a1f0 (max_nunits=2, refcnt=2) > > stmt 0 *_9 = _10; > > stmt 1 *_15 = _16; > > stmt 2 *_25 = _26; > > stmt 3 *_31 = _32; > > children 0x531a980 > > node 0x531a980 (max_nunits=2, refcnt=2) > > stmt 0 slp_patt_112 = .COMPLEX_ADD_ROT90 (_4, _14); stmt 1 slp_patt_111 > > = .COMPLEX_ADD_ROT90 (_12, _8); stmt 2 slp_patt_110 > > = .COMPLEX_ADD_ROT270 (_20, _30); stmt 3 slp_patt_109 > > = .COMPLEX_ADD_ROT270 (_28, _24); lane permutation { 0[0] 1[1] 1[2] 0[3] } > > children 0x5310680 0x530e040 node 0x5310680 (max_nunits=2, refcnt=4) > > stmt 0 _4 = *_3; stmt 1 _12 = *_11; stmt 2 _20 = *_19; stmt 3 _28 = *_27; > > load permutation { 0 1 2 3 } node 0x530e040 (max_nunits=2, refcnt=2) stmt 0 > > _14 = *_13; stmt 1 _8 = *_7; stmt 2 _30 = *_29; stmt 3 _24 = *_23; load > > permutation { 0 1 2 3 } > > > > though looking at the resulting assembly the code is incorrect, > > > > .L6: > > ldr q1, [x1, x3] > > ldr q0, [x0, x3] > > fcadd v0.2d, v0.2d, v1.2d, #270 > > str q0, [x2, x3] > > ldr q1, [x5, x3] > > ldr q0, [x6, x3] > > fcadd v0.2d, v0.2d, v1.2d, #270 > > str q0, [x4, x3] > > add x3, x3, 32 > > cmp x3, 1600 > > bne .L6 > > ret > > > > Which I assume is because SLP_TREE_REPRESENTATIVE is pointing to the > > rotate 270? Or maybe due to the lane permutations. > > > Note the ultimate idea in the end is that a SLP node can, of course, > > > be split into two [but at this point the vector type / unroll factor > > > is not final so general splitting at vector boundary is not desired yet]. > > > The split can be undone for consumers by inserting a VEC_PERM node > > > (which should semantically be a concat + select) > > > > > > + tree type = gimple_expr_type (STMT_VINFO_STMT (stmt_info)); > > > + tree vectype = get_vectype_for_scalar_type (vinfo, type, node); > > > > > > use > > > > > > tree vectype = SLP_TREE_VECTYPE (node); > > > > > > generally avoid looking at scalar stmts, iff then look at > > > SLP_TREE_REPRESENTATIVE - all lanes have uniform operations applied to > > > (but the scalar stmts may not appear to do so! the scalar stmts > > > merely stand for their 'def'). > > > > > > + /* Perform recursive matching, it's important to do this after > > > + matching > > > things > > > + in the current node as the matches here may re-order the nodes > > > + below > > > it. > > > + As such the pattern that needs to be subsequently match may change. > > > */ > > > + > > > + if (SLP_TREE_CHILDREN (node).exists ()) { > > > + slp_tree child; > > > + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) > > > + found_rec_p |= vect_match_slp_patterns_2 (child, vinfo, group_size, > > > + patt_fn, max_nunits, > > > matches, > > > + npermutes, tree_size, > > > bst_map); > > > + } > > > > > > > > > you definitely need a visited set - you are walking a graph and nodes > > > can appear along multiple paths! > > > > > > + vect_mark_slp_stmts_relevant (node); > > > > > > that walks the whole subgraph but if you need to do anything you at > > > most want to touch the node itself, no? > > > > > > To make patterns-of-patterns viable you need to do all parts of the > > > walk in post-order. What breaks if you do ->matches/->validate in > > > post-order? I think that would be more future-proof. > > > > You lose the ability to match the longest pattern. As an example the complex > > add and complex fma patterns overlap. Right now I can try matching the fma > > first and then add. > > But doing it in post order the fma woud never match as the subtree would be > > too small and the add would always match. > > > > Oops, I forgot that this new version tries the same pattern over the > entire tree. So it should work. You would only lose the ability to > navigate by SSA name, but it doesn't need that anyway.. > > I'll make that change and see. Thanks, Richard. > Thanks, > Tamar > > > Aside from that it makes it very difficult to rebuild the subtrees as the > > SSA > > names have changed (since build Is already done in post order), So right now > > I can use e.g. _3, _4 etc, however if the patterns have already been > > applied I > > would need to know what their replacements are since build () would > > replace them and you lose the ability to navigate by SSA name. > > > > Regards, > > Tamar > > > > > > > > Otherwise this looks like an OK overall design. > > > > > > Thanks for working on it! > > > > > > Richard. > > > > > > > > > > Thanks, > > > > Tamar > > > > > > > > gcc/ChangeLog: > > > > > > > > * Makefile.in (tree-vect-slp-patterns.o): New. > > > > * doc/passes.texi: Update documentation. > > > > * tree-vect-slp.c (vect_match_slp_patterns_2, > > > vect_match_slp_patterns): > > > > New. > > > > (vect_analyze_slp_instance): Call pattern matcher. > > > > * tree-vectorizer.h (class VectPatternMatch, class > > > > VectPattern): New. > > > > * tree-vect-slp-patterns.c: New file. > > > > > > > > > > > > > > -- > > > Richard Biener <rguent...@suse.de> > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 > > > Nuernberg, Germany; GF: Felix Imend > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imend