Re: [PATCH 1/2][GCC][RFC][middle-end]: Add SLP pattern matcher.

Richard Biener Mon, 07 Oct 2019 04:16:27 -0700

On Tue, 1 Oct 2019, Tamar Christina wrote:

> Hi All,
> 
> This adds a framework to allow pattern matchers to be written at based on the
> SLP tree.  The difference between this one and the one in tree-vect-patterns 
> is
> that this matcher allows the matching of an arbitrary number of parallel
> statements and replacing of an arbitrary number of children or statements.
> 
> Any relationship created by the SLP pattern matcher will be undone if SLP 
> fails.
> 
> The pattern matcher can also cancel all permutes depending on what the pattern
> requested it to do.  As soon as one pattern requests the permutes to be
> cancelled all permutes are cancelled.
> 
> Compared to the previous pattern matcher this one will work for an arbitrary
> group size and will match at any arbitrary node in the SLP tree.  The only
> requirement is that the entire node is matched or rejected.
> 
> vect_build_slp_tree_1 is a bit more lenient in what it accepts as "compatible
> operations" but the matcher cannot be because in cases where you match the 
> order
> of the operands may be changed.  So all operands must be changed or none.
> 
> Furthermore the matcher relies on canonization of the operations inside the
> SLP tree and on the fact that floating math operations are not commutative.
> This means that matching a pattern does not need to try all alternatives or
> combinations and that arguments will always be in the same order if it's the
> same operation.
> 
> The pattern matcher also ignored uninteresting nodes such as type casts, loads
> and stores.  Doing so is essential to keep the runtime down.
> 
> Each matcher is allowed a post condition that can be run to perform any 
> changes
> to the SLP tree as needed before the patterns are created and may also abort
> the creation of the patterns.
> 
> When a pattern is matched it is not immediately created but instead it is
> deferred until all statements in the node have been analyzed.  Only if all 
> post
> conditions are true, and all statements will be replaced will the patterns be
> created in batch.  This allows us to not have to undo any work if the pattern
> fails but also makes it so we traverse the tree only once.
> 
> When a new pattern is created it is a marked as a pattern to the statement it 
> is
> replacing and be marked as used in the current SLP scope.  If SLP fails then
> relationship is undone and the relevancy restored.
> 
> Each pattern matcher can detect any number of pattern it wants.  The only
> constraint is that the optabs they produce must all have the same arity.
> 
> The pattern matcher supports instructions that have no scalar form as they
> are added as pattern statements to the stmt.  The BB is left untouched and
> so the scalar loop is untouched.
> 
> Bootstrapped on aarch64-none-linux-gnu and no issues.
> No regression testing done yet.


If you split out the introduction of SLP_TREE_REF_COUNT you can commit
that right now (sorry for being too lazy there...).

One overall comment - you do pattern matching after SLP tree
creation (good) but still do it before the whole SLP graph is
created (bad).  Would it be possible to instead do it as a separate
phase in vect_analyze_slp, looping over all instances (the instances
present entries into the single unified SLP graph now), avoiding
to visit "duplicates"?

In patch 1/2 I see references (mostly in variable names) to
"complex" which is IMHO too specific.

I'd also think that a useful first pattern to support would be
what we do with SLP_TREE_TWO_OPERATORS, where code generation
inserts extra blends.  I have yet to dive into the complex pattern
details to see if that's feasible or if you maybe re-use that...
You seem to at least rely on the SLP build succeeding with
the mixed plus/minus cases?  Which also restricts the kind of
patterns we can recognize I guess.  Plus it shows the chicken-and-egg
issue here - we'd like to detect the patterns _first_ and then
build the SLP trees (or rather combine lanes into larger groups).
Doing it after the fact makes it no more powerful than doing
it as folding post vectorization?

+typedef hash_map <stmt_vec_info, slp_tree>
+  ssa_name_def_to_slp_tree_map_t;
+

this seems to be unused.

+  FOR_EACH_VEC_ELT_FROM (scalar_stmts, i, stmt_info, n - 1)
+    {
...
+      work.stmt_infos = scalar_stmts.begin () + (i - (n - 1));
+      work.idx = i;

so this tries to match patterns on ARITY arbitrary aligned
but adjacent scalar stmts [in a brute force way].  But then
you immediately fail when one matching fails.  So I think
this loop should be written like

 for (unsigned i = n - 1; i < scalar_stmts.length (); i += n)
   {
...

to make this fact clearer.  A missed optimization here is to consider
pre/post shuffling of the group to make more cases match.

In this loop you also do the target support check which could
possibly be done upfront in the pattern matcher itself to save
compile-time?  It also seems to be required that patterns match
a single IFN call?

Looking at the complex patterns I am worried and confused about
the transform phase - just quoting/commenting on random pieces:

+  FOR_EACH_VEC_ELT (scalar_stmts, i, scalar_stmt)
+    {
+      if (defs.contains (scalar_stmt))
+       {

this is quadratic - vec::contains does a linear search.

         arg_map->put (scalar_stmt, vect_split_slp_tree (node, i));
+         found_p = true;

it seems that you are re-doing the match here, something that
should have been done in the first phase of pattern matching already.
May I suggest to restructure the pattern matchers in a way that you
have a

 class slp_pattern
 {
   virtual match() = 0;
   virtual transform() = 0;
 };

and derive from that so you can have a pattern specific state you
can transfer from match to transform?  Iteration over patterns
then either becomes ad-hoc or you find a way to iterate over
an "array of types" with our C++04 features creating a new
instance when you match to hold this state?

I also wonder what you are doing here - shouldn't it be "simply"
replacing a subgraph of the SLP tree with a single new SLP
node that now has those IFNs as "scalar" stmts (they are not
really scalars anymore because of that arity issue).  This also
means that code-generation might better not go the "traditional"
way but instead we use a new vect_transform_slp_pattern function
which does the natural thing and we'll just have the pattern
IFN recorded directly in the slp_tree structure?  (I realize
the complication of vect_get_slp_defs using the scalar stmts to
identify vectorized operands defs)

That said, I still think the same result can be achieved by
post-vectorizer pattern matching.  I also think that
doing pattern matching after the SLP tree build is backwards.
My vision is that we'd do more general graph matching on
the SSA graph forming the SLP tree rather than the current
ad-hoc matching starting from special "sinks".

Thanks,
Richard.


> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 2019-10-01  Tamar Christina  <tamar.christ...@arm.com>
> 
>       * tree-vect-loop.c (vect_dissolve_slp_only_patterns): New.
>       (vect_dissolve_slp_only_groups): Use macro.
>       * tree-vect-patterns.c (vect_mark_pattern_stmts): Expose symbol.
>       * tree-vect-slp.c (vect_free_slp_tree): Add control of recursion and how
>       to free.
>       (ssa_name_def_to_slp_tree_map_t): New.
>       (vect_create_new_slp_node, vect_build_slp_tree): Use macro.
>       (vect_create_slp_patt_stmt): New.
>       (vect_match_slp_patterns_2): New.
>       (vect_match_slp_patterns): New.
>       (vect_analyze_slp_instance): Call vect_match_slp_patterns and undo
>       permutes.
>       (vect_detect_hybrid_slp_stmts): Dissolve relationships created for SLP.
>       * tree-vectorizer.h (SLP_TREE_REF_COUNT): New.
>       (vect_mark_pattern_stmts): New.
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 247165 (AG München)

Re: [PATCH 1/2][GCC][RFC][middle-end]: Add SLP pattern matcher.

Reply via email to