On Tue, 3 Nov 2020, Tamar Christina wrote:

> Hi All,
> 
> This optimizes sequential permutes. i.e. if there are two permutes back to 
> back
> this function applies the permute of the parent to the child and removed the
> parent.
> 
> If the resulting permute in the child is now a no-op.  Then the child is also
> dropped from the graph and the parent's parent attached to the child's child.
> 
> This relies on the materialization point calculation in optimize SLP.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> Tests are included as part of the final patch as they need the SLP pattern
> matcher to insert permutes in between.
> 
> This allows us to remove useless permutes such as
> 
>       ldr     q0, [x0, x3]
>       ldr     q2, [x1, x3]
>       trn1    v1.4s, v0.4s, v0.4s
>       trn2    v0.4s, v0.4s, v0.4s
>       trn1    v0.4s, v1.4s, v0.4s
>       mov     v1.16b, v3.16b
>       fcmla   v1.4s, v0.4s, v2.4s, #0
>       fcmla   v1.4s, v0.4s, v2.4s, #90
>       str     q1, [x2, x3]
> 
> from the sequence the vectorizer puts out and give
> 
>       ldr     q0, [x0, x3]
>       ldr     q2, [x1, x3]
>       mov     v1.16b, v3.16b
>       fcmla   v1.4s, v0.4s, v2.4s, #0
>       fcmla   v1.4s, v0.4s, v2.4s, #90
>       str     q1, [x2, x3]
> 
> instead
> 
> Ok for master?

+             /* If the remaining permute is a no-op then we can just drop 
the
+                node instead of materializing it.  */
+             if (vect_slp_tree_permute_noop_p (node))
+               {
+                 if (dump_enabled_p ())
+                   dump_printf_loc (MSG_NOTE, vect_location,
+                                    "removing unneeded permute node 
%p\n",
+                                    node);
+
+                  unsigned idx = SLP_TREE_LANE_PERMUTATION 
(node)[0].first;
+                  slp_tree value = SLP_TREE_CHILDREN (node)[idx];
+                  unsigned src = slpg->vertices[node->vertex].pred->src;
+                  slp_tree prev = vertices[src];
+                  unsigned dest;
+                  slp_tree tmp;
+                  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (prev), dest, tmp)
+                    if (tmp == node)
+                      {
+                         SLP_TREE_CHILDREN (prev)[dest] = value;
+                         break;
+                       }

so I don't think this will work reliably since we do not update
the graph when inserting permute nodes and thus the "parent"
can refer to a permute rather than the original node now (we're
just walking over all vertices in no specific order during
materialization - guess using IPO might fix this apart from in
cycles).  You would also need to iterate over preds here (pred_next).
I guess removing no-op permutes is only important for costing?
They should not cause any actual code generation?

You also need to adjust reference counts when you change
SLP_TREE_CHILDREN (prev)[dest], first add to that of VALUE
and then slp_tree_free node itself (which might be tricky
at this point).

+static bool
+vect_slp_tree_permute_noop_p (slp_tree node)
+{
+  gcc_assert (SLP_TREE_CODE (node) == VEC_PERM_EXPR);
+
+  if (!SLP_TREE_LANE_PERMUTATION (node).exists ())
+    return true;
+
+  unsigned x, seed;
+  lane_permutation_t perms = SLP_TREE_LANE_PERMUTATION (node);
+  seed = perms[0].second;
+  for (x = 1; x < perms.length (); x++)
+    if (perms[x].first != perms[0].first || perms[x].second != ++seed)
+      return false;

'seed' needs to be zero to be a noop permute and
SLP_TREE_LANES (SLP_TREE_CHILDREN (node)[perms[0].first]) needs
to be the same as SLP_TREE_LANES (node).  Otherwise you'll make
permutes that select parts of a vector no-op.

Maybe simplify the patch and do the vect_slp_tree_permute_noop_p
check in vectorizable_slp_permutation instead?

The permute node adjustment part is OK, thus up to

+         else if (SLP_TREE_LANE_PERMUTATION (node).exists ())
+           {
+             /* If the node if already a permute node we just need to 
apply
+                the permutation to the permute node itself.  */
+             if (dump_enabled_p ())
+               dump_printf_loc (MSG_NOTE, vect_location,
+                                "simplifying permute node %p\n",
+                                node);
+
+             vect_slp_permute (perms[perm], SLP_TREE_LANE_PERMUTATION 
(node),
+                               true);

in case you want to split up, independently of the rest of the
patches.

Thanks,
Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>       * tree-vect-slp.c (vect_slp_tree_permute_noop_p): New.
>       (vect_optimize_slp): Optimize permutes.
>       (vectorizable_slp_permutation): Fix typo.
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend

Reply via email to