On Wed, 4 Nov 2020, Tamar Christina wrote:

> Hi Richi,
> 
> > -----Original Message-----
> > From: rguent...@c653.arch.suse.de <rguent...@c653.arch.suse.de> On
> > Behalf Of Richard Biener
> > Sent: Wednesday, November 4, 2020 1:00 PM
> > To: Tamar Christina <tamar.christ...@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; o...@ucw.cz
> > Subject: Re: [PATCH v2 9/18]middle-end optimize slp simplify back to back
> > permutes.
> > 
> > On Tue, 3 Nov 2020, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > This optimizes sequential permutes. i.e. if there are two permutes
> > > back to back this function applies the permute of the parent to the
> > > child and removed the parent.
> > >
> > > If the resulting permute in the child is now a no-op.  Then the child
> > > is also dropped from the graph and the parent's parent attached to the
> > child's child.
> > >
> > > This relies on the materialization point calculation in optimize SLP.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > Tests are included as part of the final patch as they need the SLP
> > > pattern matcher to insert permutes in between.
> > >
> > > This allows us to remove useless permutes such as
> > >
> > >   ldr     q0, [x0, x3]
> > >   ldr     q2, [x1, x3]
> > >   trn1    v1.4s, v0.4s, v0.4s
> > >   trn2    v0.4s, v0.4s, v0.4s
> > >   trn1    v0.4s, v1.4s, v0.4s
> > >   mov     v1.16b, v3.16b
> > >   fcmla   v1.4s, v0.4s, v2.4s, #0
> > >   fcmla   v1.4s, v0.4s, v2.4s, #90
> > >   str     q1, [x2, x3]
> > >
> > > from the sequence the vectorizer puts out and give
> > >
> > >   ldr     q0, [x0, x3]
> > >   ldr     q2, [x1, x3]
> > >   mov     v1.16b, v3.16b
> > >   fcmla   v1.4s, v0.4s, v2.4s, #0
> > >   fcmla   v1.4s, v0.4s, v2.4s, #90
> > >   str     q1, [x2, x3]
> > >
> > > instead
> > >
> > > Ok for master?
> > 
> > +             /* If the remaining permute is a no-op then we can just
> > + drop
> > the
> > +                node instead of materializing it.  */
> > +             if (vect_slp_tree_permute_noop_p (node))
> > +               {
> > +                 if (dump_enabled_p ())
> > +                   dump_printf_loc (MSG_NOTE, vect_location,
> > +                                    "removing unneeded permute node
> > %p\n",
> > +                                    node);
> > +
> > +                  unsigned idx = SLP_TREE_LANE_PERMUTATION
> > (node)[0].first;
> > +                  slp_tree value = SLP_TREE_CHILDREN (node)[idx];
> > +                  unsigned src = slpg->vertices[node->vertex].pred->src;
> > +                  slp_tree prev = vertices[src];
> > +                  unsigned dest;
> > +                  slp_tree tmp;
> > +                  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (prev), dest, tmp)
> > +                    if (tmp == node)
> > +                      {
> > +                         SLP_TREE_CHILDREN (prev)[dest] = value;
> > +                         break;
> > +                       }
> > 
> > so I don't think this will work reliably since we do not update the graph 
> > when
> > inserting permute nodes and thus the "parent"
> > can refer to a permute rather than the original node now (we're just walking
> > over all vertices in no specific order during materialization - guess using 
> > IPO
> > might fix this apart from in cycles).  You would also need to iterate over 
> > preds
> > here (pred_next).
> > I guess removing no-op permutes is only important for costing?
> > They should not cause any actual code generation?
> 
> Yeah, it's just for costing, the simplification of the permute part is the 
> one fixing
> the codegen. I could just remove the lane permute (as in, clear it) and 
> change the
> costing function to not cost VEC_PERMS with no lane permutes (if it doesn't 
> already do that).

I think clearing the lane permute is even not necessary.  The vec
perm code generation should already not cost anything here
since it is also able to elide costs when the permute aligns
naturally with vector boundaries as in { [0, 2], [0, 3], [0, 0], [0, 1] }
for two-element vectors. 

> > 
> > You also need to adjust reference counts when you change
> > SLP_TREE_CHILDREN (prev)[dest], first add to that of VALUE and then
> > slp_tree_free node itself (which might be tricky at this point).
> > 
> > +static bool
> > +vect_slp_tree_permute_noop_p (slp_tree node) {
> > +  gcc_assert (SLP_TREE_CODE (node) == VEC_PERM_EXPR);
> > +
> > +  if (!SLP_TREE_LANE_PERMUTATION (node).exists ())
> > +    return true;
> > +
> > +  unsigned x, seed;
> > +  lane_permutation_t perms = SLP_TREE_LANE_PERMUTATION (node);
> > seed =
> > + perms[0].second;  for (x = 1; x < perms.length (); x++)
> > +    if (perms[x].first != perms[0].first || perms[x].second != ++seed)
> > +      return false;
> > 
> > 'seed' needs to be zero to be a noop permute and SLP_TREE_LANES
> > (SLP_TREE_CHILDREN (node)[perms[0].first]) needs to be the same as
> > SLP_TREE_LANES (node).  Otherwise you'll make permutes that select parts
> > of a vector no-op.
> > 
> > Maybe simplify the patch and do the vect_slp_tree_permute_noop_p check
> > in vectorizable_slp_permutation instead?
> > 
> > The permute node adjustment part is OK, thus up to
> > 
> > +         else if (SLP_TREE_LANE_PERMUTATION (node).exists ())
> > +           {
> > +             /* If the node if already a permute node we just need to
> > apply
> > +                the permutation to the permute node itself.  */
> > +             if (dump_enabled_p ())
> > +               dump_printf_loc (MSG_NOTE, vect_location,
> > +                                "simplifying permute node %p\n",
> > +                                node);
> > +
> > +             vect_slp_permute (perms[perm], SLP_TREE_LANE_PERMUTATION
> > (node),
> > +                               true);
> > 
> > in case you want to split up, independently of the rest of the patches.
> > 
> 
> I could but don't think I can generate a testcase for this as the build 
> function wouldn't
> have generated back to back permutes.  So I'd still need the rest to get 
> approved for a
> testcase ?

You can add a testcase once you have one, the above is quite obvious
IMHO.

Richard.

> Thanks,
> Tamar
> > Thanks,
> > Richard.
> > 
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >   * tree-vect-slp.c (vect_slp_tree_permute_noop_p): New.
> > >   (vect_optimize_slp): Optimize permutes.
> > >   (vectorizable_slp_permutation): Fix typo.
> > >
> > >
> > 
> > --
> > Richard Biener <rguent...@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > Nuernberg, Germany; GF: Felix Imend
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend

Reply via email to