Hi Richi, > -----Original Message----- > From: rguent...@c653.arch.suse.de <rguent...@c653.arch.suse.de> On > Behalf Of Richard Biener > Sent: Wednesday, November 4, 2020 1:00 PM > To: Tamar Christina <tamar.christ...@arm.com> > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; o...@ucw.cz > Subject: Re: [PATCH v2 9/18]middle-end optimize slp simplify back to back > permutes. > > On Tue, 3 Nov 2020, Tamar Christina wrote: > > > Hi All, > > > > This optimizes sequential permutes. i.e. if there are two permutes > > back to back this function applies the permute of the parent to the > > child and removed the parent. > > > > If the resulting permute in the child is now a no-op. Then the child > > is also dropped from the graph and the parent's parent attached to the > child's child. > > > > This relies on the materialization point calculation in optimize SLP. > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Tests are included as part of the final patch as they need the SLP > > pattern matcher to insert permutes in between. > > > > This allows us to remove useless permutes such as > > > > ldr q0, [x0, x3] > > ldr q2, [x1, x3] > > trn1 v1.4s, v0.4s, v0.4s > > trn2 v0.4s, v0.4s, v0.4s > > trn1 v0.4s, v1.4s, v0.4s > > mov v1.16b, v3.16b > > fcmla v1.4s, v0.4s, v2.4s, #0 > > fcmla v1.4s, v0.4s, v2.4s, #90 > > str q1, [x2, x3] > > > > from the sequence the vectorizer puts out and give > > > > ldr q0, [x0, x3] > > ldr q2, [x1, x3] > > mov v1.16b, v3.16b > > fcmla v1.4s, v0.4s, v2.4s, #0 > > fcmla v1.4s, v0.4s, v2.4s, #90 > > str q1, [x2, x3] > > > > instead > > > > Ok for master? > > + /* If the remaining permute is a no-op then we can just > + drop > the > + node instead of materializing it. */ > + if (vect_slp_tree_permute_noop_p (node)) > + { > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_NOTE, vect_location, > + "removing unneeded permute node > %p\n", > + node); > + > + unsigned idx = SLP_TREE_LANE_PERMUTATION > (node)[0].first; > + slp_tree value = SLP_TREE_CHILDREN (node)[idx]; > + unsigned src = slpg->vertices[node->vertex].pred->src; > + slp_tree prev = vertices[src]; > + unsigned dest; > + slp_tree tmp; > + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (prev), dest, tmp) > + if (tmp == node) > + { > + SLP_TREE_CHILDREN (prev)[dest] = value; > + break; > + } > > so I don't think this will work reliably since we do not update the graph when > inserting permute nodes and thus the "parent" > can refer to a permute rather than the original node now (we're just walking > over all vertices in no specific order during materialization - guess using > IPO > might fix this apart from in cycles). You would also need to iterate over > preds > here (pred_next). > I guess removing no-op permutes is only important for costing? > They should not cause any actual code generation?
Yeah, it's just for costing, the simplification of the permute part is the one fixing the codegen. I could just remove the lane permute (as in, clear it) and change the costing function to not cost VEC_PERMS with no lane permutes (if it doesn't already do that). > > You also need to adjust reference counts when you change > SLP_TREE_CHILDREN (prev)[dest], first add to that of VALUE and then > slp_tree_free node itself (which might be tricky at this point). > > +static bool > +vect_slp_tree_permute_noop_p (slp_tree node) { > + gcc_assert (SLP_TREE_CODE (node) == VEC_PERM_EXPR); > + > + if (!SLP_TREE_LANE_PERMUTATION (node).exists ()) > + return true; > + > + unsigned x, seed; > + lane_permutation_t perms = SLP_TREE_LANE_PERMUTATION (node); > seed = > + perms[0].second; for (x = 1; x < perms.length (); x++) > + if (perms[x].first != perms[0].first || perms[x].second != ++seed) > + return false; > > 'seed' needs to be zero to be a noop permute and SLP_TREE_LANES > (SLP_TREE_CHILDREN (node)[perms[0].first]) needs to be the same as > SLP_TREE_LANES (node). Otherwise you'll make permutes that select parts > of a vector no-op. > > Maybe simplify the patch and do the vect_slp_tree_permute_noop_p check > in vectorizable_slp_permutation instead? > > The permute node adjustment part is OK, thus up to > > + else if (SLP_TREE_LANE_PERMUTATION (node).exists ()) > + { > + /* If the node if already a permute node we just need to > apply > + the permutation to the permute node itself. */ > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_NOTE, vect_location, > + "simplifying permute node %p\n", > + node); > + > + vect_slp_permute (perms[perm], SLP_TREE_LANE_PERMUTATION > (node), > + true); > > in case you want to split up, independently of the rest of the patches. > I could but don't think I can generate a testcase for this as the build function wouldn't have generated back to back permutes. So I'd still need the rest to get approved for a testcase 😊 Thanks, Tamar > Thanks, > Richard. > > > Thanks, > > Tamar > > > > gcc/ChangeLog: > > > > * tree-vect-slp.c (vect_slp_tree_permute_noop_p): New. > > (vect_optimize_slp): Optimize permutes. > > (vectorizable_slp_permutation): Fix typo. > > > > > > -- > Richard Biener <rguent...@suse.de> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 > Nuernberg, Germany; GF: Felix Imend