On Tue, 3 Nov 2020, Tamar Christina wrote: > Hi All, > > This optimizes sequential permutes. i.e. if there are two permutes back to > back > this function applies the permute of the parent to the child and removed the > parent. > > If the resulting permute in the child is now a no-op. Then the child is also > dropped from the graph and the parent's parent attached to the child's child. > > This relies on the materialization point calculation in optimize SLP. > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > Tests are included as part of the final patch as they need the SLP pattern > matcher to insert permutes in between. > > This allows us to remove useless permutes such as > > ldr q0, [x0, x3] > ldr q2, [x1, x3] > trn1 v1.4s, v0.4s, v0.4s > trn2 v0.4s, v0.4s, v0.4s > trn1 v0.4s, v1.4s, v0.4s > mov v1.16b, v3.16b > fcmla v1.4s, v0.4s, v2.4s, #0 > fcmla v1.4s, v0.4s, v2.4s, #90 > str q1, [x2, x3] > > from the sequence the vectorizer puts out and give > > ldr q0, [x0, x3] > ldr q2, [x1, x3] > mov v1.16b, v3.16b > fcmla v1.4s, v0.4s, v2.4s, #0 > fcmla v1.4s, v0.4s, v2.4s, #90 > str q1, [x2, x3] > > instead > > Ok for master?
+ /* If the remaining permute is a no-op then we can just drop the + node instead of materializing it. */ + if (vect_slp_tree_permute_noop_p (node)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "removing unneeded permute node %p\n", + node); + + unsigned idx = SLP_TREE_LANE_PERMUTATION (node)[0].first; + slp_tree value = SLP_TREE_CHILDREN (node)[idx]; + unsigned src = slpg->vertices[node->vertex].pred->src; + slp_tree prev = vertices[src]; + unsigned dest; + slp_tree tmp; + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (prev), dest, tmp) + if (tmp == node) + { + SLP_TREE_CHILDREN (prev)[dest] = value; + break; + } so I don't think this will work reliably since we do not update the graph when inserting permute nodes and thus the "parent" can refer to a permute rather than the original node now (we're just walking over all vertices in no specific order during materialization - guess using IPO might fix this apart from in cycles). You would also need to iterate over preds here (pred_next). I guess removing no-op permutes is only important for costing? They should not cause any actual code generation? You also need to adjust reference counts when you change SLP_TREE_CHILDREN (prev)[dest], first add to that of VALUE and then slp_tree_free node itself (which might be tricky at this point). +static bool +vect_slp_tree_permute_noop_p (slp_tree node) +{ + gcc_assert (SLP_TREE_CODE (node) == VEC_PERM_EXPR); + + if (!SLP_TREE_LANE_PERMUTATION (node).exists ()) + return true; + + unsigned x, seed; + lane_permutation_t perms = SLP_TREE_LANE_PERMUTATION (node); + seed = perms[0].second; + for (x = 1; x < perms.length (); x++) + if (perms[x].first != perms[0].first || perms[x].second != ++seed) + return false; 'seed' needs to be zero to be a noop permute and SLP_TREE_LANES (SLP_TREE_CHILDREN (node)[perms[0].first]) needs to be the same as SLP_TREE_LANES (node). Otherwise you'll make permutes that select parts of a vector no-op. Maybe simplify the patch and do the vect_slp_tree_permute_noop_p check in vectorizable_slp_permutation instead? The permute node adjustment part is OK, thus up to + else if (SLP_TREE_LANE_PERMUTATION (node).exists ()) + { + /* If the node if already a permute node we just need to apply + the permutation to the permute node itself. */ + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "simplifying permute node %p\n", + node); + + vect_slp_permute (perms[perm], SLP_TREE_LANE_PERMUTATION (node), + true); in case you want to split up, independently of the rest of the patches. Thanks, Richard. > Thanks, > Tamar > > gcc/ChangeLog: > > * tree-vect-slp.c (vect_slp_tree_permute_noop_p): New. > (vect_optimize_slp): Optimize permutes. > (vectorizable_slp_permutation): Fix typo. > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imend