Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > On 8/25/2022 7:07 AM, Richard Sandiford via Gcc-patches wrote: >> Currently SLP tries to force permute operations "down" the graph >> from loads in the hope of reducing the total number of permutations >> needed or (in the best case) removing the need for the permutations >> entirely. This patch tries to extend it as follows: >> >> - Allow loads to take a different permutation from the one they >> started with, rather than choosing between "original permutation" >> and "no permutation". >> >> - Allow changes in both directions, if the target supports the >> reverse permutation. >> >> - Treat the placement of permutations as a two-way dataflow problem: >> after propagating information from leaves to roots (as now), propagate >> information back up the graph. >> >> - Take execution frequency into account when optimising for speed, >> so that (for example) permutations inside loops have a higher >> cost than permutations outside loops. >> >> - Try to reduce the total number of permutations when optimising for >> size, even if that increases the number of permutations on a given >> execution path. >> >> See the big block comment above vect_optimize_slp_pass for >> a detailed description. >> >> The original motivation for doing this was to add a framework that would >> allow other layout differences in future. The two main ones are: >> >> - Make it easier to represent predicated operations, including >> predicated operations with gaps. E.g.: >> >> a[0] += 1; >> a[1] += 1; >> a[3] += 1; >> >> could be a single load/add/store for SVE. We could handle this >> by representing a layout such as { 0, 1, _, 2 } or { 0, 1, _, 3 } >> (depending on what's being counted). We might need to move >> elements between lanes at various points, like with permutes. >> >> (This would first mean adding support for stores with gaps.) >> >> - Make it easier to switch between an even/odd and unpermuted layout >> when switching between wide and narrow elements. E.g. if a widening >> operation produces an even vector and an odd vector, we should try >> to keep operations on the wide elements in that order rather than >> force them to be permuted back "in order". > Very cool. Richi and I discussed this a bit a year or so ago -- > basically noting that bi-directional movement is really the way to go > and that the worst thing to do is push a permute down into the *middle* > of a computation chain since that will tend to break FMA generation. > Moving to the loads or stores or to another permute point ought to be > the goal.
Hmm, I hadn't thought specifically about the case of permutes ending up in the middle of a fusable operation. The series doesn't address that directly. If we have: permute(a) * permute(b) + c then the tendency will still be to convert that into: permute(a * b) + c Damn. Another case to think about ;-) I've pushed the series for now though (thanks to Richi for the reviews). Richard