On Thu, 10 Dec 2020, Dinar Temirbulatov wrote:

> Hi,
> I have observed that STV2 pass added ~20% on CPU2006 456.hmmer with mostly
> by transforming V4SI operations. Looking at the pass itself, it looks like
> it might be transformed into RTL architecture-independent, and the pass
> deals only not wide integer operations. I think it might be useful on other
> targets as well?

The pass moves GPR operations to vector register operations. While conceptually this is something generic, the implementation is quite dependent on the actual implementation of the vector patterns, specifically in the way it uses paradoxical subregs, and of course it restricts itself to supported operations and costing.

The 456.hmmer improvement is because the x86 micro-architectures do not seem to like back-to-back cmov (or even cmov + branch, but less so) implementing min(min(...)), whereas the vector min operation is much faster.

So I guess if there's another target that lacks integer min/max operations on GPRs but does have vector integer min/max, it's easy to look at 456.hmmer and replace the one (or was it two) important occurrence(s) with manually crafted assembly to see if it's worth it. Then implement a target-local STV copy that "works". After we have two implementations we can see whether commonizing is possible.

Note there is/was quite some fallout because doing STV is not always profitable and it's difficult to determine exactly when it is (not), because we still don't quite understand _why_ 456.hmmer is so much faster with vector min/max compared to cmov.

Richard.
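For readers who want to experiment, here is a minimal sketch of the min(min(...)) shape discussed above, written twice: once on plain scalars (which x86 compilers typically lower to cmp + cmov) and once with the scalars placed in lane 0 of a V4SI vector. It is not taken from 456.hmmer or from the STV pass; the function names are hypothetical, and it assumes the GCC/Clang generic vector extension (`vector_size`). The vector min is spelled as compare-and-select, which a compiler may or may not turn into a single vector min instruction depending on the target and flags.

```c
#include <stdint.h>

/* Hypothetical helper: scalar signed min on GPRs. */
static inline int32_t min_s32(int32_t a, int32_t b)
{
    return a < b ? a : b;
}

/* Dependent chain min(min(a, b), c) kept in general-purpose registers. */
int32_t min3_gpr(int32_t a, int32_t b, int32_t c)
{
    return min_s32(min_s32(a, b), c);
}

/* GCC/Clang generic vector type: four 32-bit signed ints (V4SI). */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Lane-wise signed min written as compare + select. */
static inline v4si vmin_v4si(v4si a, v4si b)
{
    v4si lt = a < b;              /* all-ones in lanes where a < b */
    return (a & lt) | (b & ~lt);  /* pick a where a < b, else b */
}

/* Same chain, with each scalar living in lane 0 of a vector register.
   Lanes 1..3 are unused padding; only lane 0 carries the result. */
int32_t min3_vec(int32_t a, int32_t b, int32_t c)
{
    v4si va = { a, 0, 0, 0 };
    v4si vb = { b, 0, 0, 0 };
    v4si vc = { c, 0, 0, 0 };
    v4si r = vmin_v4si(vmin_v4si(va, vb), vc);
    return r[0];
}
```

Comparing the generated assembly of the two functions (for example with -O2 -S, plus whatever vector ISA flags the target needs) is one cheap way to check whether a target has the GPR/vector asymmetry before hand-editing the hot loop.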