On Thu, 10 Dec 2020, Dinar Temirbulatov wrote:

> Hi,
> I have observed that the STV2 pass added ~20% on CPU2006 456.hmmer,
> mostly by transforming V4SI operations. Looking at the pass itself, it
> looks like it could be turned into an architecture-independent RTL pass,
> and it deals only with non-wide integer operations. I think it might be
> useful on other targets as well?

The pass moves GPR operations to vector register operations.  While
conceptually this is something generic, the implementation is quite
dependent on the actual implementation of the vector patterns,
specifically in the way it uses paradoxical subregs, and of course
it restricts itself to supported operations and costing.

The 456.hmmer improvement is because the x86 micro-architectures do
not seem to like back-to-back cmov (or even cmov + branch, but less so)
implementing min(min(...)) but the vector min operation is much
faster.
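
To make the shape of the problem concrete, here is a minimal sketch in
C.  It is not the 456.hmmer source and not what STV emits; the function
names are made up for illustration, and the vector variant relies on the
GCC/Clang vector_size extension.  It only shows a chained scalar min next
to the same computation done on lane 0 of a four-int vector, which is
roughly the form a scalar-to-vector transform would produce.

```c
/* Hypothetical helper: scalar min.  On x86 a compiler will commonly
   lower this to cmp + cmov, but that is up to the compiler.  */
static int min_i32(int a, int b)
{
    return a < b ? a : b;
}

/* min(min(a, b), c): the second min depends on the result of the
   first, so the two selects form a serial dependency chain.  */
static int chain_scalar(int a, int b, int c)
{
    return min_i32(min_i32(a, b), c);
}

/* Four 32-bit ints, the same shape as V4SI (GCC/Clang extension).  */
typedef int v4si __attribute__((vector_size(16)));

/* Lane-wise min built from a compare mask: lanes of 'mask' are -1
   where a < b and 0 elsewhere.  */
static v4si min_v4si(v4si a, v4si b)
{
    v4si mask = a < b;
    return (a & mask) | (b & ~mask);
}

/* Same chain, but the scalars live in lane 0 of a vector and only
   lane 0 of the result is read back.  */
static int chain_vector(int a, int b, int c)
{
    v4si va = { a, 0, 0, 0 };
    v4si vb = { b, 0, 0, 0 };
    v4si vc = { c, 0, 0, 0 };
    v4si r = min_v4si(min_v4si(va, vb), vc);
    return r[0];
}
```

Both functions return the same value for any inputs; whether the vector
form is actually faster depends on the target and on what the compiler
generates, so it has to be measured.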

So I guess if there's another target that lacks integer min/max
operations on GPRs but does have vector integer min/max, it's easy to
look at 456.hmmer and replace the one (or was it two) important
occurrence with manually crafted assembly to see if it's worth it.  Then
implement a target-local STV copy that "works".  After we have two
implementations we can see whether commonizing is possible.

Note there is/was quite some fallout because doing STV is not always
profitable and it's difficult to determine exactly when it is (not),
because we still don't quite understand _why_ 456.hmmer is so much
faster with vector min/max compared to cmov.

Richard.
