On 9/3/23 22:49, Lehua Ding wrote:
This patch adds a combined pattern for combining vfsqrt.v and vcond_mask.
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*cond_<optab><mode>):
Add sqrt + vcond_mask combine pattern.
* config/riscv/autovec.md (<optab><mode>2):
Change define_expand to define_insn_and_split.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-2.c: New test.
OK. Thanks.
FWIW, I thought we only had the reciprocal sqrt estimator, but in fact
rvv does define a real vector sqrt. So the concerns we kicked around
in the meeting this morning turned out not be warranted.
This raises one of the very interesting questions in this space,
specifically whether or not we should be using the rsqrt estimator with
correction steps. Unless the vfsqrt latency is really bad, it's going
to be hard to make a vfrsqrt7 based sequence faster -- but the vfrsqrt7
sequences will be pipelinable while vfsqrt almost certainly isn't.
Sadly we don't have a scalar FP rsqrt estimator. Though I certainly
ponder using the vector one -- there's a neat trick you can do with the
nab benchmark from spec and produce sqrt and rsqrt at the same time with
a Goldschmidt sequence. It requires a bit of hackery to make new tree
nodes, but it was definitely worth it on other targets I've worked on.
Jeff