On 6/28/24 6:53 PM, Vineet Gupta wrote:
Currently isfinite and isnormal use float compare instructions with fp
flags save/restored around them. Our perf team complained this could be
costly in uarch. RV Base ISA already has FCLASS.{d,s,h} instruction to
do FP compares w/o disturbing FP exception flags.

Coincidentally, upstream just a few days back got support for the
corresponding optabs. All that is needed is to wire these up in the
backend.

I was also hoping to get __builtin_isinf() done but unfortunately it
requires a little more rtl foo/bar to implement a tri-modal return.

Currently going thru CI testing.

gcc/ChangeLog:
        * config/riscv/riscv.md: Add UNSPEC_FCLASS, UNSPEC_ISFINITE,
        UNSPEC_ISNORMAL.
        define_insn for fclass.
        define_expand for isfinite and isnormal.

gcc/testsuite/ChangeLog:
        * gcc.target/riscv/fclass.c: New test.



+;; fclass instruction output bitmap
+;;   0 negative infinity
+;;   1 negative normal number.
+;;   2 negative subnormal number.
+;;   3 -0
+;;   4 +0
+;;   5 positive subnormal number.
+;;   6 positive normal number.
+;;   7 positive infinity
+;;   8 signaling NaN.
+;;   9 quiet NaN
+(define_insn "fclass<ANYF:mode>"
+  [(set (match_operand:SI              0 "register_operand" "=r")
+       (unspec:SI [(match_operand:ANYF 1 "register_operand" " f")]
+                  UNSPEC_FCLASS))]
+  "TARGET_HARD_FLOAT"
+  "fclass.<fmt>\t%0,%1"
+  [(set_attr "type" "fcmp")
+   (set_attr "mode" "<UNITMODE>")])
So I realize the result only has 10 bits of output, but I think it would make more sense to use X rather than SI for the result. When we use SImode on rv64 we have to deal with potential extensions. In this case we know the value is already properly extended, so we could just claim it's DImode on rv64; I think everything would "just work" and we wouldn't have to worry about unnecessary sign extensions creeping in.
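
To make the bitmap above concrete, here is a rough C sketch of how isfinite and isnormal fall out of the fclass result as single mask tests. It is only a software model: fclass_model(), the two mask macros and the *_via_fclass helpers are hypothetical names for illustration and are not taken from the patch. The masks are derived directly from the bit positions in the quoted comment. Since at most bit 9 is ever set, the value is non-negative and reads the same in SImode and DImode, which is why no extension should be needed.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Hypothetical software model of fclass.d: returns a one-hot mask using
   the bit positions from the quoted comment.  Standard C cannot portably
   tell signaling from quiet NaNs, so every NaN is reported as bit 9.  */
unsigned fclass_model(double x)
{
    int neg = signbit(x) != 0;

    switch (fpclassify(x)) {
    case FP_INFINITE:  return neg ? 1u << 0 : 1u << 7;
    case FP_NORMAL:    return neg ? 1u << 1 : 1u << 6;
    case FP_SUBNORMAL: return neg ? 1u << 2 : 1u << 5;
    case FP_ZERO:      return neg ? 1u << 3 : 1u << 4;
    default:           return 1u << 9;
    }
}

/* Assumed masks, derived from the bitmap rather than copied from the patch.  */
#define FCLASS_FINITE_MASK 0x07eu /* bits 1..6: normal, subnormal, zero */
#define FCLASS_NORMAL_MASK 0x042u /* bits 1 and 6: +/- normal */

int isfinite_via_fclass(double x)
{
    return (fclass_model(x) & FCLASS_FINITE_MASK) != 0;
}

int isnormal_via_fclass(double x)
{
    return (fclass_model(x) & FCLASS_NORMAL_MASK) != 0;
}

/* Small self-check comparing the model against the <math.h> macros.  */
void fclass_demo(void)
{
    const double samples[] = { -INFINITY, -1.0, -DBL_MIN / 2, -0.0,
                               0.0, DBL_MIN / 2, 1.0, INFINITY, NAN };

    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(isfinite_via_fclass(samples[i]) == (isfinite(samples[i]) != 0));
        assert(isnormal_via_fclass(samples[i]) == (isnormal(samples[i]) != 0));
    }
}
```

On real hardware the model function would be the fclass instruction itself, leaving just an and-immediate plus snez after it.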



+
+;; TODO: isinf is a bit tricky as it requires a trimodal return
+;;  1 if 0x80, -1 if 0x1, 0 otherwise
It shouldn't be terrible, but it's not trivial either.

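bext t0, a0, 0          # t0 = 1 if -inf (fclass result in a0)
neg t0, t0              # t0 = -1 if -inf, else 0
bext t1, a0, 7          # t1 = 1 if +inf
czero.nez t0, t0, t1    # zero t0 when t1 is nonzero
snez t1, t1             # t1 = 1 if +inf, else 0
add a0, t0, t1          # a0 = -1, 0 or 1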

Or something reasonably close to that.

Of course that depends on zicond and zbs. So we probably want the expansion to not depend on those extensions, but generate code that is easily recognized and converted into that kind of a sequence.
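
As a rough illustration of what an extension-independent form could compute, here is a minimal C sketch of the tri-modal result taken from an fclass value. The function name isinf_from_fclass() is hypothetical, and this is not a claim about what the eventual expander emits. It only uses shift, and, and subtract, which map onto base ISA instructions; whether the backend can then turn that shape into the bext/czero sequence above is an assumption I have not checked.

```c
#include <assert.h>

/* Hypothetical helper: cls is an fclass result, where bit 0 means
   negative infinity and bit 7 means positive infinity.  Returns 1 for
   +inf, -1 for -inf and 0 for everything else, matching the TODO
   comment in the quoted patch.  */
int isinf_from_fclass(unsigned cls)
{
    int pos = (int)((cls >> 7) & 1u); /* 1 only for +inf */
    int neg = (int)(cls & 1u);        /* 1 only for -inf */

    /* fclass output is one-hot, so pos and neg are never both set.  */
    assert(!(pos && neg));
    return pos - neg;
}
```

For example, isinf_from_fclass(0x80) yields 1, isinf_from_fclass(0x1) yields -1, and any other single-bit class yields 0.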

Jeff
