On Fri, 13 Oct 2023 15:36:56 GMT, Ilya Gavrilin <igavri...@openjdk.org> wrote:
> Hi all, please review this changes into risc-v floating point copysign and > signum intrinsics. > CopySign - returns first argument with the sign of second. On risc-v we have > `fsgnj.x` instruction, which can implement this intrinsic. > Signum - returns input value if it is +/- 0.0 or NaN, otherwise 1.0 with the > sign of input value returned. On risc-v we can use `fclass.x` to specify > type of input value and return appropriate value. > > Tests: > Performance tests on t-head board: > With intrinsics: > > Benchmark (seed) Mode Cnt Score Error Units > MathBench.copySignDouble 0 thrpt 8 34156.580 ± 76.272 ops/ms > MathBench.copySignFloat 0 thrpt 8 34181.731 ± 38.182 ops/ms > MathBench.signumDouble 0 thrpt 8 31977.258 ± 1122.327 ops/ms > MathBench.signumFloat 0 thrpt 8 31836.852 ± 56.013 ops/ms > > Intrinsics turned off (`-XX:+UnlockDiagnosticVMOptions > -XX:-UseCopySignIntrinsic -XX:-UseSignumIntrinsic`): > > Benchmark (seed) Mode Cnt Score Error Units > MathBench.copySignDouble 0 thrpt 8 31000.996 ± 943.094 ops/ms > MathBench.copySignFloat 0 thrpt 8 30678.016 ± 28.087 ops/ms > MathBench.signumDouble 0 thrpt 8 25435.010 ± 2047.085 ops/ms > MathBench.signumFloat 0 thrpt 8 25257.058 ± 79.175 ops/ms > > Regression tests: tier1, hotspot:tier2 on risc-v board. > > Also, changed name of one micro test: before we had: `sigNumDouble` and > `signumFloat` tests, they does not matches to `signum` or `sigNum`. Now we > have similar part: `signum`. > Performance tests has been changed a bit, to check intrinsics result better, > diff to modify tests: > > diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java > b/test/micro/org/openjdk/bench/java/lang/MathBench.java > index 6cd1353907e..0bee25366bf 100644 > --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java > +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java > @@ -143,12 +143,12 @@ public double ceilDouble() { > > @Benchmark > public double copySignDouble() { > - return Math.copySign(double81, doubleNegative12); > + return Math.copySign(double81, doubleNegative12) + > Math.copySign(double81, double2) + Math.copySign(double4Dot1, > doubleNegative12); > } > > @Benchmark > public float copySignFloat() { > - return Math.copySign(floatNegative99, float1); > + return Math.copySign(floatNegative99, float1) + > Math.copySign(eFloat, float1) + Math.copySign... Some comments after a brief look. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1659: > 1657: // on input we have NaN or +/-0.0 value we should return it, > 1658: // otherwise return +/- 1.0 using sign of input. > 1659: // tmp1 - used to store result of fclass operation, Seems to me that scratch register `t0` could be used in this function in order to save use of `tmp1`. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1661: > 1659: // tmp1 - used to store result of fclass operation, > 1660: // one - gives us a floating-point 1.0 (got from matching rule) > 1661: // bool single_precision - specififes single or double precision > operations will be used. Suggestion: s/bool single_precision - specififes/bool is_double - specifies/ src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1663: > 1661: // bool single_precision - specififes single or double precision > operations will be used. > 1662: void C2_MacroAssembler::signum_fp(FloatRegister dst, FloatRegister src, > FloatRegister one, Register tmp1, bool is_double) { > 1663: assert_different_registers(dst, src, one); This constraint could be relaxed if we use scratch register `t0` here instead of `tmp`. src/hotspot/cpu/riscv/riscv.ad line 7511: > 7509: // Copysign and signum intrinsics > 7510: > 7511: instruct copySignD_reg(fRegD dst, fRegD src1, fRegD src2, immD0 zero) %{ A more simpler `immD zero` will do here which is similar with the x86 counterpart. src/hotspot/cpu/riscv/riscv.ad line 7513: > 7511: instruct copySignD_reg(fRegD dst, fRegD src1, fRegD src2, immD0 zero) %{ > 7512: match(Set dst (CopySignD src1 (Binary src2 zero))); > 7513: effect(TEMP_DEF dst, USE src1, USE src2); Unnecessary effect. src/hotspot/cpu/riscv/riscv.ad line 7526: > 7524: instruct copySignF_reg(fRegF dst, fRegF src1, fRegF src2) %{ > 7525: match(Set dst (CopySignF src1 src2)); > 7526: effect(TEMP_DEF dst, USE src1, USE src2); Unnecessary effect. src/hotspot/cpu/riscv/riscv.ad line 7537: > 7535: %} > 7536: > 7537: instruct signumD_reg(fRegD dst, fRegD src, fRegD zero, fRegD one, > iRegINoSp tmp1) %{ Use `immD zero` instread, which will help avoid reserving one FP register here. src/hotspot/cpu/riscv/riscv.ad line 7539: > 7537: instruct signumD_reg(fRegD dst, fRegD src, fRegD zero, fRegD one, > iRegINoSp tmp1) %{ > 7538: match(Set dst (SignumD src (Binary zero one))); > 7539: effect(TEMP_DEF dst, USE src, USE one, TEMP tmp1); Unnecessary effect if changed to use `t0` instead of `tmp1` in `signum_fp`. src/hotspot/cpu/riscv/riscv.ad line 7548: > 7546: %} > 7547: > 7548: instruct signumF_reg(fRegF dst, fRegF src, fRegF zero, fRegF one, > iRegINoSp tmp1) %{ Use `immF zero` instread, which will help avoid reserving one FP register here. src/hotspot/cpu/riscv/riscv.ad line 7550: > 7548: instruct signumF_reg(fRegF dst, fRegF src, fRegF zero, fRegF one, > iRegINoSp tmp1) %{ > 7549: match(Set dst (SignumF src (Binary zero one))); > 7550: effect(TEMP_DEF dst, USE src, USE one, TEMP tmp1); Unnecessary effect if changed to use `t0` instead of `tmp1` in `signum_fp`. ------------- PR Review: https://git.openjdk.org/jdk/pull/16186#pullrequestreview-1679272664 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360216914 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360215721 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360219620 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360221019 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360230241 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360230310 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360222911 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360231462 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360223205 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360231642