On Wed, 28 May 2025 18:39:13 GMT, Mohamed Issa <d...@openjdk.org> wrote:
>> The goal of this PR is to implement an x86_64 intrinsic for
>> java.lang.Math.cbrt() using libm. A new set of micro-benchmarks is
>> included to check the performance of specific input value ranges to help
>> prevent regressions in the future.
>>
>> The command to run all range-specific micro-benchmarks is posted below.
>>
>> `make test TEST="micro:CbrtPerf.CbrtPerfRanges"`
>>
>> The results of all tests posted below were captured with an [Intel® Xeon
>> 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html)
>> using [OpenJDK
>> v25-b21](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B21) as the
>> baseline version.
>>
>> For performance data collected with the new built-in range micro-benchmark,
>> see the table below. Each result is the mean of 8 individual runs, and the
>> input ranges used match those from the original Java implementation.
>> Overall, the intrinsic provides a major uplift of 169% when very small
>> inputs are used and a more modest uplift of 45% for all other inputs.
>>
>> | Input range(s) | Baseline throughput (ops/ms) | Intrinsic throughput (ops/ms) | Speedup |
>> | :----------------------------------: | :--------------------------: | :---------------------------: | :-----: |
>> | [-2^(-1022), 2^(-1022)] | 6568 | 17678 | 2.69x |
>> | (-INF, -2^(-1022)], [2^(-1022), INF) | 138932 | 200897 | 1.45x |
>>
>> Finally, the `jtreg:test/jdk/java/lang/Math/CubeRootTests.java` test passed
>> with the changes.
>
> Mohamed Issa has updated the pull request incrementally with four additional
> commits since the last revision:
>
>  - Remove comment mentioning invalid exception when NaN input is provided
>  - Use rcx as base and r8 as index for address calculations in certain cbrt
>    stub generator instructions
>  - Remove unnecessary unpckhpd and unpcklpd definitions in macro-assembler
>    header file
>  - Remove unnecessary movapd definitions in macro-assembler header file

Patch looks good to me; some comments are included.

src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp line 185:

> 183:
> 184: #define __ _masm->
> 185:

The original Intel libm inline sequence uses hexadecimal constants. I would have preferred to use them as-is to maintain a 1:1 mapping between the two instruction sequences.

test/micro/org/openjdk/bench/java/lang/CbrtPerf.java line 56:

> 54:     public static class CbrtPerfRanges {
> 55:         public static int cbrtInputCount = 2048;
> 56:

Please create a separate CbrtPerfSpecialValues benchmark for the +/- 0.0, +/- Infinity and NaN values. I understand that handling special cases in the intrinsic may impact general-case performance, but it's OK to have at least a micro-benchmark for it.
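
As a rough illustration of the inputs such a benchmark would exercise, here is a minimal plain-Java sketch (deliberately not JMH, so it runs standalone). The class name `CbrtSpecialValuesSketch` and its members are hypothetical and not part of the PR; the expected results follow the documented `Math.cbrt` contract, where NaN, signed zeros and signed infinities are returned unchanged.

```java
// Hypothetical sketch, not code from the PR: lists the special inputs and
// applies Math.cbrt to each, as a CbrtPerfSpecialValues benchmark body might.
class CbrtSpecialValuesSketch {
    // Special inputs named in the review comment: +/-0.0, +/-Infinity, NaN.
    static final double[] SPECIAL_INPUTS = {
        0.0, -0.0, Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, Double.NaN
    };

    // Computes the cube root of every special input.
    static double[] cbrtAll() {
        double[] out = new double[SPECIAL_INPUTS.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.cbrt(SPECIAL_INPUTS[i]);
        }
        return out;
    }

    // Bitwise comparison that distinguishes +0.0 from -0.0 and treats
    // NaN as equal to NaN (doubleToLongBits canonicalizes NaN).
    static boolean sameBits(double a, double b) {
        return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
    }

    public static void main(String[] args) {
        double[] results = cbrtAll();
        for (int i = 0; i < results.length; i++) {
            System.out.println("cbrt(" + SPECIAL_INPUTS[i] + ") = " + results[i]
                    + " (unchanged: " + sameBits(results[i], SPECIAL_INPUTS[i]) + ")");
        }
    }
}
```

In an actual micro-benchmark, these values would presumably be fed through a JMH `@Benchmark` method in the same way the existing range benchmarks do; that wiring is left out here.
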

test/micro/org/openjdk/bench/java/lang/CbrtPerf.java line 114:

> 112:         public static final double constDouble512 = 512.0;
> 113:
> 114:         @Benchmark

Baseline:

Benchmark                                     (cbrtRangeIndex)   Mode  Cnt        Score   Error   Units
CbrtPerf.CbrtPerfConstant.cbrtConstDouble0                 N/A  thrpt    2  2673018.356          ops/ms
CbrtPerf.CbrtPerfConstant.cbrtConstDouble1                 N/A  thrpt    2  2684233.593          ops/ms
CbrtPerf.CbrtPerfConstant.cbrtConstDouble27                N/A  thrpt    2  2684250.835          ops/ms
CbrtPerf.CbrtPerfConstant.cbrtConstDouble512               N/A  thrpt    2  2683616.321          ops/ms

With opt:

Benchmark                                     (cbrtRangeIndex)   Mode  Cnt        Score   Error   Units
CbrtPerf.CbrtPerfConstant.cbrtConstDouble0                 N/A  thrpt    2   284575.292          ops/ms
CbrtPerf.CbrtPerfConstant.cbrtConstDouble1                 N/A  thrpt    2   162876.035          ops/ms
CbrtPerf.CbrtPerfConstant.cbrtConstDouble27                N/A  thrpt    2   163227.835          ops/ms
CbrtPerf.CbrtPerfConstant.cbrtConstDouble512               N/A  thrpt    2   162998.844          ops/ms

Disabling the intrinsic for compile-time constant inputs gives an approximately 10x performance improvement (between 9.4x and 16.5x across the four runs above). I have created a follow-up JBS issue to track it:
https://bugs.openjdk.org/browse/JDK-8358039

-------------

PR Review: https://git.openjdk.org/jdk/pull/24470#pullrequestreview-2877492755
PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2113462482
PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2113484695
PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2113472992