On Thu, 17 Oct 2024 10:10:56 GMT, Galder Zamarreño <gal...@openjdk.org> wrote:
>> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in >> order to help improve vectorization performance. >> >> Currently vectorization does not kick in for loops containing either of >> these calls because of the following error: >> >> >> VLoop::check_preconditions: failed: control flow in loop not allowed >> >> >> The control flow is due to the java implementation for these methods, e.g. >> >> >> public static long max(long a, long b) { >> return (a >= b) ? a : b; >> } >> >> >> This patch intrinsifies the calls to replace the CmpL + Bool nodes for >> MaxL/MinL nodes respectively. >> By doing this, vectorization no longer finds the control flow and so it can >> carry out the vectorization. >> E.g. >> >> >> SuperWord::transform_loop: >> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt >> strip_mined >> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] >> inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] >> !jvms: Test::test @ bci:14 (line 21) >> >> >> Applying the same changes to `ReductionPerf` as in >> https://github.com/openjdk/jdk/pull/13056, we can compare the results before >> and after. Before the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1155 >> long max 1173 >> >> >> After the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1042 >> long max 1042 >> >> >> This patch does not add an platform-specific backend implementations for the >> MaxL/MinL nodes. >> Therefore, it still relies on the macro expansion to transform those into >> CMoveL. >> >> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these >> results: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PA... > > Galder Zamarreño has updated the pull request with a new target base due to a > merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains 30 additional > commits since the last revision: > > - Use same default size as in other vector reduction benchmarks > - Renamed benchmark class > - Double/Float tests only when avx enabled > - Make state class non-final > - Restore previous benchmark iterations and default param size > - Add clipping range benchmark that uses min/max > - Encapsulate benchmark state within an inner class > - Avoid creating result array in benchmark method > - Merge branch 'master' into topic.intrinsify-max-min-long > - Revert "Implement cmovL as a jump+mov branch" > > This reverts commit 1522e26bf66c47b780ebd0d0d0c4f78a4c564e44. > - ... and 20 more: https://git.openjdk.org/jdk/compare/20c7c9a0...0a8718e1 @galderz Thanks for taking this task on! Had a quick look at it. So auto-vectorization in SuperWord should now be working, right? If yes: It would be nice if you tested both for `IRNode.MIN_VL` and `IRNode.MIN_REDUCTION_V`, the same for max. You may want to look at these existing tests, to see what other tests there are for the `int` version: `test/hotspot/jtreg/compiler/loopopts/superword/MinMaxRed_Int.java` `test/hotspot/jtreg/compiler/c2/irTests/TestIfMinMax.java` `test/hotspot/jtreg/compiler/vectorization/TestAutoVecIntMinMax.java` `test/hotspot/jtreg/compiler/c2/TestMinMaxSubword.java` There may be some duplicates already here... not sure. And maybe you need to check what to do about probabilities as well. src/hotspot/share/opto/library_call.cpp line 1952: > 1950: Node *a = nullptr; > 1951: Node *b = nullptr; > 1952: Node *n = nullptr; If you are touching this, then you might as well fix the style. Suggestion: Node* a = nullptr; Node* b = nullptr; Node* n = nullptr; test/hotspot/jtreg/compiler/intrinsics/math/TestMinMaxInlining.java line 80: > 78: @IR(phase = { CompilePhase.BEFORE_MACRO_EXPANSION }, counts = { > IRNode.MIN_L, "1" }) > 79: @IR(phase = { CompilePhase.AFTER_MACRO_EXPANSION }, counts = { > IRNode.MIN_L, "0" }) > 80: private static long testLongMin(long a, long b) { Can you add a comment why it disappears after macro expansion? test/hotspot/jtreg/compiler/intrinsics/math/TestMinMaxInlining.java line 108: > 106: @Test > 107: @Arguments(values = { Argument.NUMBER_MINUS_42, Argument.NUMBER_42 }) > 108: @IR(counts = { IRNode.MIN_F, "1" }, applyIfCPUFeatureOr = {"avx", > "true"}) Is this not supported by `asimd`? Same question for the other cases. ------------- PR Review: https://git.openjdk.org/jdk/pull/20098#pullrequestreview-2391518909 PR Review Comment: https://git.openjdk.org/jdk/pull/20098#discussion_r1814398129 PR Review Comment: https://git.openjdk.org/jdk/pull/20098#discussion_r1863381007 PR Review Comment: https://git.openjdk.org/jdk/pull/20098#discussion_r1863381913