On Fri, 27 Sep 2024 14:21:57 GMT, Galder Zamarreño <gal...@openjdk.org> wrote:

>> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in 
>> order to help improve vectorization performance.
>> 
>> Currently vectorization does not kick in for loops containing either of 
>> these calls because of the following error:
>> 
>> 
>> VLoop::check_preconditions: failed: control flow in loop not allowed
>> 
>> 
>> The control flow is due to the java implementation for these methods, e.g.
>> 
>> 
>> public static long max(long a, long b) {
>>     return (a >= b) ? a : b;
>> }
>> 
>> 
>> This patch intrinsifies the calls to replace the CmpL + Bool nodes for 
>> MaxL/MinL nodes respectively.
>> By doing this, vectorization no longer finds the control flow and so it can 
>> carry out the vectorization.
>> E.g.
>> 
>> 
>> SuperWord::transform_loop:
>>     Loop: N518/N126  counted [int,int),+4 (1025 iters)  main has_sfpt 
>> strip_mined
>>  518  CountedLoop  === 518 246 126  [[ 513 517 518 242 521 522 422 210 ]] 
>> inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] 
>> !jvms: Test::test @ bci:14 (line 21)
>> 
>> 
>> Applying the same changes to `ReductionPerf` as in 
>> https://github.com/openjdk/jdk/pull/13056, we can compare the results before 
>> and after. Before the patch, on darwin/aarch64 (M1):
>> 
>> 
>> ==============================
>> Test summary
>> ==============================
>>    TEST                                              TOTAL  PASS  FAIL ERROR
>>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>>                                                          1     1     0     0
>> ==============================
>> TEST SUCCESS
>> 
>> long min   1155
>> long max   1173
>> 
>> 
>> After the patch, on darwin/aarch64 (M1):
>> 
>> 
>> ==============================
>> Test summary
>> ==============================
>>    TEST                                              TOTAL  PASS  FAIL ERROR
>>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>>                                                          1     1     0     0
>> ==============================
>> TEST SUCCESS
>> 
>> long min   1042
>> long max   1042
>> 
>> 
>> This patch does not add an platform-specific backend implementations for the 
>> MaxL/MinL nodes.
>> Therefore, it still relies on the macro expansion to transform those into 
>> CMoveL.
>> 
>> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these 
>> results:
>> 
>> 
>> ==============================
>> Test summary
>> ==============================
>>    TEST                                              TOTAL  PA...
>
> Galder Zamarreño has updated the pull request incrementally with three 
> additional commits since the last revision:
> 
>  - Revert "Implement cmovL as a jump+mov branch"
>    
>    This reverts commit 1522e26bf66c47b780ebd0d0d0c4f78a4c564e44.
>  - Revert "Switch movl to movq"
>    
>    This reverts commit a64fcdab7d6c63125c8dfd427ae8a56ff5fa2bb7.
>  - Revert "Fix format of assembly for the movl to movq switch"
>    
>    This reverts commit 13ed87295cff50ff6ef30f909f6dcb35d15af047.

You've probably seen this but the new test is failing IR verification:


Failed IR Rules (4) of Methods (4)
----------------------------------
1) Method "private static double 
compiler.intrinsics.math.TestMinMaxInlining.testDoubleMax(double,double)" - 
[Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, 
applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MAX_D#_", "1"}, 
failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, 
applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, 
applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(MaxD.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 = 1 [given]
           - No nodes matched!

2) Method "private static double 
compiler.intrinsics.math.TestMinMaxInlining.testDoubleMin(double,double)" - 
[Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, 
applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MIN_D#_", "1"}, 
failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, 
applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, 
applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(MinD.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 = 1 [given]
           - No nodes matched!

3) Method "private static float 
compiler.intrinsics.math.TestMinMaxInlining.testFloatMax(float,float)" - 
[Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, 
applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MAX_F#_", "1"}, 
failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, 
applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, 
applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(MaxF.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 = 1 [given]
           - No nodes matched!

4) Method "private static float 
compiler.intrinsics.math.TestMinMaxInlining.testFloatMin(float,float)" - 
[Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, 
applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MIN_F#_", "1"}, 
failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, 
applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, 
applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(MinF.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 = 1 [given]
           - No nodes matched!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2382746375

Reply via email to