> This patch optimizes the following patterns:
> For integer types:
> 
> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>     => (VectorMaskCmp src1 src2 ncond)
> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>     => (VectorMaskCmp src1 src2 ncond)
> 
> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the 
> negative comparison of cond.
> 
> For float and double types:
> 
> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> 
> cond can be eq or ne.
> 
> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option 
> `-XX:UseSVE=2`:
> 
> Benchmark                     Unit    Before          Score Error     After   
>         Score Error     Uplift
> testCompareEQMaskNotByte      ops/s   7912127.225     2677.289518     
> 10266136.26     8955.008548     1.29
> testCompareEQMaskNotDouble    ops/s   884737.6799     446.963779      
> 1179760.772     448.031844      1.33
> testCompareEQMaskNotFloat     ops/s   1765045.787     682.332214      
> 2359520.803     896.305743      1.33
> testCompareEQMaskNotInt               ops/s   1787221.411     977.743935      
> 2353952.519     960.069976      1.31
> testCompareEQMaskNotLong      ops/s   895297.1974     673.44808       
> 1178449.02      323.804205      1.31
> testCompareEQMaskNotShort     ops/s   3339987.002     3415.2226       
> 4712761.965     2110.862053     1.41
> testCompareGEMaskNotByte      ops/s   7907615.16      4094.243652     
> 10251646.9      9486.699831     1.29
> testCompareGEMaskNotInt               ops/s   1683738.958     4233.813092     
> 2352855.205     1251.952546     1.39
> testCompareGEMaskNotLong      ops/s   854496.1561     8594.598885     
> 1177811.493     521.1229        1.37
> testCompareGEMaskNotShort     ops/s   3341860.309     1578.975338     
> 4714008.434     1681.10365      1.41
> testCompareGTMaskNotByte      ops/s   7910823.674     2993.367032     
> 10245063.58     9774.75138      1.29
> testCompareGTMaskNotInt               ops/s   1673393.928     3153.099431     
> 2353654.521     1190.848583     1.4
> testCompareGTMaskNotLong      ops/s   849405.9159     2432.858159     
> 1177952.041     359.96413       1.38
> testCompareGTMaskNotShort     ops/s   3339509.141     3339.976585     
> 4711442.496     2673.364893     1.41
> testCompareLEMaskNotByte      ops/s   7911340.004     3114.69191      
> 10231626.5      27134.20035     1.29
> testCompareLEMaskNotInt               ops/s   1675812.113     1340.969885     
> 2353255.341     1452.4522       1.4
> testCompareLEMaskNotLong      ops/s   848862.8036     6564.841731     
> 1177763.623     539.290106      1.38
> testCompareLEMaskNotShort     ops/s   3324951.54      2380.29473      
> 4712116.251     1544.559684     1.41
> testCompareLTMaskNotByte      ops/s   7910390.844     2630.861436     
> 10239567.69     6487.441672     1.29
> testCompareLTMaskNotInt               ops/s   1672180.09      995.238142      
> 2353757.863     853.774734      1.4
> testCompareLTMaskNotLong      ops/s   856502.26...

erifan has updated the pull request with a new target base due to a merge or a 
rebase. The incremental webrev excludes the unrelated changes brought in by the 
merge/rebase. The pull request contains two additional commits since the last 
revision:

 - Merge branch 'master' into JDK-8354242
 - 8354242: VectorAPI: combine vector not operation with compare
   
   This patch optimizes the following patterns:
   For integer types:
   ```
   (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
       => (VectorMaskCmp src1 src2 ncond)
   (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
       => (VectorMaskCmp src1 src2 ncond)
   ```
   cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the
   negative comparison of cond.
   
   For float and double types:
   ```
   (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
       => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
   (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
       => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
   ```
   cond can be eq or ne.
   
   Benchmarks on Nvidia Grace machine with 128-bit SVE2:
   With option `-XX:UseSVE=2`:
   ```
   Benchmark                    Unit    Before          Score Error     After   
        Score Error     Uplift
   testCompareEQMaskNotByte     ops/s   7912127.225     2677.289518     
10266136.26     8955.008548     1.29
   testCompareEQMaskNotDouble   ops/s   884737.6799     446.963779      
1179760.772     448.031844      1.33
   testCompareEQMaskNotFloat    ops/s   1765045.787     682.332214      
2359520.803     896.305743      1.33
   testCompareEQMaskNotInt              ops/s   1787221.411     977.743935      
2353952.519     960.069976      1.31
   testCompareEQMaskNotLong     ops/s   895297.1974     673.44808       
1178449.02      323.804205      1.31
   testCompareEQMaskNotShort    ops/s   3339987.002     3415.2226       
4712761.965     2110.862053     1.41
   testCompareGEMaskNotByte     ops/s   7907615.16      4094.243652     
10251646.9      9486.699831     1.29
   testCompareGEMaskNotInt              ops/s   1683738.958     4233.813092     
2352855.205     1251.952546     1.39
   testCompareGEMaskNotLong     ops/s   854496.1561     8594.598885     
1177811.493     521.1229        1.37
   testCompareGEMaskNotShort    ops/s   3341860.309     1578.975338     
4714008.434     1681.10365      1.41
   testCompareGTMaskNotByte     ops/s   7910823.674     2993.367032     
10245063.58     9774.75138      1.29
   testCompareGTMaskNotInt              ops/s   1673393.928     3153.099431     
2353654.521     1190.848583     1.4
   testCompareGTMaskNotLong     ops/s   849405.9159     2432.858159     
1177952.041     359.96413       1.38
   testCompareGTMaskNotShort    ops/s   3339509.141     3339.976585     
4711442.496     2673.364893     1.41
   testCompareLEMaskNotByte     ops/s   7911340.004     3114.69191      
10231626.5      27134.20035     1.29
   testCompareLEMaskNotInt              ops/s   1675812.113     1340.969885     
2353255.341     1452.4522       1.4
   testCompareLEMaskNotLong     ops/s   848862.8036     6564.841731     
1177763.623     539.290106      1.38
   testCompareLEMaskNotShort    ops/s   3324951.54      2380.29473      
4712116.251     1544.559684     1.41
   testCompareLTMaskNotByte     ops/s   7910390.844     2630.861436     
10239567.69     6487.441672     1.29
   testCompareLTMaskNotInt              ops/s   1672180.09      995.238142      
2353757.863     853.774734      1.4
   testCompareLTMaskNotLong     ops/s   856502.2695     12276.82851     
1177671.815     496.723302      1.37
   testCompareLTMaskNotShort    ops/s   3325798.025     2412.702501     
4711554.181     1779.302112     1.41
   testCompareNEMaskNotByte     ops/s   7910002.518     2771.82477      
10245315.33     16321.93935     1.29
   testCompareNEMaskNotDouble   ops/s   863754.6022     523.140788      
1179133.982     476.572178      1.36
   testCompareNEMaskNotFloat    ops/s   1723321.883     2598.484803     
2358492.186     877.1401        1.36
   testCompareNEMaskNotInt              ops/s   1670288.841     751.774826      
2354158.125     835.720163      1.4
   testCompareNEMaskNotLong     ops/s   836327.6835     410.525466      
1178178.825     308.757932      1.4
   testCompareNEMaskNotShort    ops/s   3327815.841     1511.978763     
4711379.136     2336.505531     1.41
   testCompareUGEMaskNotByte    ops/s   7906699.024     3200.936474     
10253843.74     15067.59401     1.29
   testCompareUGEMaskNotInt     ops/s   1674003.923     3287.191727     
2353340.666     951.381021      1.4
   testCompareUGEMaskNotLong    ops/s   852424.5562     8920.408939     
1177943.609     389.6621        1.38
   testCompareUGEMaskNotShort   ops/s   3327255.858     1584.885143     
4711622.355     1247.215277     1.41
   testCompareUGTMaskNotByte    ops/s   7909249.189     4435.283667     
10245541.34     10993.34739     1.29
   testCompareUGTMaskNotInt     ops/s   1693713.433     20650.00213     
2353153.787     1055.343846     1.38
   testCompareUGTMaskNotLong    ops/s   851022.3395     7079.065268     
1177910.677     538.604598      1.38
   testCompareUGTMaskNotShort   ops/s   3327236.988     1616.886789     
4711209.865     3098.494145     1.41
   testCompareULEMaskNotByte    ops/s   7909350.825     3251.262342     
10261449.03     7273.831341     1.29
   testCompareULEMaskNotInt     ops/s   1672350.925     1545.304304     
2353231.755     914.231193      1.4
   testCompareULEMaskNotLong    ops/s   853349.4765     9804.906913     
1177967.254     435.044367      1.38
   testCompareULEMaskNotShort   ops/s   3325757.891     1555.062257     
4712873.187     1650.986905     1.41
   testCompareULTMaskNotByte    ops/s   7912218.621     2633.477744     
10242095.98     21921.39902     1.29
   testCompareULTMaskNotInt     ops/s   1673994.849     2672.507666     
2353449.22      946.105757      1.4
   testCompareULTMaskNotLong    ops/s   849032.5868     10406.06689     
1177586.047     506.541456      1.38
   testCompareULTMaskNotShort   ops/s   3328062.026     1892.991844     
4713247.216     1855.983724     1.41
   
   ```
   With option `-XX:UseSVE=0`:
   ```
   Benchmark                    Unit    Before          Score Error     After   
        Score Error     Uplift
   testCompareEQMaskNotByte     ops/s   7895961.919     72712.90804     
7746493.731     71481.92938     0.98
   testCompareEQMaskNotDouble   ops/s   789811.0455     384.493088      
766473.7994     2216.581793     0.97
   testCompareEQMaskNotFloat    ops/s   1806305.818     638.010451      
1819616.613     3295.38958      1
   testCompareEQMaskNotInt              ops/s   1815820.144     1225.336135     
1849538.401     766.29902       1.01
   testCompareEQMaskNotLong     ops/s   807336.492      335.451807      
792732.9483     277.954432      0.98
   testCompareEQMaskNotShort    ops/s   4818266.38      1927.862665     
4668903.001     1922.782715     0.96
   testCompareGEMaskNotByte     ops/s   7818439.678     75374.97739     
16498003.98     41440.49653     2.11
   testCompareGEMaskNotInt              ops/s   1815159.05      1090.912209     
2372095.779     1664.397112     1.3
   testCompareGEMaskNotLong     ops/s   804324.5575     2301.686878     
927919.8507     371.766719      1.15
   testCompareGEMaskNotShort    ops/s   4818966.563     2443.643652     
5385561.038     29558.37423     1.11
   testCompareGTMaskNotByte     ops/s   7893406.157     82687.74264     
16470663.2      22165.55812     2.08
   testCompareGTMaskNotInt              ops/s   1815316.812     915.894106      
2370447.198     655.016338      1.3
   testCompareGTMaskNotLong     ops/s   807019.456      526.525482      
928079.0541     330.582693      1.15
   testCompareGTMaskNotShort    ops/s   4820552.881     1684.247747     
5355902.93      5893.2915       1.11
   testCompareLEMaskNotByte     ops/s   7816263.323     79560.0015      
16473621.19     56688.99585     2.1
   testCompareLEMaskNotInt              ops/s   1814915.724     926.998625      
2368790.306     932.594778      1.3
   testCompareLEMaskNotLong     ops/s   806483.9        935.718082      
928110.9074     407.096695      1.15
   testCompareLEMaskNotShort    ops/s   4813660.241     6817.870509     
5357107.852     10061.47975     1.11
   testCompareLTMaskNotByte     ops/s   7838948.962     69136.4504      
16424405.96     24464.75469     2.09
   testCompareLTMaskNotInt              ops/s   1815056.833     1187.6453       
2369892.187     1103.819634     1.3
   testCompareLTMaskNotLong     ops/s   806602.1804     287.923365      
928346.4118     617.682824      1.15
   testCompareLTMaskNotShort    ops/s   4817940.643     2767.1509       
5372537.84      15397.47169     1.11
   testCompareNEMaskNotByte     ops/s   9078493.798     4630.339307     
16484348.42     18925.88346     1.81
   testCompareNEMaskNotDouble   ops/s   661769.6272     398.712981      
926763.5839     1808.843788     1.4
   testCompareNEMaskNotFloat    ops/s   1570527.252     563.642144      
2312425.678     1815.844846     1.47
   testCompareNEMaskNotInt              ops/s   1619146.58      626.793854      
2369711.543     942.330478      1.46
   testCompareNEMaskNotLong     ops/s   680201.5381     2252.836482     
927808.6147     414.917863      1.36
   testCompareNEMaskNotShort    ops/s   3763508.054     3622.560798     
5367808.015     8591.466599     1.42
   testCompareUGEMaskNotByte    ops/s   7886373.129     75917.74675     
16480928.93     27524.31005     2.08
   testCompareUGEMaskNotInt     ops/s   1815636.832     750.036241      
2369683.015     901.609404      1.3
   testCompareUGEMaskNotLong    ops/s   806862.5826     287.819616      
928001.4394     361.063837      1.15
   testCompareUGEMaskNotShort   ops/s   4820581.361     2098.537435     
5375854.248     25619.40165     1.11
   testCompareUGTMaskNotByte    ops/s   7891591.465     96614.93542     
16410405.93     15012.37096     2.07
   testCompareUGTMaskNotInt     ops/s   1814871.179     662.825588      
2371325.903     1170.491164     1.3
   testCompareUGTMaskNotLong    ops/s   804013.7658     2240.534209     
928062.2169     531.306897      1.15
   testCompareUGTMaskNotShort   ops/s   4818150.337     3051.717685     
5381449.337     21212.34187     1.11
   testCompareULEMaskNotByte    ops/s   7831540.628     81306.67253     
16495250.78     38682.19675     2.1
   testCompareULEMaskNotInt     ops/s   1814484.14      687.860656      
2369265.075     940.609586      1.3
   testCompareULEMaskNotLong    ops/s   807780.5749     769.876816      
927538.0732     1278.267724     1.14
   testCompareULEMaskNotShort   ops/s   4817437.42      5141.336541     
5356183.359     7015.608124     1.11
   testCompareULTMaskNotByte    ops/s   7849078.225     56753.59764     
16395975.27     34043.67295     2.08
   testCompareULTMaskNotInt     ops/s   1814328.226     2697.219111     
2370700.47      1991.841988     1.3
   testCompareULTMaskNotLong    ops/s   807166.8197     253.061506      
927926.2803     252.933462      1.14
   testCompareULTMaskNotShort   ops/s   4821098.216     1625.959044     
5348980.243     4100.768121     1.1
   ```
   
   Benchmarks on AMD EPYC 9124 16-Core Processor:
   With option `-XX:UseAVX=3`:
   ```
   Benchmark                    Unit    Before          Score Error     After   
        Score Error     Uplift
   testCompareEQMaskNotByte     ops/s   16607323.35     1233692.631     
18381557.66     1163201.522     1.1
   testCompareEQMaskNotDouble   ops/s   2114285.245     58782.2534      
2959946.353     43016.0445      1.39
   testCompareEQMaskNotFloat    ops/s   4480874.437     89975.29074     
6960151.436     64799.143       1.55
   testCompareEQMaskNotInt              ops/s   4370906.91      51784.80889     
6856955.043     313858.5504     1.56
   testCompareEQMaskNotLong     ops/s   2080065.895     26762.06732     
2939142.143     67179.05314     1.41
   testCompareEQMaskNotShort    ops/s   7968282.563     210437.2781     
12701214.56     473152.6407     1.59
   testCompareGEMaskNotByte     ops/s   18419141.89     473408.9451     
19880059.68     321638.0397     1.07
   testCompareGEMaskNotInt              ops/s   4419015.62      77352.98633     
7037639.227     151066.0383     1.59
   testCompareGEMaskNotLong     ops/s   2147982.48      49227.42782     
3000275.928     39298.75344     1.39
   testCompareGEMaskNotShort    ops/s   8469039.613     17833.19707     
12288229.49     244317.8812     1.45
   testCompareGTMaskNotByte     ops/s   18728997.5      468328.8358     
20544730.05     392264.6466     1.09
   testCompareGTMaskNotInt              ops/s   4510009.705     78812.57357     
7364629.942     70970.78473     1.63
   testCompareGTMaskNotLong     ops/s   2124104.969     40917.89257     
2953536.279     35199.19687     1.39
   testCompareGTMaskNotShort    ops/s   8690557.621     311534.1159     
12344017.51     457931.8741     1.42
   testCompareLEMaskNotByte     ops/s   17758400.53     478383.4945     
19209183.26     1143297.241     1.08
   testCompareLEMaskNotInt              ops/s   4363664.862     43443.18063     
7054093.064     78141.11476     1.61
   testCompareLEMaskNotLong     ops/s   2068632.213     29844.78023     
2954766.412     50667.22502     1.42
   testCompareLEMaskNotShort    ops/s   8637608.548     183538.5511     
12719010.27     473568.8825     1.47
   testCompareLTMaskNotByte     ops/s   14406138.95     423105.0163     
17292417.96     371386.9689     1.2
   testCompareLTMaskNotInt              ops/s   4546707.266     131977.3144     
7040483.394     213590.4657     1.54
   testCompareLTMaskNotLong     ops/s   2123277.356     47243.21499     
2848720.442     58896.97045     1.34
   testCompareLTMaskNotShort    ops/s   7570169.363     649873.6295     
11945383.75     988276.5955     1.57
   testCompareNEMaskNotByte     ops/s   18274529.55     683396.7384     
19081938.8      1118739.778     1.04
   testCompareNEMaskNotDouble   ops/s   2112533.61      43295.50012     
2912115.441     78189.51083     1.37
   testCompareNEMaskNotFloat    ops/s   4628683.814     93817.07362     
6967208.729     145135.8544     1.5
   testCompareNEMaskNotInt              ops/s   4470900.214     75974.50842     
7286913.662     116328.5277     1.62
   testCompareNEMaskNotLong     ops/s   2134091.061     46377.94061     
2934667.477     81675.46021     1.37
   testCompareNEMaskNotShort    ops/s   8790384.287     396161.8599     
13076858.35     286272.1155     1.48
   testCompareUGEMaskNotByte    ops/s   18009150.9      660803.8886     
17551258.33     1667014.843     0.97
   testCompareUGEMaskNotInt     ops/s   4442928.74      83190.81019     
6854088.277     329008.8901     1.54
   testCompareUGEMaskNotLong    ops/s   2088357.736     71696.24791     
2973202.26      63278.78974     1.42
   testCompareUGEMaskNotShort   ops/s   8348624.02      116562.7876     
12832250.78     546869.3006     1.53
   testCompareUGTMaskNotByte    ops/s   17871101.25     800199.6321     
19902619.81     214003.3262     1.11
   testCompareUGTMaskNotInt     ops/s   4088304.421     137797.9723     
7135454.33      124553.651      1.74
   testCompareUGTMaskNotLong    ops/s   2070610.42      19881.82182     
2991536.365     36260.60767     1.44
   testCompareUGTMaskNotShort   ops/s   8637099.341     155822.1608     
12756579.77     186068.199      1.47
   testCompareULEMaskNotByte    ops/s   17940901.36     1258029.364     
18932484.94     694554.6305     1.05
   testCompareULEMaskNotInt     ops/s   4369177.511     74982.31936     
6392773.082     550171.2266     1.46
   testCompareULEMaskNotLong    ops/s   2135905.761     43693.63178     
2877579.631     41651.56289     1.34
   testCompareULEMaskNotShort   ops/s   8607710.544     132655.1676     
12446370.04     441718.3035     1.44
   testCompareULTMaskNotByte    ops/s   17409912.23     1033204.537     
20607479.99     362000.5056     1.18
   testCompareULTMaskNotInt     ops/s   4386455.9       119192.1635     
6920123.264     186158.2845     1.57
   testCompareULTMaskNotLong    ops/s   2064995.149     38622.2734      
2988343.589     39037.90006     1.44
   testCompareULTMaskNotShort   ops/s   8642182.752     230919.2442     
13029582.09     437101.4923     1.5
   ```
   
   The small amount of performance degradation is due to test fluctuations.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24674/files
  - new: https://git.openjdk.org/jdk/pull/24674/files/1f3c5899..1b9c3b36

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=00-01

  Stats: 10943 lines in 221 files changed: 9732 ins; 611 del; 600 mod
  Patch: https://git.openjdk.org/jdk/pull/24674.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24674/head:pull/24674

PR: https://git.openjdk.org/jdk/pull/24674

Reply via email to