This patch optimizes the following patterns:
For integer types:

(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
    => (VectorMaskCmp src1 src2 ncond)
(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
    => (VectorMaskCmp src1 src2 ncond)

cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the 
negative comparison of cond.

For float and double types:

(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
    => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
    => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))

cond can be eq or ne.

Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option 
`-XX:UseSVE=2`:

Benchmark                       Unit    Before          Score Error     After   
        Score Error     Uplift
testCompareEQMaskNotByte        ops/s   7912127.225     2677.289518     
10266136.26     8955.008548     1.29
testCompareEQMaskNotDouble      ops/s   884737.6799     446.963779      
1179760.772     448.031844      1.33
testCompareEQMaskNotFloat       ops/s   1765045.787     682.332214      
2359520.803     896.305743      1.33
testCompareEQMaskNotInt         ops/s   1787221.411     977.743935      
2353952.519     960.069976      1.31
testCompareEQMaskNotLong        ops/s   895297.1974     673.44808       
1178449.02      323.804205      1.31
testCompareEQMaskNotShort       ops/s   3339987.002     3415.2226       
4712761.965     2110.862053     1.41
testCompareGEMaskNotByte        ops/s   7907615.16      4094.243652     
10251646.9      9486.699831     1.29
testCompareGEMaskNotInt         ops/s   1683738.958     4233.813092     
2352855.205     1251.952546     1.39
testCompareGEMaskNotLong        ops/s   854496.1561     8594.598885     
1177811.493     521.1229        1.37
testCompareGEMaskNotShort       ops/s   3341860.309     1578.975338     
4714008.434     1681.10365      1.41
testCompareGTMaskNotByte        ops/s   7910823.674     2993.367032     
10245063.58     9774.75138      1.29
testCompareGTMaskNotInt         ops/s   1673393.928     3153.099431     
2353654.521     1190.848583     1.4
testCompareGTMaskNotLong        ops/s   849405.9159     2432.858159     
1177952.041     359.96413       1.38
testCompareGTMaskNotShort       ops/s   3339509.141     3339.976585     
4711442.496     2673.364893     1.41
testCompareLEMaskNotByte        ops/s   7911340.004     3114.69191      
10231626.5      27134.20035     1.29
testCompareLEMaskNotInt         ops/s   1675812.113     1340.969885     
2353255.341     1452.4522       1.4
testCompareLEMaskNotLong        ops/s   848862.8036     6564.841731     
1177763.623     539.290106      1.38
testCompareLEMaskNotShort       ops/s   3324951.54      2380.29473      
4712116.251     1544.559684     1.41
testCompareLTMaskNotByte        ops/s   7910390.844     2630.861436     
10239567.69     6487.441672     1.29
testCompareLTMaskNotInt         ops/s   1672180.09      995.238142      
2353757.863     853.774734      1.4
testCompareLTMaskNotLong        ops/s   856502.2695     12276.82851     
1177671.815     496.723302      1.37
testCompareLTMaskNotShort       ops/s   3325798.025     2412.702501     
4711554.181     1779.302112     1.41
testCompareNEMaskNotByte        ops/s   7910002.518     2771.82477      
10245315.33     16321.93935     1.29
testCompareNEMaskNotDouble      ops/s   863754.6022     523.140788      
1179133.982     476.572178      1.36
testCompareNEMaskNotFloat       ops/s   1723321.883     2598.484803     
2358492.186     877.1401        1.36
testCompareNEMaskNotInt         ops/s   1670288.841     751.774826      
2354158.125     835.720163      1.4
testCompareNEMaskNotLong        ops/s   836327.6835     410.525466      
1178178.825     308.757932      1.4
testCompareNEMaskNotShort       ops/s   3327815.841     1511.978763     
4711379.136     2336.505531     1.41
testCompareUGEMaskNotByte       ops/s   7906699.024     3200.936474     
10253843.74     15067.59401     1.29
testCompareUGEMaskNotInt        ops/s   1674003.923     3287.191727     
2353340.666     951.381021      1.4
testCompareUGEMaskNotLong       ops/s   852424.5562     8920.408939     
1177943.609     389.6621        1.38
testCompareUGEMaskNotShort      ops/s   3327255.858     1584.885143     
4711622.355     1247.215277     1.41
testCompareUGTMaskNotByte       ops/s   7909249.189     4435.283667     
10245541.34     10993.34739     1.29
testCompareUGTMaskNotInt        ops/s   1693713.433     20650.00213     
2353153.787     1055.343846     1.38
testCompareUGTMaskNotLong       ops/s   851022.3395     7079.065268     
1177910.677     538.604598      1.38
testCompareUGTMaskNotShort      ops/s   3327236.988     1616.886789     
4711209.865     3098.494145     1.41
testCompareULEMaskNotByte       ops/s   7909350.825     3251.262342     
10261449.03     7273.831341     1.29
testCompareULEMaskNotInt        ops/s   1672350.925     1545.304304     
2353231.755     914.231193      1.4
testCompareULEMaskNotLong       ops/s   853349.4765     9804.906913     
1177967.254     435.044367      1.38
testCompareULEMaskNotShort      ops/s   3325757.891     1555.062257     
4712873.187     1650.986905     1.41
testCompareULTMaskNotByte       ops/s   7912218.621     2633.477744     
10242095.98     21921.39902     1.29
testCompareULTMaskNotInt        ops/s   1673994.849     2672.507666     
2353449.22      946.105757      1.4
testCompareULTMaskNotLong       ops/s   849032.5868     10406.06689     
1177586.047     506.541456      1.38
testCompareULTMaskNotShort      ops/s   3328062.026     1892.991844     
4713247.216     1855.983724     1.41


With option `-XX:UseSVE=0`:

Benchmark                       Unit    Before          Score Error     After   
        Score Error     Uplift
testCompareEQMaskNotByte        ops/s   7895961.919     72712.90804     
7746493.731     71481.92938     0.98
testCompareEQMaskNotDouble      ops/s   789811.0455     384.493088      
766473.7994     2216.581793     0.97
testCompareEQMaskNotFloat       ops/s   1806305.818     638.010451      
1819616.613     3295.38958      1
testCompareEQMaskNotInt         ops/s   1815820.144     1225.336135     
1849538.401     766.29902       1.01
testCompareEQMaskNotLong        ops/s   807336.492      335.451807      
792732.9483     277.954432      0.98
testCompareEQMaskNotShort       ops/s   4818266.38      1927.862665     
4668903.001     1922.782715     0.96
testCompareGEMaskNotByte        ops/s   7818439.678     75374.97739     
16498003.98     41440.49653     2.11
testCompareGEMaskNotInt         ops/s   1815159.05      1090.912209     
2372095.779     1664.397112     1.3
testCompareGEMaskNotLong        ops/s   804324.5575     2301.686878     
927919.8507     371.766719      1.15
testCompareGEMaskNotShort       ops/s   4818966.563     2443.643652     
5385561.038     29558.37423     1.11
testCompareGTMaskNotByte        ops/s   7893406.157     82687.74264     
16470663.2      22165.55812     2.08
testCompareGTMaskNotInt         ops/s   1815316.812     915.894106      
2370447.198     655.016338      1.3
testCompareGTMaskNotLong        ops/s   807019.456      526.525482      
928079.0541     330.582693      1.15
testCompareGTMaskNotShort       ops/s   4820552.881     1684.247747     
5355902.93      5893.2915       1.11
testCompareLEMaskNotByte        ops/s   7816263.323     79560.0015      
16473621.19     56688.99585     2.1
testCompareLEMaskNotInt         ops/s   1814915.724     926.998625      
2368790.306     932.594778      1.3
testCompareLEMaskNotLong        ops/s   806483.9        935.718082      
928110.9074     407.096695      1.15
testCompareLEMaskNotShort       ops/s   4813660.241     6817.870509     
5357107.852     10061.47975     1.11
testCompareLTMaskNotByte        ops/s   7838948.962     69136.4504      
16424405.96     24464.75469     2.09
testCompareLTMaskNotInt         ops/s   1815056.833     1187.6453       
2369892.187     1103.819634     1.3
testCompareLTMaskNotLong        ops/s   806602.1804     287.923365      
928346.4118     617.682824      1.15
testCompareLTMaskNotShort       ops/s   4817940.643     2767.1509       
5372537.84      15397.47169     1.11
testCompareNEMaskNotByte        ops/s   9078493.798     4630.339307     
16484348.42     18925.88346     1.81
testCompareNEMaskNotDouble      ops/s   661769.6272     398.712981      
926763.5839     1808.843788     1.4
testCompareNEMaskNotFloat       ops/s   1570527.252     563.642144      
2312425.678     1815.844846     1.47
testCompareNEMaskNotInt         ops/s   1619146.58      626.793854      
2369711.543     942.330478      1.46
testCompareNEMaskNotLong        ops/s   680201.5381     2252.836482     
927808.6147     414.917863      1.36
testCompareNEMaskNotShort       ops/s   3763508.054     3622.560798     
5367808.015     8591.466599     1.42
testCompareUGEMaskNotByte       ops/s   7886373.129     75917.74675     
16480928.93     27524.31005     2.08
testCompareUGEMaskNotInt        ops/s   1815636.832     750.036241      
2369683.015     901.609404      1.3
testCompareUGEMaskNotLong       ops/s   806862.5826     287.819616      
928001.4394     361.063837      1.15
testCompareUGEMaskNotShort      ops/s   4820581.361     2098.537435     
5375854.248     25619.40165     1.11
testCompareUGTMaskNotByte       ops/s   7891591.465     96614.93542     
16410405.93     15012.37096     2.07
testCompareUGTMaskNotInt        ops/s   1814871.179     662.825588      
2371325.903     1170.491164     1.3
testCompareUGTMaskNotLong       ops/s   804013.7658     2240.534209     
928062.2169     531.306897      1.15
testCompareUGTMaskNotShort      ops/s   4818150.337     3051.717685     
5381449.337     21212.34187     1.11
testCompareULEMaskNotByte       ops/s   7831540.628     81306.67253     
16495250.78     38682.19675     2.1
testCompareULEMaskNotInt        ops/s   1814484.14      687.860656      
2369265.075     940.609586      1.3
testCompareULEMaskNotLong       ops/s   807780.5749     769.876816      
927538.0732     1278.267724     1.14
testCompareULEMaskNotShort      ops/s   4817437.42      5141.336541     
5356183.359     7015.608124     1.11
testCompareULTMaskNotByte       ops/s   7849078.225     56753.59764     
16395975.27     34043.67295     2.08
testCompareULTMaskNotInt        ops/s   1814328.226     2697.219111     
2370700.47      1991.841988     1.3
testCompareULTMaskNotLong       ops/s   807166.8197     253.061506      
927926.2803     252.933462      1.14
testCompareULTMaskNotShort      ops/s   4821098.216     1625.959044     
5348980.243     4100.768121     1.1


Benchmarks on AMD EPYC 9124 16-Core Processor:
With option `-XX:UseAVX=3`:

Benchmark                       Unit    Before          Score Error     After   
        Score Error     Uplift
testCompareEQMaskNotByte        ops/s   16607323.35     1233692.631     
18381557.66     1163201.522     1.1
testCompareEQMaskNotDouble      ops/s   2114285.245     58782.2534      
2959946.353     43016.0445      1.39
testCompareEQMaskNotFloat       ops/s   4480874.437     89975.29074     
6960151.436     64799.143       1.55
testCompareEQMaskNotInt         ops/s   4370906.91      51784.80889     
6856955.043     313858.5504     1.56
testCompareEQMaskNotLong        ops/s   2080065.895     26762.06732     
2939142.143     67179.05314     1.41
testCompareEQMaskNotShort       ops/s   7968282.563     210437.2781     
12701214.56     473152.6407     1.59
testCompareGEMaskNotByte        ops/s   18419141.89     473408.9451     
19880059.68     321638.0397     1.07
testCompareGEMaskNotInt         ops/s   4419015.62      77352.98633     
7037639.227     151066.0383     1.59
testCompareGEMaskNotLong        ops/s   2147982.48      49227.42782     
3000275.928     39298.75344     1.39
testCompareGEMaskNotShort       ops/s   8469039.613     17833.19707     
12288229.49     244317.8812     1.45
testCompareGTMaskNotByte        ops/s   18728997.5      468328.8358     
20544730.05     392264.6466     1.09
testCompareGTMaskNotInt         ops/s   4510009.705     78812.57357     
7364629.942     70970.78473     1.63
testCompareGTMaskNotLong        ops/s   2124104.969     40917.89257     
2953536.279     35199.19687     1.39
testCompareGTMaskNotShort       ops/s   8690557.621     311534.1159     
12344017.51     457931.8741     1.42
testCompareLEMaskNotByte        ops/s   17758400.53     478383.4945     
19209183.26     1143297.241     1.08
testCompareLEMaskNotInt         ops/s   4363664.862     43443.18063     
7054093.064     78141.11476     1.61
testCompareLEMaskNotLong        ops/s   2068632.213     29844.78023     
2954766.412     50667.22502     1.42
testCompareLEMaskNotShort       ops/s   8637608.548     183538.5511     
12719010.27     473568.8825     1.47
testCompareLTMaskNotByte        ops/s   14406138.95     423105.0163     
17292417.96     371386.9689     1.2
testCompareLTMaskNotInt         ops/s   4546707.266     131977.3144     
7040483.394     213590.4657     1.54
testCompareLTMaskNotLong        ops/s   2123277.356     47243.21499     
2848720.442     58896.97045     1.34
testCompareLTMaskNotShort       ops/s   7570169.363     649873.6295     
11945383.75     988276.5955     1.57
testCompareNEMaskNotByte        ops/s   18274529.55     683396.7384     
19081938.8      1118739.778     1.04
testCompareNEMaskNotDouble      ops/s   2112533.61      43295.50012     
2912115.441     78189.51083     1.37
testCompareNEMaskNotFloat       ops/s   4628683.814     93817.07362     
6967208.729     145135.8544     1.5
testCompareNEMaskNotInt         ops/s   4470900.214     75974.50842     
7286913.662     116328.5277     1.62
testCompareNEMaskNotLong        ops/s   2134091.061     46377.94061     
2934667.477     81675.46021     1.37
testCompareNEMaskNotShort       ops/s   8790384.287     396161.8599     
13076858.35     286272.1155     1.48
testCompareUGEMaskNotByte       ops/s   18009150.9      660803.8886     
17551258.33     1667014.843     0.97
testCompareUGEMaskNotInt        ops/s   4442928.74      83190.81019     
6854088.277     329008.8901     1.54
testCompareUGEMaskNotLong       ops/s   2088357.736     71696.24791     
2973202.26      63278.78974     1.42
testCompareUGEMaskNotShort      ops/s   8348624.02      116562.7876     
12832250.78     546869.3006     1.53
testCompareUGTMaskNotByte       ops/s   17871101.25     800199.6321     
19902619.81     214003.3262     1.11
testCompareUGTMaskNotInt        ops/s   4088304.421     137797.9723     
7135454.33      124553.651      1.74
testCompareUGTMaskNotLong       ops/s   2070610.42      19881.82182     
2991536.365     36260.60767     1.44
testCompareUGTMaskNotShort      ops/s   8637099.341     155822.1608     
12756579.77     186068.199      1.47
testCompareULEMaskNotByte       ops/s   17940901.36     1258029.364     
18932484.94     694554.6305     1.05
testCompareULEMaskNotInt        ops/s   4369177.511     74982.31936     
6392773.082     550171.2266     1.46
testCompareULEMaskNotLong       ops/s   2135905.761     43693.63178     
2877579.631     41651.56289     1.34
testCompareULEMaskNotShort      ops/s   8607710.544     132655.1676     
12446370.04     441718.3035     1.44
testCompareULTMaskNotByte       ops/s   17409912.23     1033204.537     
20607479.99     362000.5056     1.18
testCompareULTMaskNotInt        ops/s   4386455.9       119192.1635     
6920123.264     186158.2845     1.57
testCompareULTMaskNotLong       ops/s   2064995.149     38622.2734      
2988343.589     39037.90006     1.44
testCompareULTMaskNotShort      ops/s   8642182.752     230919.2442     
13029582.09     437101.4923     1.5


The small amount of performance degradation is due to test fluctuations.

-------------

Commit messages:
 - 8354242: VectorAPI: combine vector not operation with compare

Changes: https://git.openjdk.org/jdk/pull/24674/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8354242
  Stats: 1051 lines in 5 files changed: 1048 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/24674.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24674/head:pull/24674

PR: https://git.openjdk.org/jdk/pull/24674

Reply via email to