On Mon, 15 Sep 2025 09:33:47 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:
>> erifan has updated the pull request incrementally with one additional commit
>> since the last revision:
>>
>>   Add an IR rule for vector mask cast operation
>
> Your benchmark and code changes look good to me. Thanks for addressing my
> comments.

Thanks @jatin-bhateja. The updated benchmark results are as follows; there is not much change.

On an Nvidia Grace machine with 128-bit SVE2:

With option `-XX:UseSVE=2`:

Benchmark                COMPARISON_OP Unit  Before Score Before Error After Score  After Error  Uplift
testCompareMaskNotDouble EQ            ops/s 908008.7644  827.699314   1175289.515  240.548861   1.294359
testCompareMaskNotDouble NE            ops/s 872199.2489  131.090115   1175667.777  129.741515   1.347934
testCompareMaskNotDouble LT            ops/s 880166.7559  1570.41653   882160.6889  4723.507639  1.002265
testCompareMaskNotDouble LE            ops/s 878115.3293  2919.637497  879033.7895  5404.617017  1.001045
testCompareMaskNotDouble GT            ops/s 877068.5325  9595.275981  865832.864   5054.26002   0.987189
testCompareMaskNotDouble GE            ops/s 895695.0228  3276.687933  871153.7117  7714.572967  0.9726
testCompareMaskNotFloat  EQ            ops/s 1811841.295  278.140948   2350971.83   606.667654   1.297559
testCompareMaskNotFloat  NE            ops/s 1727124.634  1755.717051  2351789.019  269.531198   1.361678
testCompareMaskNotFloat  LT            ops/s 1735243.319  4912.343726  1726257.01   823.746765   0.994821
testCompareMaskNotFloat  LE            ops/s 1726151.367  1071.383328  1727029.339  960.336314   1.000508
testCompareMaskNotFloat  GT            ops/s 1729704.897  1646.026351  1726069.02   440.981281   0.997897
testCompareMaskNotFloat  GE            ops/s 1726515.227  2171.61643   1728365.682  1404.298156  1.001071
testCompareMaskNotByte   EQ            ops/s 8480574.694  1254.415788  10200329.86  8560.199493  1.202787
testCompareMaskNotByte   NE            ops/s 8480141.263  1437.762594  10207424.91  3664.106923  1.203685
testCompareMaskNotByte   LT            ops/s 8471471.384  7699.585554  10203300.19  4675.047416  1.20443
testCompareMaskNotByte   LE            ops/s 8476165.519  6045.944392  10204956.23  2174.866199  1.203959
testCompareMaskNotByte   GT            ops/s 8479397.377  1290.560961  10207032.3   5414.789178  1.203745
testCompareMaskNotByte   GE            ops/s 8479979.908  1094.823175  10203115.77  2909.433184  1.2032
testCompareMaskNotByte   ULT           ops/s 8480915.515  1420.30856   10213140.54  19628.56888  1.204249
testCompareMaskNotByte   ULE           ops/s 8481768.961  1806.086454  10191601.05  9537.089409  1.201589
testCompareMaskNotByte   UGT           ops/s 8477948.807  3652.437106  10208439.79  8335.226416  1.204116
testCompareMaskNotByte   UGE           ops/s 8477320.065  2191.753237  10198589.9   5748.761942  1.203044
testCompareMaskNotInt    EQ            ops/s 1906386.393  208.045573   2346741.129  383.461819   1.230989
testCompareMaskNotInt    NE            ops/s 1674206.146  169.967081   2346609.602  652.964692   1.401625
testCompareMaskNotInt    LT            ops/s 1684755.085  4939.806653  2345939.728  738.842445   1.392451
testCompareMaskNotInt    LE            ops/s 1659985.83   2408.542766  2346929.8    192.550397   1.413825
testCompareMaskNotInt    GT            ops/s 1674460.437  447.120589   2347037.155  342.433085   1.401667
testCompareMaskNotInt    GE            ops/s 1658699.073  884.268891   2347411.827  281.885914   1.415212
testCompareMaskNotInt    ULT           ops/s 1677043.66   6215.834359  2347155.384  425.141786   1.399579
testCompareMaskNotInt    ULE           ops/s 1667049.76   9521.094204  2346815.213  316.03901    1.407765
testCompareMaskNotInt    UGT           ops/s 1661045.828  3669.548525  2346711.365  2808.608132  1.412791
testCompareMaskNotInt    UGE           ops/s 1663715.691  4570.73053   2347096.847  191.804359   1.410755
testCompareMaskNotLong   EQ            ops/s 885668.5947  203.053456   1174274.006  113.51354    1.325861
testCompareMaskNotLong   NE            ops/s 837449.9353  198.611966   1174330.269  106.514374   1.402269
testCompareMaskNotLong   LT            ops/s 846790.2128  7005.585657  1174290.879  93.56413     1.386755
testCompareMaskNotLong   LE            ops/s 851253.2346  7624.045467  1174162.355  179.854316   1.379333
testCompareMaskNotLong   GT            ops/s 837715.7563  4272.558281  1173797.819  289.311518   1.401188
testCompareMaskNotLong   GE            ops/s 883137.593   14804.63746  1174216.909  86.404559    1.329596
testCompareMaskNotLong   ULT           ops/s 872478.9017  4955.722542  1174341.995  124.656933   1.345983
testCompareMaskNotLong   ULE           ops/s 866570.738   12541.58528  1174185.197  594.850706   1.354979
testCompareMaskNotLong   UGT           ops/s 866389.0927  3971.492766  1174210.803  153.960084   1.355292
testCompareMaskNotLong   UGE           ops/s 848339.3876  4555.514721  1174060.638  240.326562   1.383951
testCompareMaskNotShort  EQ            ops/s 3336170.783  2286.717236  4684904.156  2134.72575   1.404275
testCompareMaskNotShort  NE            ops/s 3334775.472  717.588615   4690264.12   3017.756867  1.40647
testCompareMaskNotShort  LT            ops/s 3334619.058  1138.901707  4685883.864  3808.321694  1.405223
testCompareMaskNotShort  LE            ops/s 3335538.353  538.676789   4688238.934  1029.406266  1.405541
testCompareMaskNotShort  GT            ops/s 3301425.217  694.060525   4689167.049  2845.363801  1.420346
testCompareMaskNotShort  GE            ops/s 3301580.972  317.042851   4688970.211  1292.83929   1.420219
testCompareMaskNotShort  ULT           ops/s 3336318.051  892.515034   4687549.384  1403.281648  1.405006
testCompareMaskNotShort  ULE           ops/s 3335188.292  972.230191   4684723.63   3937.599084  1.404635
testCompareMaskNotShort  UGT           ops/s 3334490.656  930.409628   4688058.378  1166.776081  1.405929
testCompareMaskNotShort  UGE           ops/s 3333050.033  3146.019596  4689197.9    456.439188   1.406878

With option `-XX:UseSVE=0`:

Benchmark                COMPARISON_OP Unit  Before Score Before Error After Score  After Error  Uplift
testCompareMaskNotDouble EQ            ops/s 788505.9464  579.254839   769969.5798  138.792325   0.976491
testCompareMaskNotDouble NE            ops/s 655499.7935  471.970429   915086.3257  183.495964   1.396013
testCompareMaskNotDouble LT            ops/s 788418.7889  574.263314   789271.7448  51.838991    1.001081
testCompareMaskNotDouble LE            ops/s 789144.8431  45.334181    789326.1963  84.148011    1.000229
testCompareMaskNotDouble GT            ops/s 788690.8485  662.950083   789246.9812  99.060588    1.000705
testCompareMaskNotDouble GE            ops/s 789421.2387  94.012868    789166.4717  111.772533   0.999677
testCompareMaskNotFloat  EQ            ops/s 1816132.864  1298.2187    1816461.601  311.706275   1.000181
testCompareMaskNotFloat  NE            ops/s 1550767.697  1142.987761  2301429.148  159.71525    1.484057
testCompareMaskNotFloat  LT            ops/s 1815531.685  1370.868745  1817187.121  761.68401    1.000911
testCompareMaskNotFloat  LE            ops/s 1817937.722  484.638134   1817703.209  625.275639   0.999871
testCompareMaskNotFloat  GT            ops/s 1818618.89   724.324392   1817977.851  481.152488   0.999647
testCompareMaskNotFloat  GE            ops/s 1815118.411  1327.945736  1817476.414  510.712942   1.001299
testCompareMaskNotByte   EQ            ops/s 6489599.571  5127.815254  6535895.286  17029.15534  1.007133
testCompareMaskNotByte   NE            ops/s 9089974.523  4069.346579  15945662.17  22867.48282  1.754203
testCompareMaskNotByte   LT            ops/s 6499040.898  1250.085336  15939338.57  17451.05939  2.452567
testCompareMaskNotByte   LE            ops/s 6493612.339  4928.466061  15926355.01  27249.57103  2.452618
testCompareMaskNotByte   GT            ops/s 6494486.565  5229.4598    15957497.14  6893.237334  2.457083
testCompareMaskNotByte   GE            ops/s 6499295.661  1030.044749  15903755.01  46454.70992  2.446996
testCompareMaskNotByte   ULT           ops/s 6494212.684  5194.712704  15944816.71  3467.818892  2.455234
testCompareMaskNotByte   ULE           ops/s 6493882.576  5092.839387  15936419.25  22755.34523  2.454066
testCompareMaskNotByte   UGT           ops/s 6493479.899  4678.096391  15958133.18  3483.353667  2.457562
testCompareMaskNotByte   UGE           ops/s 6500338.419  709.344957   15968155.27  14020.47085  2.456511
testCompareMaskNotInt    EQ            ops/s 1830787.273  237.597163   1878452.588  142.728192   1.026035
testCompareMaskNotInt    NE            ops/s 1615081.395  1219.871461  2360913.712  199.556675   1.461792
testCompareMaskNotInt    LT            ops/s 1827819.867  1360.728526  2360561.422  248.025925   1.291462
testCompareMaskNotInt    LE            ops/s 1830975.648  416.987529   2360703.924  194.958346   1.289314
testCompareMaskNotInt    GT            ops/s 1830633.964  301.849017   2360552.203  224.908655   1.289472
testCompareMaskNotInt    GE            ops/s 1829476.495  1348.361278  2360673.736  137.538696   1.290354
testCompareMaskNotInt    ULT           ops/s 1829137.773  1285.55232   2360615.95   162.876291   1.290562
testCompareMaskNotInt    ULE           ops/s 1828107.468  1360.867847  2360790.337  297.267481   1.291384
testCompareMaskNotInt    UGT           ops/s 1829659.222  1459.098806  2361025.107  266.158075   1.290417
testCompareMaskNotInt    UGE           ops/s 1829548.187  1427.266787  2360941.943  242.380469   1.29045
testCompareMaskNotLong   EQ            ops/s 810439.9121  82.577412    802287.4993  73.462086    0.98994
testCompareMaskNotLong   NE            ops/s 681643.6089  485.657471   932324.6973  158.28799    1.367759
testCompareMaskNotLong   LT            ops/s 809850.546   680.71673    931404.3219  685.591444   1.150094
testCompareMaskNotLong   LE            ops/s 810584.5191  115.234753   932234.2412  105.451172   1.150076
testCompareMaskNotLong   GT            ops/s 810593.5376  117.947863   931879.1829  553.397713   1.149625
testCompareMaskNotLong   GE            ops/s 810435.8405  81.88737     931833.0348  177.765694   1.149792
testCompareMaskNotLong   ULT           ops/s 810429.8459  90.005329    932127.5278  74.443387    1.150164
testCompareMaskNotLong   ULE           ops/s 809740.842   411.655134   932231.6607  76.044104    1.151271
testCompareMaskNotLong   UGT           ops/s 810493.4369  52.024062    932239.1709  143.915229   1.150211
testCompareMaskNotLong   UGE           ops/s 810442.0661  64.064396    932361.567   119.570287   1.150435
testCompareMaskNotShort  EQ            ops/s 4786426.182  299.050738   4694123.013  482.608634   0.980715
testCompareMaskNotShort  NE            ops/s 3808932.807  2993.590606  5672255.469  6262.526335  1.489198
testCompareMaskNotShort  LT            ops/s 4782535.485  3699.104322  5668474.071  11101.86452  1.185244
testCompareMaskNotShort  LE            ops/s 4782896.891  3338.57484   5669188.434  6309.723399  1.185304
testCompareMaskNotShort  GT            ops/s 4778532.318  3571.547653  5680482.703  10427.66734  1.18875
testCompareMaskNotShort  GE            ops/s 4786150.851  794.769881   5664644.919  6542.434538  1.183549
testCompareMaskNotShort  ULT           ops/s 4783623.78   3582.962421  5668267.123  17841.44773  1.184931
testCompareMaskNotShort  ULE           ops/s 4782752.125  3610.296618  5666231.302  6964.505363  1.184721
testCompareMaskNotShort  UGT           ops/s 4782469.332  2913.37576   5655837.96   6494.608864  1.182618
testCompareMaskNotShort  UGE           ops/s 4782606.35   3491.774067  5667295.182  14176.96543  1.18498

On an AMD EPYC 9124 16-Core Processor:

With option `-XX:UseAVX=3`:

Benchmark                COMPARISON_OP Unit  Before Score Before Error After Score  After Error  Uplift
testCompareMaskNotDouble EQ            ops/s 2166357.886  27577.51358  2920183.192  38491.49083  1.347968
testCompareMaskNotDouble NE            ops/s 2177325.341  32771.27023  2965747.932  39271.62615  1.362106
testCompareMaskNotDouble LT            ops/s 2123834.711  22890.39919  2197099.169  29107.41329  1.034496
testCompareMaskNotDouble LE            ops/s 2172931.681  32912.05647  2121686.057  34927.37781  0.976416
testCompareMaskNotDouble GT            ops/s 2164924.662  30925.91899  2124062.892  37135.0458   0.981125
testCompareMaskNotDouble GE            ops/s 2150619.038  35515.09022  2192636.533  38672.85716  1.019537
testCompareMaskNotFloat  EQ            ops/s 4518378.764  74733.72389  6724589.409  50424.63568  1.488274
testCompareMaskNotFloat  NE            ops/s 4522823.224  78138.66727  6907565.257  203953.3299  1.527268
testCompareMaskNotFloat  LT            ops/s 4587473.545  62621.25938  4431658.918  52760.23989  0.966034
testCompareMaskNotFloat  LE            ops/s 4472078.986  79338.23304  4472390.043  66247.285    1.000069
testCompareMaskNotFloat  GT            ops/s 4451744.39   220787.9755  4440866.486  58674.19154  0.997556
testCompareMaskNotFloat  GE            ops/s 4459601.349  57873.05167  4481398.426  76819.69285  1.004887
testCompareMaskNotByte   EQ            ops/s 19415317.92  356367.4937  20649319.86  240515.9459  1.063558
testCompareMaskNotByte   NE            ops/s 19401162.58  362571.8103  21010358.2   71221.35255  1.082943
testCompareMaskNotByte   LT            ops/s 19175612.37  273080.6175  20235838.72  396190.6101  1.05529
testCompareMaskNotByte   LE            ops/s 19036831.33  121135.0491  20674528.84  248839.9471  1.086027
testCompareMaskNotByte   GT            ops/s 19008302.3   124633.9182  20671390.89  271644.5576  1.087492
testCompareMaskNotByte   GE            ops/s 19590753.42  429156.452   20491615.07  332912.82    1.045984
testCompareMaskNotByte   ULT           ops/s 19431604.06  421396.5487  20575805.9   248466.2368  1.058883
testCompareMaskNotByte   ULE           ops/s 19060425.47  98309.75469  20774930.43  206596.0422  1.089951
testCompareMaskNotByte   UGT           ops/s 19266788.04  362893.3051  20861521.87  106977.3707  1.082771
testCompareMaskNotByte   UGE           ops/s 19127964.33  447774.3747  20791221.56  254458.0132  1.086954
testCompareMaskNotInt    EQ            ops/s 4473402.48   84902.77154  7191777.028  94315.13878  1.607674
testCompareMaskNotInt    NE            ops/s 4583165.363  73491.79073  7249884.988  80028.31191  1.581851
testCompareMaskNotInt    LT            ops/s 4618634.192  81869.82512  7242567.732  71211.3697   1.568118
testCompareMaskNotInt    LE            ops/s 4650524.195  72302.56692  7154948.491  83057.90635  1.538525
testCompareMaskNotInt    GT            ops/s 4534752.486  94449.20198  7004428.251  38365.18576  1.54461
testCompareMaskNotInt    GE            ops/s 4540777.389  86331.11847  7129527.341  74343.06996  1.570111
testCompareMaskNotInt    ULT           ops/s 4528175.644  114213.6504  7220013.98   82850.22587  1.594464
testCompareMaskNotInt    ULE           ops/s 4619335.448  74203.98889  7118543.128  54457.43284  1.541031
testCompareMaskNotInt    UGT           ops/s 4572521.254  122912.75    7154797.741  98858.3477   1.564737
testCompareMaskNotInt    UGE           ops/s 4579627.842  80558.04554  7179020.593  99239.23499  1.567599
testCompareMaskNotLong   EQ            ops/s 2103965.347  17059.28178  2997338.009  32388.42725  1.424613
testCompareMaskNotLong   NE            ops/s 2174434.633  36011.24708  2984460.593  29074.42994  1.372522
testCompareMaskNotLong   LT            ops/s 2110937.378  56642.0052   3020690.893  31167.62537  1.430971
testCompareMaskNotLong   LE            ops/s 2153414.166  31280.20562  2971696.162  31176.24605  1.379992
testCompareMaskNotLong   GT            ops/s 2166028.207  49432.18925  3008018.282  26534.78551  1.388725
testCompareMaskNotLong   GE            ops/s 2178206.136  35757.6799   2933186.687  19824.26727  1.346606
testCompareMaskNotLong   ULT           ops/s 2104344.728  31405.7728   2964354.007  26871.18289  1.408682
testCompareMaskNotLong   ULE           ops/s 2210232.578  21993.95777  3032635.261  25545.43656  1.372088
testCompareMaskNotLong   UGT           ops/s 2167177.931  44896.90807  2996245.236  34153.68941  1.382556
testCompareMaskNotLong   UGE           ops/s 2117175.328  26131.1893   2977492.164  23227.65519  1.406351
testCompareMaskNotShort  EQ            ops/s 8131234.179  185997.1777  12414378.38  122648.1579  1.526752
testCompareMaskNotShort  NE            ops/s 8506016.656  236481.383   12720442.64  322747.8776  1.495464
testCompareMaskNotShort  LT            ops/s 8487868.819  244943.6097  12150479.62  244300.5456  1.431511
testCompareMaskNotShort  LE            ops/s 8549184.557  286833.466   12358019.06  136683.2112  1.44552
testCompareMaskNotShort  GT            ops/s 8375447.45   221237.073   12602058.97  385690.3318  1.504643
testCompareMaskNotShort  GE            ops/s 8123474.548  127727.1461  12799747.64  197940.1001  1.575649
testCompareMaskNotShort  ULT           ops/s 8491650.422  313124.2425  12751186.59  255845.1653  1.501614
testCompareMaskNotShort  ULE           ops/s 8363009.676  203670.1995  12675908.7   279496.9925  1.515711
testCompareMaskNotShort  UGT           ops/s 8332268.933  279787.2503  12279451.4   436971.6582  1.473722
testCompareMaskNotShort  UGE           ops/s 8931588.505  203962.9257  12324437.67  330723.3066  1.37987

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-3291304777