On Wed, 25 Oct 2023 04:34:59 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:
> Hi All,
>
> This patch optimizes sub-word gather operations for x86 targets with AVX2 and AVX512 features.
>
> Following is the summary of changes:
>
> 1) Intrinsify sub-word gather with a high-performance backend implementation based on a hybrid algorithm. It first partially unrolls the scalar loop to accumulate values from the gather indices into a quadword (64-bit) slice, then uses a vector permutation to place the slice into the appropriate vector lanes. This prevents code bloat and generates a compact JIT sequence. Coupled with the savings from avoiding the expensive array allocation in the existing Java implementation, this translates into significant performance gains of 1.3-5x with the included micro-benchmark.
>
> 2) The patch was also compared against a modified Java fallback implementation that replaces the temporary array allocation with a zero-initialized vector and a scalar loop which inserts gathered values into the vector. However, a vector insert into the higher vector lanes is a three-step process: it first extracts the upper 128-bit vector lane, updates it with the gathered sub-word value, and then inserts the lane back into its original position. This makes inserts into higher-order lanes costly compared to the proposed solution. In addition, the generated JIT code for the modified fallback implementation was very bulky, which may impact inlining decisions in caller contexts.
>
> 3) Some minor adjustments to existing gather instruction patterns for double/quad words.
>
> Kindly review and share your feedback.
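For readers following along, the hybrid scheme described in point 1 can be modelled in plain Java. This is only an illustrative sketch and not the code in the patch: the class and method names (`SubwordGatherSketch`, `gatherSlice`, `placeSlice`, `gatherBytes`) are hypothetical, and the scalar `placeSlice` loop merely stands in for the vector permutation that the real backend would emit.

```java
import java.util.Arrays;

public class SubwordGatherSketch {
    // Step 1: accumulate up to 8 gathered bytes into one 64-bit slice.
    // Lane i of the slice occupies bits [8*i, 8*i + 7].
    static long gatherSlice(byte[] src, int base, int[] indexMap, int mapOffset, int count) {
        long slice = 0L;
        for (int i = 0; i < count; i++) {
            long b = src[base + indexMap[mapOffset + i]] & 0xFFL; // mask off sign extension
            slice |= b << (8 * i);
        }
        return slice;
    }

    // Step 2 (scalar stand-in for the permute): place the slice into
    // lanes [laneStart, laneStart + count) of the result.
    static void placeSlice(long slice, byte[] lanes, int laneStart, int count) {
        for (int i = 0; i < count; i++) {
            lanes[laneStart + i] = (byte) (slice >>> (8 * i));
        }
    }

    // Gather laneCount bytes, one 64-bit slice at a time.
    static byte[] gatherBytes(byte[] src, int base, int[] indexMap, int mapOffset, int laneCount) {
        byte[] lanes = new byte[laneCount];
        for (int lane = 0; lane < laneCount; lane += 8) {
            int count = Math.min(8, laneCount - lane);
            long slice = gatherSlice(src, base, indexMap, mapOffset + lane, count);
            placeSlice(slice, lanes, lane, count);
        }
        return lanes;
    }

    public static void main(String[] args) {
        byte[] src = new byte[64];
        for (int i = 0; i < src.length; i++) {
            src[i] = (byte) (i * 3);
        }
        int[] indexMap = {5, 0, 63, 7, 1, 2, 40, 9, 8, 8, 31, 3, 60, 4, 12, 6};
        // Models a 128-bit byte vector (16 lanes): lane i receives src[indexMap[i]].
        System.out.println(Arrays.toString(gatherBytes(src, 0, indexMap, 0, 16)));
    }
}
```

The point of the model is that each group of eight lanes costs one pass of scalar loads plus a single placement step, rather than a separate extract/update/insert sequence per element as in the fallback described in point 2.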
>
> Best Regards,
> Jatin

**Detailed performance numbers with AVX2**

Benchmark | Size | Baseline Score (ops/ms) | WithOpt Score (ops/ms) | Gain Factor (opt/baseline)
-- | -- | -- | -- | --
GatherOperationsBenchmark.microByteGather128 | 64 | 15916.774 | 34288.944 | 2.154264677
GatherOperationsBenchmark.microByteGather128 | 256 | 4128.501 | 8793.293 | 2.12989969
GatherOperationsBenchmark.microByteGather128 | 1024 | 1027.606 | 2217.138 | 2.157575958
GatherOperationsBenchmark.microByteGather128 | 4096 | 264.002 | 554.603 | 2.100753025
GatherOperationsBenchmark.microByteGather128_MASK | 64 | 16729.183 | 26308.667 | 1.57262115
GatherOperationsBenchmark.microByteGather128_MASK | 256 | 4157.73 | 7312.934 | 1.758876599
GatherOperationsBenchmark.microByteGather128_MASK | 1024 | 1067.675 | 1828.035 | 1.712164282
GatherOperationsBenchmark.microByteGather128_MASK | 4096 | 268.538 | 462.191 | 1.721138163
GatherOperationsBenchmark.microByteGather128_MASK_NZ_OFF | 64 | 16559.725 | 25355.415 | 1.531149521
GatherOperationsBenchmark.microByteGather128_MASK_NZ_OFF | 256 | 4190.36 | 6596.82 | 1.574284787
GatherOperationsBenchmark.microByteGather128_MASK_NZ_OFF | 1024 | 1070.641 | 1638.323 | 1.530226285
GatherOperationsBenchmark.microByteGather128_MASK_NZ_OFF | 4096 | 274.703 | 415.345 | 1.511978391
GatherOperationsBenchmark.microByteGather128_NZ_OFF | 64 | 15445.814 | 30518.41 | 1.975836948
GatherOperationsBenchmark.microByteGather128_NZ_OFF | 256 | 4087.154 | 8075.382 | 1.975795872
GatherOperationsBenchmark.microByteGather128_NZ_OFF | 1024 | 1035.527 | 2008.003 | 1.939112162
GatherOperationsBenchmark.microByteGather128_NZ_OFF | 4096 | 262.936 | 501.675 | 1.907973804
GatherOperationsBenchmark.microByteGather256 | 64 | 18266.25 | 37549.708 | 2.05568784
GatherOperationsBenchmark.microByteGather256 | 256 | 4714.027 | 9894.099 | 2.098863456
GatherOperationsBenchmark.microByteGather256 | 1024 | 1147.282 | 2490.351 | 2.1706529
GatherOperationsBenchmark.microByteGather256 | 4096 | 286.935 | 622.153 | 2.16827156
GatherOperationsBenchmark.microByteGather256_MASK | 64 | 21992.019 | 27357.032 | 1.243952727
GatherOperationsBenchmark.microByteGather256_MASK | 256 | 5732.258 | 7760.398 | 1.353811709
GatherOperationsBenchmark.microByteGather256_MASK | 1024 | 1495.632 | 1964.343 | 1.313386582
GatherOperationsBenchmark.microByteGather256_MASK | 4096 | 386.313 | 480.509 | 1.243833368
GatherOperationsBenchmark.microByteGather256_MASK_NZ_OFF | 64 | 19911.793 | 26818.552 | 1.346867758
GatherOperationsBenchmark.microByteGather256_MASK_NZ_OFF | 256 | 5013.248 | 7040.98 | 1.404474704
GatherOperationsBenchmark.microByteGather256_MASK_NZ_OFF | 1024 | 1289.123 | 1785.368 | 1.384947751
GatherOperationsBenchmark.microByteGather256_MASK_NZ_OFF | 4096 | 332.791 | 452.568 | 1.359916584
GatherOperationsBenchmark.microByteGather256_NZ_OFF | 64 | 17147.769 | 33913.351 | 1.977712144
GatherOperationsBenchmark.microByteGather256_NZ_OFF | 256 | 4386.044 | 8640.734 | 1.970051828
GatherOperationsBenchmark.microByteGather256_NZ_OFF | 1024 | 1097.485 | 2261.998 | 2.061074183
GatherOperationsBenchmark.microByteGather256_NZ_OFF | 4096 | 277.155 | 565.051 | 2.038754488
GatherOperationsBenchmark.microByteGather64 | 64 | 13068.085 | 37960.616 | 2.904833876
GatherOperationsBenchmark.microByteGather64 | 256 | 3227.857 | 9935.642 | 3.078092369
GatherOperationsBenchmark.microByteGather64 | 1024 | 834.99 | 2530.696 | 3.03080995
GatherOperationsBenchmark.microByteGather64 | 4096 | 212.664 | 637.938 | 2.999746078
GatherOperationsBenchmark.microByteGather64_MASK | 64 | 13548.225 | 30755.634 | 2.27008586
GatherOperationsBenchmark.microByteGather64_MASK | 256 | 3347.844 | 8026.22 | 2.39742951
GatherOperationsBenchmark.microByteGather64_MASK | 1024 | 843.279 | 2072.913 | 2.458157976
GatherOperationsBenchmark.microByteGather64_MASK | 4096 | 213.316 | 544.853 | 2.554205967
GatherOperationsBenchmark.microByteGather64_MASK_NZ_OFF | 64 | 12982.383 | 28193.925 | 2.171706458
GatherOperationsBenchmark.microByteGather64_MASK_NZ_OFF | 256 | 3288.497 | 7483.684 | 2.275715623
GatherOperationsBenchmark.microByteGather64_MASK_NZ_OFF | 1024 | 834.342 | 1860.542 | 2.229951267
GatherOperationsBenchmark.microByteGather64_MASK_NZ_OFF | 4096 | 208.107 | 473.987 | 2.277611998
GatherOperationsBenchmark.microByteGather64_NZ_OFF | 64 | 13079.567 | 32992.977 | 2.522482357
GatherOperationsBenchmark.microByteGather64_NZ_OFF | 256 | 3321.098 | 8987.837 | 2.706284789
GatherOperationsBenchmark.microByteGather64_NZ_OFF | 1024 | 865.324 | 2362.563 | 2.73026404
GatherOperationsBenchmark.microByteGather64_NZ_OFF | 4096 | 216.768 | 575.35 | 2.65422018
GatherOperationsBenchmark.microShortGather128 | 64 | 12835.472 | 31370.111 | 2.44401694
GatherOperationsBenchmark.microShortGather128 | 256 | 3151.091 | 8603.442 | 2.730305789
GatherOperationsBenchmark.microShortGather128 | 1024 | 820.026 | 2158.645 | 2.632410436
GatherOperationsBenchmark.microShortGather128 | 4096 | 205.263 | 535.444 | 2.60857534
GatherOperationsBenchmark.microShortGather128_MASK | 64 | 13055.905 | 23957.317 | 1.834979421
GatherOperationsBenchmark.microShortGather128_MASK | 256 | 3234.501 | 6416.879 | 1.983885304
GatherOperationsBenchmark.microShortGather128_MASK | 1024 | 829.648 | 1578.415 | 1.902511668
GatherOperationsBenchmark.microShortGather128_MASK | 4096 | 206.04 | 416.303 | 2.02049602
GatherOperationsBenchmark.microShortGather128_MASK_NZ_OFF | 64 | 12905.373 | 22475.815 | 1.74158585
GatherOperationsBenchmark.microShortGather128_MASK_NZ_OFF | 256 | 3202.372 | 5695.988 | 1.778677805
GatherOperationsBenchmark.microShortGather128_MASK_NZ_OFF | 1024 | 814.645 | 1412.466 | 1.733842349
GatherOperationsBenchmark.microShortGather128_MASK_NZ_OFF | 4096 | 199.535 | 355.407 | 1.781176235
GatherOperationsBenchmark.microShortGather128_NZ_OFF | 64 | 12329.793 | 27620.341 | 2.240130147
GatherOperationsBenchmark.microShortGather128_NZ_OFF | 256 | 3146.016 | 7664.47 | 2.436246351
GatherOperationsBenchmark.microShortGather128_NZ_OFF | 1024 | 794.335 | 1925.535 | 2.424084297
GatherOperationsBenchmark.microShortGather128_NZ_OFF | 4096 | 195.754 | 485.942 | 2.482411598
GatherOperationsBenchmark.microShortGather256 | 64 | 15430.153 | 33050.636 | 2.141951282
GatherOperationsBenchmark.microShortGather256 | 256 | 4042.835 | 8901.664 | 2.201837077
GatherOperationsBenchmark.microShortGather256 | 1024 | 986.361 | 2180.195 | 2.210341853
GatherOperationsBenchmark.microShortGather256 | 4096 | 250.057 | 560.523 | 2.24158092
GatherOperationsBenchmark.microShortGather256_MASK | 64 | 16793.012 | 23516.915 | 1.400398868
GatherOperationsBenchmark.microShortGather256_MASK | 256 | 4249.641 | 6505.857 | 1.5309192
GatherOperationsBenchmark.microShortGather256_MASK | 1024 | 1105.868 | 1600.44 | 1.447225166
GatherOperationsBenchmark.microShortGather256_MASK | 4096 | 268.443 | 410.052 | 1.527519809
GatherOperationsBenchmark.microShortGather256_MASK_NZ_OFF | 64 | 16107.265 | 22559.877 | 1.400602585
GatherOperationsBenchmark.microShortGather256_MASK_NZ_OFF | 256 | 4035.417 | 5872.376 | 1.455209214
GatherOperationsBenchmark.microShortGather256_MASK_NZ_OFF | 1024 | 1028.671 | 1469.825 | 1.428858206
GatherOperationsBenchmark.microShortGather256_MASK_NZ_OFF | 4096 | 258.639 | 370.997 | 1.434420176
GatherOperationsBenchmark.microShortGather256_NZ_OFF | 64 | 14761.16 | 27601.245 | 1.869856095
GatherOperationsBenchmark.microShortGather256_NZ_OFF | 256 | 3905.104 | 7684.751 | 1.967873583
GatherOperationsBenchmark.microShortGather256_NZ_OFF | 1024 | 986.575 | 1853.319 | 1.878538378
GatherOperationsBenchmark.microShortGather256_NZ_OFF | 4096 | 248.541 | 485.734 | 1.954341537
GatherOperationsBenchmark.microShortGather64 | 64 | 7942.618 | 33097.908 | 4.167128269
GatherOperationsBenchmark.microShortGather64 | 256 | 2009.148 | 9039.775 | 4.499307667
GatherOperationsBenchmark.microShortGather64 | 1024 | 506.769 | 2198.022 | 4.33732529
GatherOperationsBenchmark.microShortGather64 | 4096 | 118.499 | 565.551 | 4.772622554
GatherOperationsBenchmark.microShortGather64_MASK | 64 | 7802.345 | 23559.186 | 3.019500676
GatherOperationsBenchmark.microShortGather64_MASK | 256 | 1917.049 | 6278.454 | 3.275061827
GatherOperationsBenchmark.microShortGather64_MASK | 1024 | 491.248 | 1569.524 | 3.194972804
GatherOperationsBenchmark.microShortGather64_MASK | 4096 | 117.255 | 398.438 | 3.398046992
GatherOperationsBenchmark.microShortGather64_MASK_NZ_OFF | 64 | 7697.165 | 22599.8 | 2.936119987
GatherOperationsBenchmark.microShortGather64_MASK_NZ_OFF | 256 | 1913.269 | 5986.04 | 3.128697533
GatherOperationsBenchmark.microShortGather64_MASK_NZ_OFF | 1024 | 483.724 | 1491.969 | 3.084339417
GatherOperationsBenchmark.microShortGather64_MASK_NZ_OFF | 4096 | 116.716 | 375.492 | 3.217142465
GatherOperationsBenchmark.microShortGather64_NZ_OFF | 64 | 7882.26 | 29755.573 | 3.775005265
GatherOperationsBenchmark.microShortGather64_NZ_OFF | 256 | 1992.655 | 7969.383 | 3.99937922
GatherOperationsBenchmark.microShortGather64_NZ_OFF | 1024 | 498.249 | 1997.082 | 4.008200719
GatherOperationsBenchmark.microShortGather64_NZ_OFF | 4096 | 117.764 | 497.177 | 4.221808023

**Detailed performance numbers with AVX3**

Benchmark | Size | Baseline Score (ops/ms) | WithOpt Score (ops/ms) | Gain Factor (opt/baseline)
-- | -- | -- | -- | --
GatherOperationsBenchmark.microByteGather128 | 64 | 15900.681 | 35745.941 | 2.248076104
GatherOperationsBenchmark.microByteGather128 | 256 | 4194.349 | 9931.187 | 2.36775409
GatherOperationsBenchmark.microByteGather128 | 1024 | 1064.611 | 2528.468 | 2.375015851
GatherOperationsBenchmark.microByteGather128 | 4096 | 270.486 | 633.351 | 2.341529691
GatherOperationsBenchmark.microByteGather128_MASK | 64 | 17836.944 | 30418.654 | 1.705373634
GatherOperationsBenchmark.microByteGather128_MASK | 256 | 4411.449 | 8451.317 | 1.915768946
GatherOperationsBenchmark.microByteGather128_MASK | 1024 | 1155.587 | 2119.895 | 1.8344746
GatherOperationsBenchmark.microByteGather128_MASK | 4096 | 287.88 | 538.807 | 1.871637488
GatherOperationsBenchmark.microByteGather128_MASK_NZ_OFF | 64 | 16678.512 | 27223.074 | 1.632224385
GatherOperationsBenchmark.microByteGather128_MASK_NZ_OFF | 256 | 4268.674 | 7395.33 | 1.732465398
GatherOperationsBenchmark.microByteGather128_MASK_NZ_OFF | 1024 | 1119.764 | 1854.529 | 1.656178445
GatherOperationsBenchmark.microByteGather128_MASK_NZ_OFF | 4096 | 276.836 | 469.102 | 1.694512274
GatherOperationsBenchmark.microByteGather128_NZ_OFF | 64 | 15561.662 | 33674.023 | 2.163909163
GatherOperationsBenchmark.microByteGather128_NZ_OFF | 256 | 4065.922 | 9427.52 | 2.318667205
GatherOperationsBenchmark.microByteGather128_NZ_OFF | 1024 | 1030.027 | 2430.395 | 2.359544944
GatherOperationsBenchmark.microByteGather128_NZ_OFF | 4096 | 261.51 | 609.811 | 2.331884058
GatherOperationsBenchmark.microByteGather256 | 64 | 17993.999 | 36026.071 | 2.002115872
GatherOperationsBenchmark.microByteGather256 | 256 | 4646.105 | 9695.417 | 2.086783876
GatherOperationsBenchmark.microByteGather256 | 1024 | 1131.979 | 2487.113 | 2.197137049
GatherOperationsBenchmark.microByteGather256 | 4096 | 278.159 | 624.745 | 2.24599959
GatherOperationsBenchmark.microByteGather256_MASK | 64 | 22898.291 | 30126.448 | 1.315663601
GatherOperationsBenchmark.microByteGather256_MASK | 256 | 5473.285 | 8843.556 | 1.615767496
GatherOperationsBenchmark.microByteGather256_MASK | 1024 | 1415.369 | 2230.048 | 1.575594774
GatherOperationsBenchmark.microByteGather256_MASK | 4096 | 358.725 | 556.882 | 1.552392501
GatherOperationsBenchmark.microByteGather256_MASK_NZ_OFF | 64 | 20186.469 | 27915.464 | 1.382879988
GatherOperationsBenchmark.microByteGather256_MASK_NZ_OFF | 256 | 5214.919 | 7578.939 | 1.453318642
GatherOperationsBenchmark.microByteGather256_MASK_NZ_OFF | 1024 | 1360.825 | 1902.398 | 1.397974023
GatherOperationsBenchmark.microByteGather256_MASK_NZ_OFF | 4096 | 359.569 | 487.59 | 1.356040148
GatherOperationsBenchmark.microByteGather256_NZ_OFF | 64 | 17154.31 | 35904.295 | 2.093018897
GatherOperationsBenchmark.microByteGather256_NZ_OFF | 256 | 4404.264 | 8997.564 | 2.042921133
GatherOperationsBenchmark.microByteGather256_NZ_OFF | 1024 | 1098.961 | 2317.713 | 2.109003868
GatherOperationsBenchmark.microByteGather256_NZ_OFF | 4096 | 275.722 | 576.866 | 2.092201565
GatherOperationsBenchmark.microByteGather512 | 64 | 18790.829 | 38455.649 | 2.046511572
GatherOperationsBenchmark.microByteGather512 | 256 | 4806.001 | 10023.706 | 2.085664568
GatherOperationsBenchmark.microByteGather512 | 1024 | 1164.771 | 2558.357 | 2.19644634
GatherOperationsBenchmark.microByteGather512 | 4096 | 286.714 | 640.06 | 2.232398836
GatherOperationsBenchmark.microByteGather512_MASK | 64 | 25265.683 | 32738.543 | 1.295771145
GatherOperationsBenchmark.microByteGather512_MASK | 256 | 6417.048 | 8900.835 | 1.387060686
GatherOperationsBenchmark.microByteGather512_MASK | 1024 | 1726.425 | 2231.39 | 1.29249171
GatherOperationsBenchmark.microByteGather512_MASK | 4096 | 438.445 | 562.29 | 1.282464163
GatherOperationsBenchmark.microByteGather512_MASK_NZ_OFF | 64 | 22097.788 | 29326.431 | 1.32712066
GatherOperationsBenchmark.microByteGather512_MASK_NZ_OFF | 256 | 5587.934 | 7937.573 | 1.420484387
GatherOperationsBenchmark.microByteGather512_MASK_NZ_OFF | 1024 | 1485.739 | 1967.966 | 1.324570466
GatherOperationsBenchmark.microByteGather512_MASK_NZ_OFF | 4096 | 350.395 | 502.295 | 1.433510752
GatherOperationsBenchmark.microByteGather512_NZ_OFF | 64 | 17904.091 | 34883.849 | 1.948373084
GatherOperationsBenchmark.microByteGather512_NZ_OFF | 256 | 4532.971 | 9354.373 | 2.063629571
GatherOperationsBenchmark.microByteGather512_NZ_OFF | 1024 | 1135.769 | 2394.267 | 2.108058065
GatherOperationsBenchmark.microByteGather512_NZ_OFF | 4096 | 285.823 | 588.764 | 2.059890212
GatherOperationsBenchmark.microByteGather64 | 64 | 13044.341 | 32947.355 | 2.525796819
GatherOperationsBenchmark.microByteGather64 | 256 | 3244.318 | 8817.036 | 2.717685504
GatherOperationsBenchmark.microByteGather64 | 1024 | 812.016 | 2205.047 | 2.715521615
GatherOperationsBenchmark.microByteGather64 | 4096 | 212.882 | 559.439 | 2.627930027
GatherOperationsBenchmark.microByteGather64_MASK | 64 | 13328.592 | 25055.284 | 1.879814762
GatherOperationsBenchmark.microByteGather64_MASK | 256 | 3294.445 | 6779.14 | 2.057748726
GatherOperationsBenchmark.microByteGather64_MASK | 1024 | 832.091 | 1693.255 | 2.034939688
GatherOperationsBenchmark.microByteGather64_MASK | 4096 | 213.049 | 432.162 | 2.028462936
GatherOperationsBenchmark.microByteGather64_MASK_NZ_OFF | 64 | 12713.428 | 23246.19 | 1.828475373
GatherOperationsBenchmark.microByteGather64_MASK_NZ_OFF | 256 | 3238.82 | 6168.581 | 1.904576667
GatherOperationsBenchmark.microByteGather64_MASK_NZ_OFF | 1024 | 819.699 | 1548.811 | 1.889487483
GatherOperationsBenchmark.microByteGather64_MASK_NZ_OFF | 4096 | 207.266 | 388.488 | 1.874345045
GatherOperationsBenchmark.microByteGather64_NZ_OFF | 64 | 12740.659 | 28229.421 | 2.215695515
GatherOperationsBenchmark.microByteGather64_NZ_OFF | 256 | 3255.884 | 7816.444 | 2.400713293
GatherOperationsBenchmark.microByteGather64_NZ_OFF | 1024 | 831.691 | 1976.915 | 2.376982557
GatherOperationsBenchmark.microByteGather64_NZ_OFF | 4096 | 210.812 | 503.111 | 2.386538717
GatherOperationsBenchmark.microShortGather128 | 64 | 12858.925 | 34710.696 | 2.699346641
GatherOperationsBenchmark.microShortGather128 | 256 | 3106.472 | 9171.459 | 2.952371372
GatherOperationsBenchmark.microShortGather128 | 1024 | 819.192 | 2278.838 | 2.781811834
GatherOperationsBenchmark.microShortGather128 | 4096 | 204.157 | 575.636 | 2.819575131
GatherOperationsBenchmark.microShortGather128_MASK | 64 | 12528.506 | 28202.741 | 2.251085724
GatherOperationsBenchmark.microShortGather128_MASK | 256 | 3236.653 | 7798 | 2.409278968
GatherOperationsBenchmark.microShortGather128_MASK | 1024 | 820.409 | 1991.597 | 2.427566007
GatherOperationsBenchmark.microShortGather128_MASK | 4096 | 203.635 | 509.145 | 2.500282368
GatherOperationsBenchmark.microShortGather128_MASK_NZ_OFF | 64 | 12166.914 | 25418.26 | 2.089129585
GatherOperationsBenchmark.microShortGather128_MASK_NZ_OFF | 256 | 3202.89 | 6914.467 | 2.158821252
GatherOperationsBenchmark.microShortGather128_MASK_NZ_OFF | 1024 | 799.485 | 1752.541 | 2.192087406
GatherOperationsBenchmark.microShortGather128_MASK_NZ_OFF | 4096 | 199.868 | 442.822 | 2.215572278
GatherOperationsBenchmark.microShortGather128_NZ_OFF | 64 | 12531.197 | 31505.217 | 2.514142663
GatherOperationsBenchmark.microShortGather128_NZ_OFF | 256 | 3150.884 | 9098.353 | 2.887555683
GatherOperationsBenchmark.microShortGather128_NZ_OFF | 1024 | 806.932 | 2280.42 | 2.826037386
GatherOperationsBenchmark.microShortGather128_NZ_OFF | 4096 | 197.351 | 571.426 | 2.895480641
GatherOperationsBenchmark.microShortGather256 | 64 | 15255.994 | 34013.422 | 2.22951202
GatherOperationsBenchmark.microShortGather256 | 256 | 3986.306 | 9196.43 | 2.307005533
GatherOperationsBenchmark.microShortGather256 | 1024 | 1003.058 | 2294.437 | 2.287442002
GatherOperationsBenchmark.microShortGather256 | 4096 | 257.45 | 560.259 | 2.176185667
GatherOperationsBenchmark.microShortGather256_MASK | 64 | 17194.868 | 26817.506 | 1.559622673
GatherOperationsBenchmark.microShortGather256_MASK | 256 | 4307.911 | 7252.799 | 1.683600009
GatherOperationsBenchmark.microShortGather256_MASK | 1024 | 1109.034 | 1803.594 | 1.626274758
GatherOperationsBenchmark.microShortGather256_MASK | 4096 | 274.104 | 458.023 | 1.670982547
GatherOperationsBenchmark.microShortGather256_MASK_NZ_OFF | 64 | 15615.164 | 25553.091 | 1.636427962
GatherOperationsBenchmark.microShortGather256_MASK_NZ_OFF | 256 | 4041.88 | 6826.642 | 1.688976912
GatherOperationsBenchmark.microShortGather256_MASK_NZ_OFF | 1024 | 1037.741 | 1706.208 | 1.644155912
GatherOperationsBenchmark.microShortGather256_MASK_NZ_OFF | 4096 | 257.732 | 439.843 | 1.706590567
GatherOperationsBenchmark.microShortGather256_NZ_OFF | 64 | 14795.183 | 31449.844 | 2.125681311
GatherOperationsBenchmark.microShortGather256_NZ_OFF | 256 | 3931.158 | 8463.181 | 2.15284682
GatherOperationsBenchmark.microShortGather256_NZ_OFF | 1024 | 989.033 | 2120.528 | 2.144041705
GatherOperationsBenchmark.microShortGather256_NZ_OFF | 4096 | 247.472 | 537.454 | 2.171777009
GatherOperationsBenchmark.microShortGather512 | 64 | 17122.803 | 33007.209 | 1.927675568
GatherOperationsBenchmark.microShortGather512 | 256 | 4378.354 | 9043.805 | 2.065571902
GatherOperationsBenchmark.microShortGather512 | 1024 | 1058.899 | 2292.852 | 2.165316994
GatherOperationsBenchmark.microShortGather512 | 4096 | 255.502 | 577.995 | 2.262193642
GatherOperationsBenchmark.microShortGather512_MASK | 64 | 21232.499 | 27840.109 | 1.311202652
GatherOperationsBenchmark.microShortGather512_MASK | 256 | 5520.535 | 8068.078 | 1.461466688
GatherOperationsBenchmark.microShortGather512_MASK | 1024 | 1424.798 | 2058.709 | 1.444912893
GatherOperationsBenchmark.microShortGather512_MASK | 4096 | 355.353 | 472.564 | 1.329843845
GatherOperationsBenchmark.microShortGather512_MASK_NZ_OFF | 64 | 19155.458 | 25609.616 | 1.336935718
GatherOperationsBenchmark.microShortGather512_MASK_NZ_OFF | 256 | 4927.708 | 7004.695 | 1.421491493
GatherOperationsBenchmark.microShortGather512_MASK_NZ_OFF | 1024 | 1275.924 | 1773.667 | 1.390103956
GatherOperationsBenchmark.microShortGather512_MASK_NZ_OFF | 4096 | 312.012 | 444.827 | 1.425672731
GatherOperationsBenchmark.microShortGather512_NZ_OFF | 64 | 16195.665 | 33555.966 | 2.071910354
GatherOperationsBenchmark.microShortGather512_NZ_OFF | 256 | 4123.235 | 7852.8 | 1.904523996
GatherOperationsBenchmark.microShortGather512_NZ_OFF | 1024 | 1033.009 | 1722.473 | 1.667432714
GatherOperationsBenchmark.microShortGather512_NZ_OFF | 4096 | 256.001 | 441.754 | 1.725594822
GatherOperationsBenchmark.microShortGather64 | 64 | 7717.303 | 34015.02 | 4.40763049
GatherOperationsBenchmark.microShortGather64 | 256 | 1940.575 | 9191.168 | 4.73631166
GatherOperationsBenchmark.microShortGather64 | 1024 | 490.29 | 2294.718 | 4.680327969
GatherOperationsBenchmark.microShortGather64 | 4096 | 117.59 | 579.407 | 4.927349264
GatherOperationsBenchmark.microShortGather64_MASK | 64 | 7325.501 | 26815.558 | 3.660576662
GatherOperationsBenchmark.microShortGather64_MASK | 256 | 1792.717 | 7265.925 | 4.053023985
GatherOperationsBenchmark.microShortGather64_MASK | 1024 | 456.88 | 1805.418 | 3.951624059
GatherOperationsBenchmark.microShortGather64_MASK | 4096 | 109.657 | 454.874 | 4.148152877
GatherOperationsBenchmark.microShortGather64_MASK_NZ_OFF | 64 | 7157.769 | 25542.547 | 3.568506751
GatherOperationsBenchmark.microShortGather64_MASK_NZ_OFF | 256 | 1763.398 | 6817.439 | 3.866080715
GatherOperationsBenchmark.microShortGather64_MASK_NZ_OFF | 1024 | 453.254 | 1704.752 | 3.761140553
GatherOperationsBenchmark.microShortGather64_MASK_NZ_OFF | 4096 | 108.622 | 439.762 | 4.0485537
GatherOperationsBenchmark.microShortGather64_NZ_OFF | 64 | 7708.116 | 31334.47 | 4.065126939
GatherOperationsBenchmark.microShortGather64_NZ_OFF | 256 | 1954.153 | 8460.249 | 4.329368785
GatherOperationsBenchmark.microShortGather64_NZ_OFF | 1024 | 493.538 | 2120.589 | 4.296708663
GatherOperationsBenchmark.microShortGather64_NZ_OFF | 4096 | 116.976 | 537.472 | 4.594720285

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16354#issuecomment-1778499766
PR Comment: https://git.openjdk.org/jdk/pull/16354#issuecomment-1778500434