On Wed, 19 Feb 2025 19:50:50 GMT, Evgeny Astigeevich <eastigeev...@openjdk.org> wrote:
>> I will run a comparison next with the same batch of tests but looking at
>> `int` and see if there are any differences compared with `long` or not.
>
> Hi @galderz,
> Results from Graviton 3(Neoverse-V1).
> Without the patch:
>
> Benchmark                         (probability)  (range)  (seed)  (size)   Mode  Cnt      Score    Error   Units
> MinMaxVector.intClippingRange               N/A       90       0    1000  thrpt    8  12565.427 ± 37.538  ops/ms
> MinMaxVector.intClippingRange               N/A      100       0    1000  thrpt    8  12462.072 ± 84.067  ops/ms
> MinMaxVector.intLoopMax                      50      N/A     N/A    2048  thrpt    8   5113.090 ± 68.720  ops/ms
> MinMaxVector.intLoopMax                      80      N/A     N/A    2048  thrpt    8   5129.857 ± 35.005  ops/ms
> MinMaxVector.intLoopMax                     100      N/A     N/A    2048  thrpt    8   5116.081 ±  8.946  ops/ms
> MinMaxVector.intLoopMin                      50      N/A     N/A    2048  thrpt    8   6174.544 ± 52.573  ops/ms
> MinMaxVector.intLoopMin                      80      N/A     N/A    2048  thrpt    8   6110.884 ± 54.447  ops/ms
> MinMaxVector.intLoopMin                     100      N/A     N/A    2048  thrpt    8   6178.661 ± 48.450  ops/ms
> MinMaxVector.intReductionMax                 50      N/A     N/A    2048  thrpt    8   5109.270 ± 10.525  ops/ms
> MinMaxVector.intReductionMax                 80      N/A     N/A    2048  thrpt    8   5123.426 ± 28.229  ops/ms
> MinMaxVector.intReductionMax                100      N/A     N/A    2048  thrpt    8   5133.799 ±  7.693  ops/ms
> MinMaxVector.intReductionMin                 50      N/A     N/A    2048  thrpt    8   5130.209 ± 15.491  ops/ms
> MinMaxVector.intReductionMin                 80      N/A     N/A    2048  thrpt    8   5127.823 ± 27.767  ops/ms
> MinMaxVector.intReductionMin                100      N/A     N/A    2048  thrpt    8   5118.217 ± 22.186  ops/ms
> MinMaxVector.longClippingRange              N/A       90       0    1000  thrpt    8   1831.026 ± 15.502  ops/ms
> MinMaxVector.longClippingRange              N/A      100       0    1000  thrpt    8   1827.194 ± 22.076  ops/ms
> MinMaxVector.longLoopMax                     50      N/A     N/A    2048  thrpt    8   2643.383 ±  9.830  ops/ms
> MinMaxVector.longLoopMax                     80      N/A     N/A    2048  thrpt    8   2640.417 ±  7.797  ops/ms
> MinMaxVector.longLoopMax                    100      N/A     N/A    2048  thrpt    8   1244.321 ±  1.001  ops/ms
> MinMaxVector.longLoopMin                     50      N/A     N/A    2048  thrpt    8   3239.234 ±  8.813  ops/ms
> MinMaxVector.longLoopMin                     80      N/A     N/A    2048  thrpt    8   3252.713 ± 3...
Thanks @eastig for the results on Graviton 3. I'm summarising them here:

Benchmark                         (probability)  (range)  (seed)  (size)   Mode  Cnt      Base     Patch   Units
MinMaxVector.longClippingRange              N/A       90       0    1000  thrpt    8  1831.026  5094.259  ops/ms (+178%)
MinMaxVector.longClippingRange              N/A      100       0    1000  thrpt    8  1827.194  5096.835  ops/ms (+180%)
MinMaxVector.longLoopMax                     50      N/A     N/A    2048  thrpt    8  2643.383  2636.438  ops/ms
MinMaxVector.longLoopMax                     80      N/A     N/A    2048  thrpt    8  2640.417  2644.069  ops/ms
MinMaxVector.longLoopMax                    100      N/A     N/A    2048  thrpt    8  1244.321  2646.250  ops/ms (+112%)
MinMaxVector.longLoopMin                     50      N/A     N/A    2048  thrpt    8  3239.234  2648.504  ops/ms (-18%)
MinMaxVector.longLoopMin                     80      N/A     N/A    2048  thrpt    8  3252.713  2658.082  ops/ms (-18%)
MinMaxVector.longLoopMin                    100      N/A     N/A    2048  thrpt    8  1204.370  2647.532  ops/ms (+119%)
MinMaxVector.longReductionMax                50      N/A     N/A    2048  thrpt    8  2536.322  2536.254  ops/ms
MinMaxVector.longReductionMax                80      N/A     N/A    2048  thrpt    8  2536.318  2536.209  ops/ms
MinMaxVector.longReductionMax               100      N/A     N/A    2048  thrpt    8  1395.273  2536.342  ops/ms (+81%)
MinMaxVector.longReductionMin                50      N/A     N/A    2048  thrpt    8  2536.325  2536.271  ops/ms
MinMaxVector.longReductionMin                80      N/A     N/A    2048  thrpt    8  2536.265  2536.250  ops/ms
MinMaxVector.longReductionMin               100      N/A     N/A    2048  thrpt    8  1389.982  2536.246  ops/ms (+82%)

On Graviton 3 the registers are wide enough for vectorization to kick in, so we see improvements similar to those on x64 with AVX-512 in https://github.com/openjdk/jdk/pull/20098#issuecomment-2642788364.

There is some variance at the 50% and 80% probabilities. This was also observed there, to a lesser degree, but on the aarch64 system it looks more pronounced. Interestingly, it happened with min but not with max, though that could also be variance.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2670574593
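The percentages in the summary are relative throughput changes, (Patch - Base) / Base x 100. Below is a minimal Java sketch that recomputes a few of them from the Base and Patch scores above. The class `ThroughputDelta` and the method `percentChange` are hypothetical names, used here only for illustration. They are not part of the benchmark or of the JDK.

```java
import java.util.Locale;

// Hypothetical helper: recomputes relative throughput changes from the summary table.
public class ThroughputDelta {

    // Relative change of the patched score versus the base score, in percent.
    // Positive means the patch is faster (higher ops/ms); negative means slower.
    static double percentChange(double base, double patch) {
        return (patch - base) / base * 100.0;
    }

    public static void main(String[] args) {
        // Labels and (Base, Patch) scores in ops/ms, copied from the Graviton 3 summary.
        String[] names = {
            "longClippingRange range=90",
            "longLoopMax probability=100",
            "longLoopMin probability=50",
            "longReductionMin probability=100"
        };
        double[][] scores = {
            {1831.026, 5094.259},
            {1244.321, 2646.250},
            {3239.234, 2648.504},
            {1389.982, 2536.246}
        };
        for (int i = 0; i < names.length; i++) {
            double delta = percentChange(scores[i][0], scores[i][1]);
            // Locale.ROOT keeps '.' as the decimal separator regardless of the default locale.
            System.out.printf(Locale.ROOT, "%-34s %+6.1f%%%n", names[i], delta);
        }
    }
}
```

Running it prints one signed percentage per row, with one decimal place. Small differences from the whole-number figures in the summary come from rounding.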