On Tue, 14 Jan 2025 19:14:05 GMT, Johannes Graham <d...@openjdk.org> wrote:
>> The new implementation improves performance on the aarch64 architecture but
>> results in a performance regression on x64.
>>
>> ## 1. Script
>>
>> git remote add wenshao g...@github.com:wenshao/jdk.git
>> git fetch wenshao
>>
>> # baseline dfaa89162a3
>> git checkout dfaa89162a35acd20b1ed35e147f9626a181510a
>> make test TEST="micro:java.util.UUIDBench.toString"
>>
>> # current c513087056b
>> git checkout c513087056be8c1e1a915625e0b425a7ecbb21d6
>> make test TEST="micro:java.util.UUIDBench.toString"
>>
>> ## 2. aliyun_ecs_c8a_x64 (CPU AMD EPYC™ Genoa)
>>
>> -Benchmark           (size)   Mode  Cnt    Score   Error   Units (baseline dfaa89162a3)
>> -UUIDBench.toString   20000  thrpt   15   94.274 ± 0.452  ops/us
>>
>> +Benchmark           (size)   Mode  Cnt    Score   Error   Units (current c513087056b)
>> +UUIDBench.toString   20000  thrpt   15   80.241 ± 0.894  ops/us -14.88%
>>
>> ## 3. aliyun_ecs_c8i_x64 (CPU Intel® Xeon® Emerald Rapids)
>>
>> -Benchmark           (size)   Mode  Cnt    Score   Error   Units (baseline dfaa89162a3)
>> -UUIDBench.toString   20000  thrpt   15   85.323 ± 2.044  ops/us
>>
>> +Benchmark           (size)   Mode  Cnt    Score   Error   Units (current c513087056b)
>> +UUIDBench.toString   20000  thrpt   15   73.636 ± 0.590  ops/us -13.69%
>>
>> ## 4. aliyun_ecs_c8y_aarch64 (CPU Aliyun Yitian 710)
>>
>> -Benchmark           (size)   Mode  Cnt    Score   Error   Units (baseline dfaa89162a3)
>> -UUIDBench.toString   20000  thrpt   15   69.286 ± 1.136  ops/us
>>
>> +Benchmark           (size)   Mode  Cnt    Score   Error   Units (current c513087056b)
>> +UUIDBench.toString   20000  thrpt   15   80.475 ± 0.310  ops/us +16.14%
>>
>> ## 5. MacBook M1 Pro (aarch64)
>>
>> -Benchmark           (size)   Mode  Cnt    Score   Error   Units (baseline dfaa89162a3)
>> -UUIDBench.toString   20000  thrpt   15  108.254 ± 1.167  ops/us
>>
>> +Benchmark           (size)   Mode  Cnt    Score   Error   Units (current c513087056b)
>> +UUIDBench.toString   20000  thrpt   15  122.313 ± 0.820  ops/us +12.98%
>>
>> ## 6. orange_pi5_aarch64 (CPU RK3588S)
>>
>> -Benchmark           (size)   Mode  Cnt    Score   Error   Units (baseline dfaa89162a3)
>> -UUIDBench.toString   20000  thrpt   15   37.783 ± 1.553  ops/us
>>
>> +Benchmark           (size)   Mode  Cnt    Score   Error   Units (current c513087056b)
>> +UUIDBench.toString   20000  thrpt   15   42.928 ± 2.534  ops/us +13.61%
>>
>> ## 7. orange_aipro_aarch64 (CPU TAISHANV200M)
>>
>> -Benchmark (size) Mode Cnt Sco...
>
> With regard to the aarch64 vector intrinsic, I don't have access to an
> aarch64 machine to try it on (I'm faking it on x64 by disabling the intrinsic).
> @wenshao would it be possible for you to try the Long.expand version of this
> patch with the patch from https://github.com/openjdk/jdk/pull/23089 to see how
> aarch64 performs?

@j3graham With PR 23089 applied, the xor_const + Long.expand version (4f54ac68a9f) shows a noticeable performance improvement, except on AWS C7g (AArch64) machines.

## 1. Script

git remote add wenshao g...@github.com:wenshao/jdk.git
git fetch wenshao

# baseline dfaa89162a3
git checkout dfaa89162a35acd20b1ed35e147f9626a181510a
make test TEST="micro:java.util.UUIDBench.toString"

# current c513087056b
git checkout c513087056be8c1e1a915625e0b425a7ecbb21d6
make test TEST="micro:java.util.UUIDBench.toString"

# xor_const + Long.expand 4f54ac68a9f
git checkout 4f54ac68a9fdb635ea2a3f03787cbf0d052dac25
make test TEST="micro:java.util.UUIDBench.toString"

## 2. aliyun_ecs_c8a_x64 (CPU AMD EPYC™ Genoa)

Benchmark           (size)   Mode  Cnt    Score   Error   Units (dfaa89162a3)
UUIDBench.toString   20000  thrpt   15   94.273 ± 0.196  ops/us

Benchmark           (size)   Mode  Cnt    Score   Error   Units (c513087056b)
UUIDBench.toString   20000  thrpt   15   79.701 ± 0.979  ops/us

Benchmark           (size)   Mode  Cnt    Score   Error   Units (4f54ac68a9f)
UUIDBench.toString   20000  thrpt   15  131.954 ± 1.005  ops/us

## 3. aliyun_ecs_c8i_x64 (CPU Intel® Xeon® Emerald Rapids)

Benchmark           (size)   Mode  Cnt    Score   Error   Units (dfaa89162a3)
UUIDBench.toString   20000  thrpt   15  110.221 ± 4.370  ops/us

Benchmark           (size)   Mode  Cnt    Score   Error   Units (c513087056b)
UUIDBench.toString   20000  thrpt   15   78.233 ± 0.790  ops/us

Benchmark           (size)   Mode  Cnt    Score   Error   Units (4f54ac68a9f)
UUIDBench.toString   20000  thrpt   15  136.119 ± 0.464  ops/us

## 4. aliyun_ecs_c8y_aarch64 (CPU Aliyun Yitian 710 ARM v9)

Benchmark           (size)   Mode  Cnt    Score   Error   Units (dfaa89162a3)
UUIDBench.toString   20000  thrpt   15   70.538 ± 0.095  ops/us

Benchmark           (size)   Mode  Cnt    Score   Error   Units (c513087056b)
UUIDBench.toString   20000  thrpt   15   80.501 ± 0.280  ops/us

Benchmark           (size)   Mode  Cnt    Score   Error   Units (4f54ac68a9f)
UUIDBench.toString   20000  thrpt   15   93.289 ± 0.665  ops/us

## 5. MacBook M1 Pro (aarch64)

Benchmark           (size)   Mode  Cnt    Score   Error   Units (dfaa89162a3)
UUIDBench.toString   20000  thrpt   15  106.552 ± 0.856  ops/us

Benchmark           (size)   Mode  Cnt    Score   Error   Units (c513087056b)
UUIDBench.toString   20000  thrpt   15  120.775 ± 0.755  ops/us

Benchmark           (size)   Mode  Cnt    Score   Error   Units (4f54ac68a9f)
UUIDBench.toString   20000  thrpt   15  121.762 ± 0.826  ops/us

## 6. orange_pi5_aarch64 (CPU RK3588S ARMv8.4)

Benchmark           (size)   Mode  Cnt    Score   Error   Units (dfaa89162a3)
UUIDBench.toString   20000  thrpt   15   37.314 ± 1.616  ops/us

Benchmark           (size)   Mode  Cnt    Score   Error   Units (c513087056b)
UUIDBench.toString   20000  thrpt   15   43.791 ± 2.181  ops/us

Benchmark           (size)   Mode  Cnt    Score   Error   Units (4f54ac68a9f)
UUIDBench.toString   20000  thrpt   15   43.906 ± 1.287  ops/us

## 7. aws_c7g_aarch64 (CPU Graviton3 ARMv8.4)

Benchmark           (size)   Mode  Cnt    Score   Error   Units (dfaa89162a3)
UUIDBench.toString   20000  thrpt   15   65.280 ± 0.742  ops/us

Benchmark           (size)   Mode  Cnt    Score   Error   Units (c513087056b)
UUIDBench.toString   20000  thrpt   15   59.123 ± 0.338  ops/us

Benchmark           (size)   Mode  Cnt    Score   Error   Units (4f54ac68a9f)
UUIDBench.toString   20000  thrpt   15   58.846 ± 0.729  ops/us

## 8. aws_c8g_aarch64 (CPU Graviton4 ARM v9.0)

Benchmark           (size)   Mode  Cnt    Score   Error   Units (dfaa89162a3)
UUIDBench.toString   20000  thrpt   15   81.226 ± 0.374  ops/us

Benchmark           (size)   Mode  Cnt    Score   Error   Units (c513087056b)
UUIDBench.toString   20000  thrpt   15   87.328 ± 1.086  ops/us

Benchmark           (size)   Mode  Cnt    Score   Error   Units (4f54ac68a9f)
UUIDBench.toString   20000  thrpt   15   93.546 ± 1.623  ops/us

## 9. orange_aipro_aarch64 (CPU TAISHANV200M)

Benchmark           (size)   Mode  Cnt    Score   Error   Units (dfaa89162a3)
UUIDBench.toString   20000  thrpt   15   13.828 ± 0.142  ops/us

Benchmark           (size)   Mode  Cnt    Score   Error   Units (c513087056b)
UUIDBench.toString   20000  thrpt   15   18.870 ± 0.251  ops/us

Benchmark           (size)   Mode  Cnt    Score   Error   Units (4f54ac68a9f)
UUIDBench.toString   20000  thrpt   15   18.833 ± 0.192  ops/us

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22928#issuecomment-2593333971
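
For readers unfamiliar with the Long.expand approach discussed above, here is a minimal sketch of how `Long.expand` (available since JDK 19) can turn eight hex nibbles into eight ASCII characters without a lookup table. This is an illustration under my own assumptions, not the code from this PR or from PR 23089: the class name `HexExpandSketch` and the methods `expandNibbles`, `toHexAscii`, and `hex8` are hypothetical names chosen for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; names are not taken from the PR.
final class HexExpandSketch {

    // Spread the eight 4-bit nibbles of v so that each one lands in the
    // low half of its own byte, e.g. 0x1234ABCD -> 0x010203040A0B0C0D.
    static long expandNibbles(int v) {
        return Long.expand(v & 0xFFFF_FFFFL, 0x0F0F_0F0F_0F0F_0F0FL);
    }

    // Convert each byte holding a value 0..15 to its lowercase ASCII hex digit.
    static long toHexAscii(long nibbles) {
        // Bit 0x10 is set in every byte whose nibble is >= 10.
        long m = (nibbles + 0x0606_0606_0606_0606L) & 0x1010_1010_1010_1010L;
        // For those bytes add 0x27 (0x20 + 0x08 - 0x01), i.e. 'a' - '0' - 10;
        // then add '0' (0x30) to every byte.
        return ((m << 1) + (m >> 1) - (m >> 4)) + 0x3030_3030_3030_3030L + nibbles;
    }

    // Format a 32-bit value as 8 lowercase hex characters.
    static String hex8(int v) {
        long ascii = toHexAscii(expandNibbles(v));
        // ByteBuffer is big-endian by default, so the most significant
        // digit is written first.
        byte[] bytes = ByteBuffer.allocate(Long.BYTES).putLong(ascii).array();
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(hex8(0x1234ABCD)); // prints "1234abcd"
    }
}
```

Whether `Long.expand` is fast depends on the platform intrinsic behind it, which is presumably why the numbers above differ so much between machines.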