On Tue, 14 Jan 2025 19:14:05 GMT, Johannes Graham <d...@openjdk.org> wrote:

>> The new implementation improves performance on the aarch64 architecture but 
>> results in a performance regression on x64.
>> 
>> ## 1. Script
>> 
>> git remote add wenshao g...@github.com:wenshao/jdk.git
>> git fetch wenshao
>> 
>> # baseline dfaa89162a3
>> git checkout dfaa89162a35acd20b1ed35e147f9626a181510a
>> make test TEST="micro:java.util.UUIDBench.toString"
>> 
>> # current c513087056b
>> git checkout c513087056be8c1e1a915625e0b425a7ecbb21d6
>> make test TEST="micro:java.util.UUIDBench.toString"
>> 
>> 
>> ## 2. aliyun_ecs_c8a_x64 (CPU AMD EPYC™ Genoa)
>> 
>> -Benchmark           (size)   Mode  Cnt   Score   Error   Units (baseline 
>> dfaa89162a3)
>> -UUIDBench.toString   20000  thrpt   15  94.274 ± 0.452  ops/us
>> 
>> +Benchmark           (size)   Mode  Cnt   Score   Error   Units (current 
>> c513087056b)
>> +UUIDBench.toString   20000  thrpt   15  80.241 ± 0.894  ops/us -14.88%
>> 
>> 
>> 
>> ## 3. aliyun_ecs_c8i_x64 (CPU Intel®Xeon®Emerald Rapids)
>> 
>> -Benchmark           (size)   Mode  Cnt   Score   Error   Units (baseline 
>> dfaa89162a3)
>> -UUIDBench.toString   20000  thrpt   15  85.323 ± 2.044  ops/us
>> 
>> +Benchmark           (size)   Mode  Cnt   Score   Error   Units (current 
>> c513087056b)
>> +UUIDBench.toString   20000  thrpt   15  73.636 ± 0.590  ops/us -13.69%
>> 
>> 
>> ## 4. aliyun_ecs_c8y_aarch64 (CPU Aliyun Yitian 710)
>> 
>> -Benchmark           (size)   Mode  Cnt   Score   Error   Units (baseline 
>> dfaa89162a3)
>> -UUIDBench.toString   20000  thrpt   15  69.286 ± 1.136  ops/us
>> 
>> +Benchmark           (size)   Mode  Cnt   Score   Error   Units (current 
>> c513087056b)
>> +UUIDBench.toString   20000  thrpt   15  80.475 ± 0.310  ops/us +16.14%
>> 
>> 
>> 
>> ## 5. MacBook M1 Pro (aarch64)
>> 
>> -Benchmark           (size)   Mode  Cnt    Score   Error   Units (baseline 
>> dfaa89162a3)
>> -UUIDBench.toString   20000  thrpt   15  108.254 ± 1.167  ops/us
>> 
>> +Benchmark           (size)   Mode  Cnt    Score   Error   Units (current 
>> c513087056b)
>> +UUIDBench.toString   20000  thrpt   15  122.313 ± 0.820  ops/us +12.98%
>> 
>> 
>> 
>> ## 6. orange_pi5_aarch64 (CPU RK3588S)
>> 
>> -Benchmark           (size)   Mode  Cnt   Score   Error   Units (baseline 
>> dfaa89162a3)
>> -UUIDBench.toString   20000  thrpt   15  37.783 ± 1.553  ops/us
>> 
>> +Benchmark           (size)   Mode  Cnt   Score   Error   Units (current 
>> c513087056b)
>> +UUIDBench.toString   20000  thrpt   15  42.928 ± 2.534  ops/us +13.61%
>> 
>> 
>> 
>> 
>> ## 7. orange_aipro_aarch64 (CPU TAISHANV200M)
>> 
>> -Benchmark           (size)   Mode  Cnt   Sco...
>
> With regard to the aarch64 vector intrinsic, I don't have access to an
> aarch64 machine to try it on (I'm faking it on x64 by disabling the
> intrinsic). @wenshao would it be possible for you to try the Long.expand
> version of this patch with the patch from
> https://github.com/openjdk/jdk/pull/23089 to see how aarch64 performs?

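@j3graham
Based on PR 23089, the xor_const + Long.expand version (4f54ac68a9f) shows a
noticeable performance improvement over the baseline (dfaa89162a3) on every
machine tested, except on AWS C7g (aarch64), where it stays below the baseline.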
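For readers not familiar with the Long.expand approach mentioned above, here is a minimal sketch of how `Long.expand` (available since JDK 19) can turn the low 32 bits of a value into eight hex digits without a lookup table. This is an illustration only, not code copied from the patch; the class and method names (`HexExpandSketch`, `hex8`, `toHex8`) are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class HexExpandSketch {
    // Hypothetical helper, not taken from the patch: converts the low 32 bits
    // of i into eight lowercase hex digits packed as ASCII bytes in a long.
    public static long hex8(long i) {
        // Spread the 8 nibbles so that each lands in the low 4 bits of its own byte.
        long expanded = Long.expand(i, 0x0F0F_0F0F_0F0F_0F0FL);
        // Adding 6 carries into bit 4 of a byte exactly when its nibble is 10..15.
        long m = (expanded + 0x0606_0606_0606_0606L) & 0x1010_1010_1010_1010L;
        // Per byte: 0x20 + 0x08 - 0x01 = 39 = 'a' - '0' - 10 for letters, 0 for digits.
        return ((m << 1) + (m >> 1) - (m >> 4))
                + 0x3030_3030_3030_3030L   // add '0' to every byte
                + expanded;
    }

    public static String toHex8(int value) {
        // ByteBuffer is big-endian by default, so the most significant digit comes first.
        byte[] bytes = ByteBuffer.allocate(Long.BYTES).putLong(hex8(value)).array();
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(toHex8(0x12abcdef)); // prints 12abcdef
    }
}
```

Whether `Long.expand` is compiled to a single instruction depends on the platform and JIT, which is why the results are listed per machine.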

## 1. Script

git remote add wenshao g...@github.com:wenshao/jdk.git
git fetch wenshao

# baseline dfaa89162a3
git checkout dfaa89162a35acd20b1ed35e147f9626a181510a
make test TEST="micro:java.util.UUIDBench.toString"

# current c513087056b
git checkout c513087056be8c1e1a915625e0b425a7ecbb21d6
make test TEST="micro:java.util.UUIDBench.toString"

# xor_const + Long.expand 4f54ac68a9f
git checkout 4f54ac68a9fdb635ea2a3f03787cbf0d052dac25
make test TEST="micro:java.util.UUIDBench.toString"


## 2. aliyun_ecs_c8a_x64 (CPU AMD EPYC™ Genoa)

Benchmark           (size)   Mode  Cnt   Score   Error   Units (dfaa89162a3)
UUIDBench.toString   20000  thrpt   15  94.273 ± 0.196  ops/us

Benchmark           (size)   Mode  Cnt   Score   Error   Units (c513087056b)
UUIDBench.toString   20000  thrpt   15  79.701 ± 0.979  ops/us

Benchmark           (size)   Mode  Cnt    Score   Error   Units (4f54ac68a9f)
UUIDBench.toString   20000  thrpt   15  131.954 ± 1.005  ops/us



## 3. aliyun_ecs_c8i_x64 (CPU Intel®Xeon®Emerald Rapids)

Benchmark           (size)   Mode  Cnt    Score   Error   Units (dfaa89162a3)
UUIDBench.toString   20000  thrpt   15  110.221 ± 4.370  ops/us

Benchmark           (size)   Mode  Cnt   Score   Error   Units (c513087056b)
UUIDBench.toString   20000  thrpt   15  78.233 ± 0.790  ops/us

Benchmark           (size)   Mode  Cnt    Score   Error   Units (4f54ac68a9f)
UUIDBench.toString   20000  thrpt   15  136.119 ± 0.464  ops/us


## 4. aliyun_ecs_c8y_aarch64 (CPU Aliyun Yitian 710 ARM v9)

Benchmark           (size)   Mode  Cnt   Score   Error   Units (dfaa89162a3)
UUIDBench.toString   20000  thrpt   15  70.538 ± 0.095  ops/us

Benchmark           (size)   Mode  Cnt   Score   Error   Units (c513087056b)
UUIDBench.toString   20000  thrpt   15  80.501 ± 0.280  ops/us

Benchmark           (size)   Mode  Cnt   Score   Error   Units (4f54ac68a9f)
UUIDBench.toString   20000  thrpt   15  93.289 ± 0.665  ops/us



## 5. MacBook M1 Pro (aarch64)

Benchmark           (size)   Mode  Cnt    Score   Error   Units (dfaa89162a3)
UUIDBench.toString   20000  thrpt   15  106.552 ± 0.856  ops/us

Benchmark           (size)   Mode  Cnt    Score   Error   Units (c513087056b)
UUIDBench.toString   20000  thrpt   15  120.775 ± 0.755  ops/us

Benchmark           (size)   Mode  Cnt    Score   Error   Units (4f54ac68a9f)
UUIDBench.toString   20000  thrpt   15  121.762 ± 0.826  ops/us



## 6. orange_pi5_aarch64 (CPU RK3588S ARMv8.4)

Benchmark           (size)   Mode  Cnt   Score   Error   Units (dfaa89162a3)
UUIDBench.toString   20000  thrpt   15  37.314 ± 1.616  ops/us

Benchmark           (size)   Mode  Cnt   Score   Error   Units (c513087056b)
UUIDBench.toString   20000  thrpt   15  43.791 ± 2.181  ops/us

Benchmark           (size)   Mode  Cnt   Score   Error   Units (4f54ac68a9f)
UUIDBench.toString   20000  thrpt   15  43.906 ± 1.287  ops/us




## 7. aws_c7g_aarch64 (CPU Graviton3 ARMv8.4)

Benchmark           (size)   Mode  Cnt   Score   Error   Units (dfaa89162a3)
UUIDBench.toString   20000  thrpt   15  65.280 ± 0.742  ops/us


Benchmark           (size)   Mode  Cnt   Score   Error   Units (c513087056b)
UUIDBench.toString   20000  thrpt   15  59.123 ± 0.338  ops/us

Benchmark           (size)   Mode  Cnt   Score   Error   Units (4f54ac68a9f)
UUIDBench.toString   20000  thrpt   15  58.846 ± 0.729  ops/us



## 8. aws_c8g_aarch64 (CPU Graviton4 ARM v9.0)

Benchmark           (size)   Mode  Cnt   Score   Error   Units (dfaa89162a3)
UUIDBench.toString   20000  thrpt   15  81.226 ± 0.374  ops/us

Benchmark           (size)   Mode  Cnt   Score   Error   Units (c513087056b)
UUIDBench.toString   20000  thrpt   15  87.328 ± 1.086  ops/us

Benchmark           (size)   Mode  Cnt   Score   Error   Units (4f54ac68a9f)
UUIDBench.toString   20000  thrpt   15  93.546 ± 1.623  ops/us



## 9. orange_aipro_aarch64 (CPU TAISHANV200M)


Benchmark           (size)   Mode  Cnt   Score   Error   Units (dfaa89162a3)
UUIDBench.toString   20000  thrpt   15  13.828 ± 0.142  ops/us


Benchmark           (size)   Mode  Cnt   Score   Error   Units (c513087056b)
UUIDBench.toString   20000  thrpt   15  18.870 ± 0.251  ops/us

Benchmark           (size)   Mode  Cnt   Score   Error   Units (4f54ac68a9f)
UUIDBench.toString   20000  thrpt   15  18.833 ± 0.192  ops/us
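
To compare the three commits at a glance, the relative change against the baseline can be derived from the scores above. A small sketch, with the scores copied from the tables in this comment; the class and method names (`RelativeChange`, `pctChange`) are made up for illustration, and the error margins are ignored:

```java
import java.util.Locale;

public class RelativeChange {
    // Percentage change of score relative to baseline (both in ops/us).
    public static double pctChange(double baseline, double score) {
        return (score - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        String[] machines = {
            "aliyun_ecs_c8a_x64", "aliyun_ecs_c8i_x64", "aliyun_ecs_c8y_aarch64",
            "MacBook M1 Pro", "orange_pi5_aarch64", "aws_c7g_aarch64",
            "aws_c8g_aarch64", "orange_aipro_aarch64"
        };
        // Scores in ops/us, in the same order as the machines above.
        double[] baseline = {94.273, 110.221, 70.538, 106.552, 37.314, 65.280, 81.226, 13.828}; // dfaa89162a3
        double[] current  = {79.701, 78.233, 80.501, 120.775, 43.791, 59.123, 87.328, 18.870};  // c513087056b
        double[] expand   = {131.954, 136.119, 93.289, 121.762, 43.906, 58.846, 93.546, 18.833}; // 4f54ac68a9f

        for (int i = 0; i < machines.length; i++) {
            System.out.println(String.format(Locale.ROOT,
                "%-24s c513087056b %+7.2f%%   4f54ac68a9f %+7.2f%%",
                machines[i],
                pctChange(baseline[i], current[i]),
                pctChange(baseline[i], expand[i])));
        }
    }
}
```

A negative value for 4f54ac68a9f shows up only on aws_c7g_aarch64, which matches the exception noted at the top of this comment.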

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22928#issuecomment-2593333971
