On Wed, 20 Sep 2023 09:12:48 GMT, Claes Redestad <redes...@openjdk.org> wrote:
> This patch reverts the use of `ByteArrayLittleEndian` in `StringLatin1`.
> 
> This use is the cause of a small (~1.5ms) startup regression in 22-b15. While
> a manageable startup regression in and of itself, the use of `VarHandles` in
> core utility classes brings an increased risk of bootstrap circularity
> issues, for example disqualifying the use of things like `Integers.toString`
> in some places.
> 
> Reverting this partially rolls back the performance improvement gained by
> JDK-8310929. It seems reasonable that the compiler can be enhanced to gain
> that loss back.

This PR vs a 22-b15 baseline:

    Name                   Cnt    Base   Error    Test   Error  Unit   Diff%
    Integers.toStringBig    15   5,318 ± 0,043   6,628 ± 0,127 us/op  -24,6% (p = 0,000*)
    Integers.toStringSmall  15   3,202 ± 0,018   3,562 ± 0,027 us/op  -11,2% (p = 0,000*)
    Integers.toStringTiny   15   2,286 ± 0,017   2,352 ± 0,024 us/op   -2,9% (p = 0,000*)
      * = significant

This PR vs a 22-b14 baseline:

    Name                   Cnt    Base   Error    Test   Error  Unit   Diff%
    Integers.toStringBig    15  12,313 ± 0,143   6,628 ± 0,127 us/op   46,2% (p = 0,000*)
    Integers.toStringSmall  15   4,816 ± 0,074   3,562 ± 0,027 us/op   26,0% (p = 0,000*)
    Integers.toStringTiny   15   2,611 ± 0,022   2,352 ± 0,024 us/op    9,9% (p = 0,000*)
      * = significant

There's still a substantial win compared to 22-b14, stemming from the use of a
packed lookup table rather than two disjoint tables for tens and single digit
numbers.
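For readers who have not looked at the table itself, here is a minimal sketch of what a packed two-digit lookup table can look like once the `VarHandle`-based store is replaced by plain byte stores. It is not the actual JDK code: the class name `PackedDigitsSketch` and the helpers `packed`, `putPair` and `twoDigits` are made up for illustration, and the packing order (tens digit in the low byte, ones digit in the high byte) is an assumption inferred from the `sipush` constants in the bytecode dump later in this message (12336 at index 0, 13872 at index 6, 14649 at index 99).

```java
import java.nio.charset.StandardCharsets;

public class PackedDigitsSketch {
    // Hypothetical stand-in for the packed table: one short per value 0..99,
    // holding both ASCII digits. Assumed layout: low byte = tens, high byte = ones.
    private static final short[] DIGITS = new short[100];

    static {
        // Filling the table in a loop keeps the class file small compared to
        // an array literal, which compiles to one store sequence per element.
        for (int i = 0; i < 100; i++) {
            int tens = '0' + i / 10;
            int ones = '0' + i % 10;
            DIGITS[i] = (short) ((ones << 8) | tens);
        }
    }

    // Returns the packed entry for value (0..99).
    static short packed(int value) {
        return DIGITS[value];
    }

    // Writes the two ASCII digits of value (0..99) to buf[pos] and buf[pos + 1]
    // with two plain byte stores, i.e. without any VarHandle involvement.
    static void putPair(byte[] buf, int pos, int value) {
        short p = DIGITS[value];
        buf[pos] = (byte) p;            // tens digit
        buf[pos + 1] = (byte) (p >> 8); // ones digit
    }

    // Convenience wrapper used only to make the sketch easy to try out.
    static String twoDigits(int value) {
        byte[] buf = new byte[2];
        putPair(buf, 0, value);
        return new String(buf, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(twoDigits(6));
        System.out.println(twoDigits(42));
    }
}
```

A single table lookup yields both digits, which is where the win over two disjoint tables comes from; the two separate byte stores are the part that a single little-endian short store previously covered.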
Startup numbers improve with the above patch to levels on par with 22-b14:

    Name                  Cnt            Base         Error            Test         Error         Unit   Diff%
    Perfstartup-Noop-G1    20          30,000 ±       0,000          28,500 ±       3,181        ms/op    5,0% (p = 0,083 )
      :.cycles             20    88166516,750 ± 2119868,114    84226439,550 ± 1792195,203       cycles   -4,5% (p = 0,000*)
      :.instructions       20   204321816,400 ±  248867,819   195313416,200 ±  196361,902 instructions   -4,4% (p = 0,000*)
      :.taskclock          20          12,000 ±       4,543          10,000 ±       0,000           ms  -16,7% (p = 0,104 )
      * = significant

(This is simply a Noop/Hello World program in a loop, with stats collected by
`/usr/bin/time -l`, run on a MacBook M1)

FWIW when initializing `DIGITS` directly (`DIGITS = new short[] { ...`) the
`DecimalDigits` class is 2610 bytes; with the for loop in a `static` block it
drops down to 2112 bytes. Array constants like this generate sad and bloated
bytecode:

         0: bipush        100
         2: newarray      short
         4: dup
         5: iconst_0
         6: sipush        12336
         9: sastore
        ...
        40: dup
        41: bipush        6
        43: sipush        13872
        46: sastore
        ...
       691: dup
       692: bipush        99
       694: sipush        14649
       697: sastore
       698: putstatic     #13    // Field DIGITS:[S

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15836#issuecomment-1727317896
PR Comment: https://git.openjdk.org/jdk/pull/15836#issuecomment-1727402036
PR Comment: https://git.openjdk.org/jdk/pull/15836#issuecomment-1727935701