On Wed, 12 Jun 2024 15:58:59 GMT, Shaojin Wen <d...@openjdk.org> wrote:
>>> @eme64 It seems like MergeStore didn't happen, is there something I did >>> wrong? >> >> Yes ;) >> >> @wenshao The issue is that the pattern matching is quite **limited**. >> `putCharsAt(byte[] value, int i, char c1, char c2, char c3, char c4)` >> This has 4 variables. This is not allowed for the optimization. You would >> have to pass in a `long` and split it via shifts. >> A possible exception: if `putCharsAt` is inlined and outside it is a >> splitting of a `long` value with shift, then ... maybe ... this would be >> optimized with `MergeStores`. >> >> There are only 2 **accepted** patterns: >> - Multiple stores of constants -> constant is merged, which allows stores to >> be merged. >> - Multiple stores which all store a "section" of a larger value: >> >> You can see **examples** I listed at the beginning of the PR description in >> https://github.com/openjdk/jdk/pull/16245. > > Thanks to @eme64 's patience and help, I found a way to use MergeStore > without doing boundary checking. > > It would be even better if Unsafe.putChar could be used for MergeStore in the > future. > > ## 1. JavaCode > > class StringUTF16 { > public static void putCharsAt(byte[] value, int i, char c1, char c2, char > c3, char c4) { > // Don't use the putChar method, Its instrinsic will cause C2 unable > to combining values into larger stores. > putChar1(value, i , c1); > putChar1(value, i + 1, c2); > putChar1(value, i + 2, c3); > putChar1(value, i + 3, c4); > } > > public static void putCharsAt(byte[] value, int i, char c1, char c2, char > c3, char c4, char c5) { > // Don't use the putChar method, Its instrinsic will cause C2 unable > to combining values into larger stores. > putChar1(value, i , c1); > putChar1(value, i + 1, c2); > putChar1(value, i + 2, c3); > putChar1(value, i + 3, c4); > putChar1(value, i + 4, c5); > } > > static void putChar1(byte[] value, int i, char c) { > int address = Unsafe.ARRAY_BYTE_BASE_OFFSET + (i << 1); > Unsafe.getUnsafe().putByte(value, address , (byte)(c >> > HI_BYTE_SHIFT)); > Unsafe.getUnsafe().putByte(value, address + 1, (byte)(c >> > LO_BYTE_SHIFT)); > } > } > > > ## 2. Benchmark Numbers > The performance numbers under MacBookPro M1 Max are as follows: > > -Benchmark Mode Cnt Score Error Units > -StringBuilders.toStringCharWithNull8Latin1 avgt 15 7.073 ? 0.010 ns/op > -StringBuilders.toStringCharWithNull8Utf16 avgt 15 9.298 ? 0.015 ns/op > -StringBuilders.toStringCharWithBool8Latin1 avgt 15 7.387 ? 0.021 ns/op > -StringBuilders.toStringCharWithBool8Utf16 avgt 15 9.615 ? 0.011 ns/op > > +Benchmark Mode Cnt Score Error Units > +StringBuilders.toStringCharWithNull8Latin1 avgt 15 5.628 ? 0.017 ns/op > +25.67% > +StringBuilders.toStringCharWithNull8Utf16 avgt 15 6.873 ? 0.026 ns/op > +35.28% > +StringBuilders.toStringCharWithBool8Latin1 avgt 15 6.480 ? 0.077 ns/op > +13.99% > +StringBuilders.toStringCharWithBool8Utf16 avgt 15 7.456 ? 0.059 ns/op > +28.95% > > > ## 3. TraceMergeStores > > [TraceMergeStores]: Replace > 65 StoreB === 54 7 64 12 [[ 87 ]] @byte[int:>=0] > (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=4; unsafe > Memory: @byte[int:>=0] > (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=4; !jvms: > StringUTF16::putChar1 @ bci:20 (li... @wenshao I'm glad we figured it out a bit, and I hope you learned a thing or two :) > It would be even better if Unsafe.putChar could be used for MergeStore in the > future. The `_putCharStringU` intrinsic uses `LibraryCallKit::inline_string_char_access`. It seems to generate IR nodes, I speculate it could be `StoreC`. We'd have to do more digging to see why that pattern is not accepted by `MergeStore`. @wenshao I'm not sure if everything is right with the bounds checking, but I leave this to the library folks to review. What you will definately need is benchmarking not just on `M1`, but also `x86-64` and other `aarch64` architectures. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2164851377