Re: RFR: 8333893: Optimization for StringBuilder append boolean & null

Emanuel Peter Mon, 10 Jun 2024 23:33:40 -0700

On Mon, 10 Jun 2024 23:10:05 GMT, Shaojin Wen <d...@openjdk.org> wrote:


>> @wenshao 
>>> I think the performance of the Unsafe branch may be the best data for the 
>>> C2 optimizer. @eme64 can help me see if C2 can do it?
>> 
>> Have you tried to see if the optimization actually was done/taken? You can 
>> use the `TraceMergeStores,` flag. Can you present the generated assembly 
>> code of the benchmarks, and explain the difference based on the generated 
>> assembly code? You can run JMH penchmarks with `perf`. These two blogs may 
>> help you:
>> 
>> http://psy-lob-saw.blogspot.com/2015/07/jmh-perfasm.html
>> https://shipilev.net/blog/2016/arrays-wisdom-ancients/#_meet_jmh_prof_perfasm
>> 
>> @liach I don't think it makes a difference if it is `int` or `byte` 
>> constants. Or what exactly is the code change you are proposing?
>
> @eme64 It seems that when the following code uses StringUTF16.putChar, C2's 
> optimization is not as good as the manual merging and storage effect.
> 
> class AbstractStringBuilder {
>     private AbstractStringBuilder appendNull() {
>       // ...
>               StringUTF16.putCharsAt(val, count, 'n', 'u', 'l', 'l');
>               // ...
>       }
> 
>       public AbstractStringBuilder append(boolean b) {
>           // ...
>               StringUTF16.putCharsAt(val, count, 't', 'r', 'u', 'e');
>               // ...
>               StringUTF16.putCharsAt(val, count, 'f', 'a', 'l', 's', 'e');
>               // ...
>       }
> }
> 
> class StringUTF16 {
>    public static void putCharsAt(byte[] value, int i, char c1, char c2, char 
> c3, char c4) {
>         putChar(value, i    , c1);
>         putChar(value, i + 1, c2);
>         putChar(value, i + 2, c3);
>         putChar(value, i + 3, c4);
>     }
> 
>     @IntrinsicCandidate
>     // intrinsic performs no bounds checks
>     static void putChar(byte[] val, int index, int c) {
>         assert index >= 0 && index < length(val) : "Trusted caller missed 
> bounds check";
>         index <<= 1;
>         val[index++] = (byte)(c >> HI_BYTE_SHIFT);
>         val[index]   = (byte)(c >> LO_BYTE_SHIFT);
>     }
> }
> 
> 
> The code for manually merging storage is as follows, without using Unsafe:
> 
> class AbstractStringBuilder {
>     static final long NULL_UTF16;
>     static final long TRUE_UTF16;
>     static final long FALS_UTF16;
> 
>     static {
>         byte[] bytes = new byte[8];
> 
>         StringUTF16.putCharsAt(bytes, 0, 'n', 'u', 'l', 'l');
>         NULL_UTF16 = getLong(bytes, 0);
> 
>         StringUTF16.putCharsAt(bytes, 0, 't', 'r', 'u', 'e');
>         TRUE_UTF16 = getLong(bytes, 0);
> 
>         StringUTF16.putCharsAt(bytes, 0, 'f', 'a', 'l', 's');
>         FALS_UTF16 = getLong(bytes, 0);
>     }
> 
>     private static long getLong(byte[] bytes, int offset) {
>         return (((long)bytes[offset    ] & 0xff)      ) |
>                (((long)bytes[offset + 1] & 0xff) <<  8) |
>                (((long)bytes[offset + 2] & 0xff) << 16) |
>                (((long)bytes[offset + 3] & 0xff) << 24) |
>                (((long)bytes[offset + 4] & 0xff) << 32) |
>                (((long)bytes[offset + 5] & 0xff) << 40) |
>                (((long)bytes[offset + 6] & 0xff) << 48) |
>                (((long)bytes[offset + 7] & 0xff) << 56);
>     }
> 
>     private static void setLong(byte[] array, int offset, long value) {
>         array[offset]     = (byte)  value;
>         array[offset + 1] = (byte) (value >> 8);
>         array[offset + 2] = (byte) (value >> 16);
>         array[offset + 3] = (byte) (value >> 24);
>         arra...

@wenshao 
> @eme64 It seems that when the following code uses StringUTF16.putChar, C2's 
> optimization is not as good as the manual merging and storage effect.

As I asked above, you will need to provide some evidence / generated assembly / 
perf data, and logs from `TraceMergeStores`. I currently do not have time to 
produce these myself, and I think they would be crucial to determine where the 
missing performance has gone. See my earlier comment:
https://github.com/openjdk/jdk/pull/19626#issuecomment-2158533469

And please also try @cl4es advide here:
https://github.com/openjdk/jdk/pull/19626#issuecomment-2159509806

And sure, maybe you need some public API for setting multiple bytes at once, 
which the `MergeStores` optimization can optimize. I'm a C2 engineer, so I 
leave that up to the library folks ;)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2159895757

Re: RFR: 8333893: Optimization for StringBuilder append boolean & null

Reply via email to