On Thu, 19 Oct 2023 11:58:26 GMT, Claes Redestad <redes...@openjdk.org> wrote:
>> @cl4es >> >>> Good, narrows it down to what's going on in `prepend(long, byte[], >>> String)`. Might boil down to `System.arraycopy`. This method might not be >>> optimized for tiny arrays on all platforms. Specializing for single-char >>> case: >>> >>> ```java >>> diff --git a/src/java.base/share/classes/java/lang/String.java >>> b/src/java.base/share/classes/java/lang/String.java >>> index 9b19d7e2ac1..6eb70925dab 100644 >>> --- a/src/java.base/share/classes/java/lang/String.java >>> +++ b/src/java.base/share/classes/java/lang/String.java >>> @@ -4723,7 +4723,11 @@ static void repeatCopyRest(byte[] buffer, int >>> offset, int limit, int copied) { >>> */ >>> void getBytes(byte[] dst, int dstBegin, byte coder) { >>> if (coder() == coder) { >>> - System.arraycopy(value, 0, dst, dstBegin << coder, >>> value.length); >>> + if (value.length == 1) { >>> + dst[(dstBegin << coder)] = value[0]; >>> + } else { >>> + System.arraycopy(value, 0, dst, dstBegin << coder, >>> value.length); >>> + } >>> } else { // this.coder == LATIN && coder == UTF16 >>> StringLatin1.inflate(value, 0, dst, dstBegin, value.length); >>> } >>> ``` >>> >>> .. seem to help the JIT do the right thing consistently, too: >>> >>> ``` >>> Benchmark Mode Cnt Score Error Units >>> BigDecimals.testSmallToEngineeringString avgt 50 11,757 ± 0,480 ns/op >>> ``` >> >> In addition to #16244, will you submit a PR for this? > >> In addition to #16244, will you submit a PR for this? > > Once both #16244 and this has been integrated I want to revisit this. The > effect of changing `getBytes(byte[], int, byte)` might have disappeared with > #16244 since it better guarantees the JIT will constant fold the prefixes > thoroughly. We've observed related issues with System.arraycopy, however, see > https://bugs.openjdk.org/browse/JDK-8295496 - so I want to evaluate a few > different options here, time allowing. @cl4es I found that StringConcatFactory.makeConcatWithConstants will become slower when using recipe("\1.\1", long.class, long.class). If we treat the prefix of length 1 as char, it will become faster. In this branch https://github.com/wenshao/jdk/tree/optim_decimal_to_string_x2 , if we remove the code that treats the prefix of length 1 as char in the preparer(String prefix, Class<?> cl) method, it will become slower. package java.lang.invoke; class StringConcatFactory { static MethodHandle prepender(String prefix, Class<?> cl) { if (prefix == null || prefix.isEmpty()) { return noPrefixPrepender(cl); } else { // remove this will slower if (prefix.length() == 1 && (cl == char.class || cl == int.class || cl == long.class)) { char ch = prefix.charAt(0); MethodHandle prepend = JLA.stringConcatHelper( "prepend", methodType(long.class, long.class, byte[].class, cl, char.class)).rebind(); return MethodHandles.insertArguments(prepend, 3, ch); } return MethodHandles.insertArguments( prepender(cl), 3, prefix); } } } Here are the performance numbers running on the MaxBook M1 Max: -Benchmark Mode Cnt Score Error Units (String prefix) -BigDecimals.smallScale3ToPlainString avgt 15 28.898 ? 0.527 ns/op +Benchmark Mode Cnt Score Error Units (char prefix) +BigDecimals.smallScale3ToPlainString avgt 15 23.182 ? 0.477 ns/op Maybe you should submit another PR to improve StringConcatFactory.makeConcatWithConstants ------------- PR Comment: https://git.openjdk.org/jdk/pull/16006#issuecomment-1773768539