On Tue, 7 Feb 2023 20:32:11 GMT, Claes Redestad <redes...@openjdk.org> wrote:
>> src/java.base/share/classes/java/lang/String.java line 698:
>>
>>> 696:     }
>>> 697:
>>> 698:     static byte[] copyBytes(byte[] bytes, int offset, int length) {
>>
>> Given that the stub generated for array copy seems highly dependent on the
>> call-site constraints, did you try adding a check for offset == 0 and/or
>> length == bytes.length?
>>
>> if (offset == 0 && bytes.length == length) {
>>     System.arraycopy(bytes, 0, dst, 0, bytes.length);
>> // etc etc the other combinations
>>
>> This should produce different generated stubs with much smaller ASM, depending
>> on the enforced constraints (and shouldn't terribly affect the code size of
>> the method, given that the stub won't be inlined AFAIK).
>>
>> Beware, as noted by others, I'm not suggesting that's the way to fix this,
>> but it would be interesting to check how much perf we leave on the ground
>> due to this supposedly "inefficient" stub generation (if that's the issue).
>
> I did some quick experiments but saw no clear win from doing anything like
> this here. Feel free to experiment and see if there's some particular
> configuration that comes out ahead.
>
> FTR I did not intend for this RFE to solve
> https://bugs.openjdk.org/browse/JDK-8295496 completely, but to provide a small,
> partial win that might possibly clear a path to solving that likely
> orthogonal issue.
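
For reference, the special-casing suggested in the quoted comment could look roughly like the sketch below. The class name `CopyBytesSketch` and the exact set of branches are assumptions made for illustration; this is not the JDK's `String.copyBytes`, and not necessarily what the benchmark in the gist does.

```java
public class CopyBytesSketch {

    // Hypothetical variant of the copy under discussion (illustration only).
    // Each branch calls System.arraycopy with different constant arguments, in
    // the hope that C2 can select a simpler stub for the constrained cases.
    static byte[] copyBytes(byte[] bytes, int offset, int length) {
        byte[] dst = new byte[length];
        if (offset == 0 && length == bytes.length) {
            // Whole-array copy: both positions are the constant 0 and the
            // length is the source array's own length.
            System.arraycopy(bytes, 0, dst, 0, bytes.length);
        } else if (offset == 0) {
            // Prefix copy: source position is still the constant 0.
            System.arraycopy(bytes, 0, dst, 0, length);
        } else {
            // General case: arbitrary source offset.
            System.arraycopy(bytes, offset, dst, 0, length);
        }
        return dst;
    }

    public static void main(String[] args) {
        byte[] src = {1, 2, 3, 4, 5, 6, 7};
        System.out.println(java.util.Arrays.toString(copyBytes(src, 0, 7)));
        System.out.println(java.util.Arrays.toString(copyBytes(src, 1, 3)));
    }
}
```

All three branches return the same bytes as a single unconditional `System.arraycopy` call would; whether any of them is actually faster depends on the code C2 generates and has to be measured.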

I've created a separate benchmark for this (it accidentally has the same name as yours, since I used yours as a blueprint): https://gist.github.com/franz1981/658c2bf6796aab4ae04a84bef1ef34b6

The results are:

Benchmark                             (offset)  (size)  Mode  Cnt   Score   Error  Units
StringConstructor.arrayCopy                  0       7  avgt   10   9.519 ± 0.131  ns/op
StringConstructor.arrayCopy                  1       7  avgt   10   9.194 ± 0.232  ns/op
StringConstructor.copyOf                     0       7  avgt   10  11.548 ± 0.133  ns/op
StringConstructor.copyOf                     1       7  avgt   10   9.812 ± 0.018  ns/op
StringConstructor.optimizedArrayCopy         0       7  avgt   10   6.854 ± 0.355  ns/op  <---- THAT'S COOL
StringConstructor.optimizedArrayCopy         1       7  avgt   10   9.088 ± 0.049  ns/op

The optimized array copy is helping C2 with stub generation: at offset 0 it takes 6.854 ns/op versus 9.519 ns/op for the plain array copy, while at offset 1 the two are roughly equal.

I haven't checked yet whether this applies to the `String` case, and I haven't created a long enough dataset array to check the effects of the newly introduced conditions on the branch predictor, but in terms of the generated stub, there's a difference.

-------------

PR: https://git.openjdk.org/jdk/pull/12453