On Fri, 20 Jan 2023 16:47:27 GMT, Glavo <d...@openjdk.org> wrote:

> This is the javadoc of `JavaLangAccess::newStringNoRepl`:
>
>     /**
>      * Constructs a new {@code String} by decoding the specified subarray of
>      * bytes using the specified {@linkplain java.nio.charset.Charset charset}.
>      *
>      * The caller of this method shall relinquish and transfer the ownership of
>      * the byte array to the callee since the later will not make a copy.
>      *
>      * @param bytes the byte array source
>      * @param cs the Charset
>      * @return the newly created string
>      * @throws CharacterCodingException for malformed or unmappable bytes
>      */
>
> The documentation states that the method should be able to construct the
> string directly from the byte array passed in, avoiding an extra array
> allocation.
>
> However, at present, `newStringNoRepl` always copies the array for UTF-8 or
> other ASCII-compatible charsets.
>
> This PR fixes this problem.
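To make the copy-versus-adopt distinction concrete, here is a minimal sketch. It is not the JDK's `String` implementation and not the code in this PR: `Latin1Text` and all of its methods are hypothetical names invented for illustration. It assumes the input is pure ASCII, in which case UTF-8 (or another ASCII-compatible charset) would decode each byte to the same single-byte value, so the caller's array could be reused as the backing storage instead of being copied.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not the JDK's String internals.
final class Latin1Text {
    private final byte[] value; // backing array, either adopted or copied

    private Latin1Text(byte[] value) {
        this.value = value;
    }

    /** Returns true if every byte is in the 7-bit ASCII range (0..127). */
    static boolean isAscii(byte[] bytes) {
        for (byte b : bytes) {
            if (b < 0) {
                return false; // high bit set: not ASCII
            }
        }
        return true;
    }

    /** Copying variant: always allocates a second array. */
    static Latin1Text copyOf(byte[] bytes) {
        return new Latin1Text(Arrays.copyOf(bytes, bytes.length));
    }

    /** Adopting variant: the caller gives up ownership, so no copy is made. */
    static Latin1Text adopt(byte[] bytes) {
        if (!isAscii(bytes)) {
            // A real decoder would fall back to full charset decoding here.
            throw new IllegalArgumentException("non-ASCII input needs real decoding");
        }
        return new Latin1Text(bytes);
    }

    /** True if this object uses the given array itself as its storage. */
    boolean sharesArrayWith(byte[] bytes) {
        return value == bytes;
    }

    @Override
    public String toString() {
        return new String(value, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println("adopt shares array:  " + adopt(data).sharesArrayWith(data));
        System.out.println("copyOf shares array: " + copyOf(data).sharesArrayWith(data));
    }
}
```

The adopting variant is only safe if the caller never touches the array again, which is the ownership-transfer requirement spelled out in the javadoc.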
I ran the tier1 and tier2 tests, and there were no new errors.

The only use case affected is `Files.readString`. I benchmarked `readString` on an in-memory file system.

baseline:

    Benchmark                   (length)   Mode  Cnt        Score       Error  Units
    NoRepl.testReadAscii                0  thrpt    5  5049760.744 ±  3563.324  ops/s
    NoRepl.testReadAscii             1024  thrpt    5  3523083.785 ± 23747.078  ops/s
    NoRepl.testReadAscii             8192  thrpt    5  2415952.140 ± 85884.289  ops/s
    NoRepl.testReadAscii          1048576  thrpt    5    32425.563 ±   284.121  ops/s
    NoRepl.testReadAscii         33554432  thrpt    5      872.492 ±     4.311  ops/s
    NoRepl.testReadAscii        268435456  thrpt    5       58.736 ±     0.224  ops/s
    NoRepl.testReadAsciiAsGBK           0  thrpt    5  4229547.832 ± 10997.381  ops/s
    NoRepl.testReadAsciiAsGBK        1024  thrpt    5  3409472.580 ±   819.566  ops/s
    NoRepl.testReadAsciiAsGBK        8192  thrpt    5  2179865.886 ± 16606.862  ops/s
    NoRepl.testReadAsciiAsGBK     1048576  thrpt    5    32962.429 ±   249.385  ops/s
    NoRepl.testReadAsciiAsGBK    33554432  thrpt    5      871.810 ±     1.812  ops/s
    NoRepl.testReadAsciiAsGBK   268435456  thrpt    5       59.131 ±     0.172  ops/s
    NoRepl.testReadGBK                  0  thrpt    5  4657464.074 ± 51849.667  ops/s
    NoRepl.testReadGBK               1024  thrpt    5  1199083.242 ±  8653.846  ops/s
    NoRepl.testReadGBK               8192  thrpt    5    75823.949 ±    46.493  ops/s
    NoRepl.testReadGBK            1048576  thrpt    5      675.214 ±     1.729  ops/s
    NoRepl.testReadGBK           33554432  thrpt    5       20.972 ±     2.280  ops/s
    NoRepl.testReadGBK          268435456  thrpt    5        2.585 ±     0.394  ops/s
    NoRepl.testReadUTF8                 0  thrpt    5  5222403.558 ± 44728.740  ops/s
    NoRepl.testReadUTF8              1024  thrpt    5  1417472.161 ± 23776.534  ops/s
    NoRepl.testReadUTF8              8192  thrpt    5   185714.265 ±  2096.328  ops/s
    NoRepl.testReadUTF8           1048576  thrpt    5     1318.763 ±    16.051  ops/s
    NoRepl.testReadUTF8          33554432  thrpt    5       30.663 ±     0.114  ops/s
    NoRepl.testReadUTF8         268435456  thrpt    5        3.782 ±     0.041  ops/s

This PR:

    Benchmark                   (length)   Mode  Cnt        Score       Error  Units
    NoRepl.testReadAscii                0  thrpt    5  5084610.141 ± 37449.826  ops/s
    NoRepl.testReadAscii             1024  thrpt    5  3425713.961 ± 24060.542  ops/s
    NoRepl.testReadAscii             8192  thrpt    5  2765684.248 ± 20586.103  ops/s
    NoRepl.testReadAscii          1048576  thrpt    5    48074.603 ±   371.213  ops/s
    NoRepl.testReadAscii         33554432  thrpt    5     1167.878 ±    14.427  ops/s
    NoRepl.testReadAscii        268435456  thrpt    5       71.028 ±     0.439  ops/s
    NoRepl.testReadAsciiAsGBK           0  thrpt    5  4783174.805 ±  9789.109  ops/s
    NoRepl.testReadAsciiAsGBK        1024  thrpt    5  3518265.840 ± 18467.577  ops/s
    NoRepl.testReadAsciiAsGBK        8192  thrpt    5  2775108.822 ± 19282.776  ops/s
    NoRepl.testReadAsciiAsGBK     1048576  thrpt    5    46956.963 ±   147.593  ops/s
    NoRepl.testReadAsciiAsGBK    33554432  thrpt    5     1165.036 ±    10.032  ops/s
    NoRepl.testReadAsciiAsGBK   268435456  thrpt    5       70.878 ±     0.191  ops/s
    NoRepl.testReadGBK                  0  thrpt    5  4910043.054 ± 27295.344  ops/s
    NoRepl.testReadGBK               1024  thrpt    5  1177675.970 ± 15573.239  ops/s
    NoRepl.testReadGBK               8192  thrpt    5    75417.479 ±   233.957  ops/s
    NoRepl.testReadGBK            1048576  thrpt    5      674.620 ±     5.856  ops/s
    NoRepl.testReadGBK           33554432  thrpt    5       19.899 ±     1.504  ops/s
    NoRepl.testReadGBK          268435456  thrpt    5        2.705 ±     0.002  ops/s
    NoRepl.testReadUTF8                 0  thrpt    5  4851516.950 ±  9237.743  ops/s
    NoRepl.testReadUTF8              1024  thrpt    5  1332016.420 ±  9570.465  ops/s
    NoRepl.testReadUTF8              8192  thrpt    5   184177.766 ±  4662.562  ops/s
    NoRepl.testReadUTF8           1048576  thrpt    5     1326.439 ±     3.420  ops/s
    NoRepl.testReadUTF8          33554432  thrpt    5       30.782 ±     0.116  ops/s
    NoRepl.testReadUTF8         268435456  thrpt    5        3.790 ±     0.011  ops/s

When an ASCII file is read as UTF-8 or GBK, the throughput of `readString` improves significantly. For 1 MiB ASCII files the throughput increases by 40% to 50%; for larger or smaller files the improvement is smaller.

For files containing non-ASCII characters, the throughput of `readString` is between 94% and 104% of the baseline.

This is the source code of the benchmark: https://gist.github.com/Glavo/f3d2060d0bd13cd0ce2add70e6060ea0

Can someone help me open an issue in the Java Bug System?

[Figure: the throughput of reading ASCII files as UTF-8]

[Figure: the throughput of reading ASCII files as GBK]

> /issue JDK-8299807

Thank you!
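As a quick check of the 1 MiB figures, the relative gain can be recomputed from the baseline and PR scores reported above. This is only a sketch of the arithmetic; the class and method names (`Speedup`, `percentGain`) are made up for illustration, and the inputs are the `length = 1048576` scores for `testReadAscii` and `testReadAsciiAsGBK`.

```java
// Hypothetical helper for recomputing the relative throughput gain.
final class Speedup {
    /** Percentage gain of the patched score over the baseline score. */
    static double percentGain(double baseline, double patched) {
        return (patched / baseline - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Scores in ops/s, taken from the length = 1048576 rows.
        System.out.printf("testReadAscii:      %.1f%%%n",
                percentGain(32425.563, 48074.603));
        System.out.printf("testReadAsciiAsGBK: %.1f%%%n",
                percentGain(32962.429, 46956.963));
    }
}
```

Both results fall inside the 40% to 50% range quoted for 1 MiB ASCII files.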
-------------

PR: https://git.openjdk.org/jdk/pull/12119