On Fri, 27 Jan 2023 16:04:41 GMT, Roger Riggs <rri...@openjdk.org> wrote:
>> This is the javadoc of `JavaLangAccess::newStringNoRepl`:
>> 
>> 
>>     /**
>>      * Constructs a new {@code String} by decoding the specified subarray of
>>      * bytes using the specified {@linkplain java.nio.charset.Charset charset}.
>>      *
>>      * The caller of this method shall relinquish and transfer the ownership of
>>      * the byte array to the callee since the later will not make a copy.
>>      *
>>      * @param bytes the byte array source
>>      * @param cs the Charset
>>      * @return the newly created string
>>      * @throws CharacterCodingException for malformed or unmappable bytes
>>      */
>> 
>> 
>> It is recorded in the document that it should be able to directly construct
>> strings with parameter byte array to reduce array allocation.
>> 
>> However, at present, `newStringNoRepl` always copies arrays for UTF-8 or
>> other ASCII compatible charsets.
>> 
>> This PR fixes this problem.
> 
> It seems odd that the benchmark seems slower for smaller files; can you
> suggest why that might be?
> I'd expect the size distribution for Files.readString to be biased toward the
> smaller files.
> Can you repeat the benchmark using the default file system. OS file caching
> should eliminate the disk speed effects.

@RogerRiggs I reran the benchmark on the default file system, with test file sizes between 0 and 32 KiB.

The throughput of reading ASCII files as UTF-8:

[image omitted]

The throughput of reading ASCII files as GBK:

[image omitted]

Performance is slightly improved, with no degradation. For UTF-8 and GBK files containing non-ASCII characters, the throughput varies by no more than 4%.

Test code and original results: https://gist.github.com/Glavo/f3d2060d0bd13cd0ce2add70e6060ea0?permalink_comment_id=4451350#gistcomment-4451350

> It seems odd that the benchmark seems slower for smaller files; can you
> suggest why that might be?

The most likely cause is the cost of the newly added `if` check in `newStringUTF8NoRepl`.
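To make the trade-off concrete, here is a minimal Java sketch of the decision being discussed, under stated assumptions. The class `NoReplSketch`, its methods `decodeNoRepl` and `isAscii`, the `Result` record, and the hardcoded set of ASCII-compatible charsets are all hypothetical; this is not the JDK's `JavaLangAccess` or `String` code. It models two things: the extra per-call scan over the bytes (the kind of branch that could add a small fixed cost on tiny inputs), and the "no replacement" contract, where malformed input throws `CharacterCodingException` instead of being replaced with U+FFFD.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Set;

public final class NoReplSketch {

    // Assumption: a hardcoded set stands in for however the JDK decides that a
    // charset maps bytes 0x00-0x7F to the same ASCII characters.
    private static final Set<Charset> ASCII_COMPATIBLE = Set.of(
            StandardCharsets.UTF_8, StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1);

    /** Decoded text, plus whether the caller's array could have been adopted as-is. */
    public record Result(String text, boolean adoptedCallerArray) {}

    /** The extra per-call check: true if every byte is 0x00-0x7F (Java bytes are signed). */
    public static boolean isAscii(byte[] bytes) {
        for (byte b : bytes) {
            if (b < 0) {
                return false;
            }
        }
        return true;
    }

    /** Decodes without replacement: bad input throws instead of becoming U+FFFD. */
    public static Result decodeNoRepl(byte[] bytes, Charset cs) throws CharacterCodingException {
        if (ASCII_COMPATIBLE.contains(cs) && isAscii(bytes)) {
            // Fast path: ASCII bytes are already valid Latin-1 content. Inside java.base
            // the array could become the String's backing store without a copy; from user
            // code we can only model that decision, so this constructor still copies.
            return new Result(new String(bytes, StandardCharsets.ISO_8859_1), true);
        }
        // Slow path: a real decode into a freshly allocated buffer.
        String text = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
        return new Result(text, false);
    }

    public static void main(String[] args) throws CharacterCodingException {
        System.out.println(decodeNoRepl("hello".getBytes(StandardCharsets.US_ASCII), StandardCharsets.UTF_8));
        System.out.println(decodeNoRepl("h\u00e9llo".getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8));
        try {
            // 0xC3 starts a two-byte UTF-8 sequence that is never completed.
            decodeNoRepl(new byte[] {(byte) 0xC3}, StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            System.out.println("malformed: " + e);
        }
    }
}
```

The scan is linear in the input on every call, and so is the copy it can save; for very small inputs the fixed overhead of the extra branch and call is not amortized, which is consistent with (though it does not prove) the explanation given in the reply.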
I don't think this small overhead is a significant issue, because once actual I/O is involved its impact is negligible. The main purpose of this PR is to eliminate unnecessary temporary memory allocation and thereby reduce GC pressure; the change in throughput is only a by-product.

-------------

PR: https://git.openjdk.org/jdk/pull/12119