On Sat, 29 Oct 2022 12:31:41 GMT, Markus KARG <d...@openjdk.org> wrote:
>> This PR implements JDK-8294696. > > Markus KARG has updated the pull request incrementally with one additional > commit since the last revision: > > No immediate buffer resize; in the rare case of read-past-EOF automatic > buffer grow will happen anyways **TL;DR:** Your optimization is now contained in this PR and outperforms the previous code. Nevertheless I plead for Alternative C (double buffering) instead, which is even faster but much simpler. **Justification** I took some time and did intensive JMH testing (with two buffer sizes and with four percental start positions within the buffer to simulate non-pre-read, slightly-pre-read, mostly-pre-read and fully-pre-read buffer situations) with three variants: The current code with early buffer resize (which you say is not necessary), the current code without early buffer resize (which you propose instead), and the so-called Alternative C (which uses double-buffering instead of buffer-replacing). * Alternative B: Optimization outperforms unoptimized code The results are shown below, and as you can see, early buffer resize is only a problem in *some*, but not all, cases. Nevertheless I agree that should be avoided, so I already pushed it as commit https://github.com/openjdk/jdk/pull/10525/commits/9c2f502c06083694deefe6a9d4c8ee4b77a92c68 and hope that you are fine with this latest piece of code. * Alternative C: Outperforms Alternative B (optimized *and* unoptimized code) unless buffer is huge Interesting to see BTW is that Alternative C (double-buffering instead of buffer-replacing) even outperforms that optimized code for the default buffer size (8K), and is only slightly slower in the case of large (1M) buffers (apparently `System.arraycopy` is faster than `Arrays.fill` unless the array is huge, which I did not expect originally). As the code is super-simple compared to the current code, and as the originally assumed risk of OOME does not exists (if it would, the `fill()` method would fail already, as it uses double-buffering, too, and is called by the `read` happening *before* `transferTo`) I would propose that we go with this code instead of the optimized Alternative B: private long implTransferTo(OutputStream out) throws IOException { if (getClass() == BufferedInputStream.class && markpos == -1) { int avail = count - pos; if (avail > 0) { byte[] buffer = new byte[avail]; // Double buffering prevents poisoning and leaking System.arraycopy(getBufIfOpen(), pos, buffer, 0, avail); out.write(buffer); pos = count; } return avail + getInIfOpen().transferTo(out); } else { return super.transferTo(out); } } Here are the numbers in full length for your reference: Alternative B optimized (no early buffer resize) Benchmark (bufSize) (posPercent) Mode Cnt Score Error Units BufferedInputStreamTransferToPerformance.transferTo 8192 0 thrpt 25 4289,949 ┬▒ 217,452 ops/s BufferedInputStreamTransferToPerformance.transferTo 8192 10 thrpt 25 3823,443 ┬▒ 314,171 ops/s BufferedInputStreamTransferToPerformance.transferTo 8192 90 thrpt 25 4122,949 ┬▒ 52,131 ops/s BufferedInputStreamTransferToPerformance.transferTo 8192 100 thrpt 25 4675,491 ┬▒ 46,194 ops/s BufferedInputStreamTransferToPerformance.transferTo 1048576 0 thrpt 25 3477,497 ┬▒ 111,229 ops/s BufferedInputStreamTransferToPerformance.transferTo 1048576 10 thrpt 25 4767,220 ┬▒ 18,274 ops/s BufferedInputStreamTransferToPerformance.transferTo 1048576 90 thrpt 25 43900,923 ┬▒ 799,471 ops/s BufferedInputStreamTransferToPerformance.transferTo 1048576 100 thrpt 25 1680163,414 ┬▒ 29823,617 ops/s Alternative B original (with early buffer resize) Benchmark (bufSize) (posPercent) Mode Cnt Score Error Units BufferedInputStreamTransferToPerformance.transferTo 8192 0 thrpt 25 4346,835 ┬▒ 273,140 ops/s BufferedInputStreamTransferToPerformance.transferTo 8192 10 thrpt 25 3816,744 ┬▒ 220,669 ops/s BufferedInputStreamTransferToPerformance.transferTo 8192 90 thrpt 25 3624,021 ┬▒ 340,279 ops/s BufferedInputStreamTransferToPerformance.transferTo 8192 100 thrpt 25 4484,060 ┬▒ 95,201 ops/s BufferedInputStreamTransferToPerformance.transferTo 1048576 0 thrpt 25 3507,334 ┬▒ 140,604 ops/s BufferedInputStreamTransferToPerformance.transferTo 1048576 10 thrpt 25 4800,050 ┬▒ 44,987 ops/s BufferedInputStreamTransferToPerformance.transferTo 1048576 90 thrpt 25 44327,841 ┬▒ 256,322 ops/s BufferedInputStreamTransferToPerformance.transferTo 1048576 100 thrpt 25 1699685,163 ┬▒ 44257,421 ops/s Alternative C: Double buffering Benchmark (bufSize) (posPercent) Mode Cnt Score Error Units BufferedInputStreamTransferToPerformance.transferTo 8192 0 thrpt 25 4555,523 ┬▒ 83,370 ops/s BufferedInputStreamTransferToPerformance.transferTo 8192 10 thrpt 25 3972,562 ┬▒ 489,380 ops/s BufferedInputStreamTransferToPerformance.transferTo 8192 90 thrpt 25 4181,716 ┬▒ 49,527 ops/s BufferedInputStreamTransferToPerformance.transferTo 8192 100 thrpt 25 4602,409 ┬▒ 125,641 ops/s BufferedInputStreamTransferToPerformance.transferTo 1048576 0 thrpt 25 3373,734 ┬▒ 161,989 ops/s BufferedInputStreamTransferToPerformance.transferTo 1048576 10 thrpt 25 4684,583 ┬▒ 87,257 ops/s BufferedInputStreamTransferToPerformance.transferTo 1048576 90 thrpt 25 44082,232 ┬▒ 364,210 ops/s BufferedInputStreamTransferToPerformance.transferTo 1048576 100 thrpt 25 1690855,778 ┬▒ 53885,439 ops/s ------------- PR: https://git.openjdk.org/jdk/pull/10525