On Mon, 31 Oct 2022 09:52:06 GMT, Alan Bateman <al...@openjdk.org> wrote:

>> **TL;DR:** Your optimization is now contained in this PR and outperforms the 
>> previous code. Nevertheless I plead for Alternative C (double buffering) 
>> instead, which is even faster but much simpler.
>> 
>> **Justification**
>> 
>> I took some time and did intensive JMH testing (with two buffer sizes and 
>> with four percental start positions within the buffer to simulate 
>> non-pre-read, slightly-pre-read, mostly-pre-read and fully-pre-read buffer 
>> situations) with three variants: The current code with early buffer resize 
>> (which you say is not necessary), the current code without early buffer 
>> resize (which you propose instead), and the so-called Alternative C (which 
>> uses double-buffering instead of buffer-replacing).
>> 
>> * Alternative B: Optimization outperforms unoptimized code
>> The results are shown below, and as you can see, early buffer resize is only 
>> a problem in *some*, but not all, cases. Nevertheless I agree that should be 
>> avoided, so I already pushed it as commit 
>> https://github.com/openjdk/jdk/pull/10525/commits/9c2f502c06083694deefe6a9d4c8ee4b77a92c68
>>  and hope that you are fine with this latest piece of code.
>> 
>> * Alternative C: Outperforms Alternative B (optimized *and* unoptimized 
>> code) unless buffer is huge
>> Interesting to see BTW is that Alternative C (double-buffering instead of 
>> buffer-replacing) even outperforms that optimized code for the default 
>> buffer size (8K), and is only slightly slower in the case of large (1M) 
>> buffers (apparently `System.arraycopy` is faster than `Arrays.fill` unless 
>> the array is huge, which I did not expect originally). As the code is 
>> super-simple compared to the current code, and as the originally assumed 
>> risk of OOME does not exists (if it would, the `fill()` method would fail 
>> already, as it uses double-buffering, too, and is called by the `read` 
>> happening *before* `transferTo`) I would propose that we go with this code 
>> instead of the optimized Alternative B:
>> 
>> private long implTransferTo(OutputStream out) throws IOException {
>>     if (getClass() == BufferedInputStream.class && markpos == -1) {
>>         int avail = count - pos;
>>         if (avail > 0) {
>>             byte[] buffer = new byte[avail]; // Double buffering prevents 
>> poisoning and leaking
>>             System.arraycopy(getBufIfOpen(), pos, buffer, 0, avail);
>>             out.write(buffer);
>>             pos = count;
>>         }
>>         return avail + getInIfOpen().transferTo(out);
>>     } else {
>>         return super.transferTo(out);
>>     }
>> }
>> 
>> 
>> Here are the numbers in full length for your reference:
>> 
>> Alternative B optimized (no early buffer resize)
>> 
>> Benchmark                                            (bufSize)  (posPercent) 
>>   Mode  Cnt        Score       Error  Units
>> BufferedInputStreamTransferToPerformance.transferTo       8192             0 
>>  thrpt   25     4289,949 ┬▒   217,452  ops/s
>> BufferedInputStreamTransferToPerformance.transferTo       8192            10 
>>  thrpt   25     3823,443 ┬▒   314,171  ops/s
>> BufferedInputStreamTransferToPerformance.transferTo       8192            90 
>>  thrpt   25     4122,949 ┬▒    52,131  ops/s
>> BufferedInputStreamTransferToPerformance.transferTo       8192           100 
>>  thrpt   25     4675,491 ┬▒    46,194  ops/s
>> BufferedInputStreamTransferToPerformance.transferTo    1048576             0 
>>  thrpt   25     3477,497 ┬▒   111,229  ops/s
>> BufferedInputStreamTransferToPerformance.transferTo    1048576            10 
>>  thrpt   25     4767,220 ┬▒    18,274  ops/s
>> BufferedInputStreamTransferToPerformance.transferTo    1048576            90 
>>  thrpt   25    43900,923 ┬▒   799,471  ops/s
>> BufferedInputStreamTransferToPerformance.transferTo    1048576           100 
>>  thrpt   25  1680163,414 ┬▒ 29823,617  ops/s
>> 
>> Alternative B original (with early buffer resize)
>> 
>> Benchmark                                            (bufSize)  (posPercent) 
>>   Mode  Cnt        Score       Error  Units
>> BufferedInputStreamTransferToPerformance.transferTo       8192             0 
>>  thrpt   25     4346,835 ┬▒   273,140  ops/s
>> BufferedInputStreamTransferToPerformance.transferTo       8192            10 
>>  thrpt   25     3816,744 ┬▒   220,669  ops/s
>> BufferedInputStreamTransferToPerformance.transferTo       8192            90 
>>  thrpt   25     3624,021 ┬▒   340,279  ops/s
>> BufferedInputStreamTransferToPerformance.transferTo       8192           100 
>>  thrpt   25     4484,060 ┬▒    95,201  ops/s
>> BufferedInputStreamTransferToPerformance.transferTo    1048576             0 
>>  thrpt   25     3507,334 ┬▒   140,604  ops/s
>> BufferedInputStreamTransferToPerformance.transferTo    1048576            10 
>>  thrpt   25     4800,050 ┬▒    44,987  ops/s
>> BufferedInputStreamTransferToPerformance.transferTo    1048576            90 
>>  thrpt   25    44327,841 ┬▒   256,322  ops/s
>> BufferedInputStreamTransferToPerformance.transferTo    1048576           100 
>>  thrpt   25  1699685,163 ┬▒ 44257,421  ops/s
>> 
>> Alternative C: Double buffering
>> 
>> Benchmark                                            (bufSize)  (posPercent) 
>>   Mode  Cnt        Score       Error  Units
>> BufferedInputStreamTransferToPerformance.transferTo       8192             0 
>>  thrpt   25     4555,523 ┬▒    83,370  ops/s
>> BufferedInputStreamTransferToPerformance.transferTo       8192            10 
>>  thrpt   25     3972,562 ┬▒   489,380  ops/s
>> BufferedInputStreamTransferToPerformance.transferTo       8192            90 
>>  thrpt   25     4181,716 ┬▒    49,527  ops/s
>> BufferedInputStreamTransferToPerformance.transferTo       8192           100 
>>  thrpt   25     4602,409 ┬▒   125,641  ops/s
>> BufferedInputStreamTransferToPerformance.transferTo    1048576             0 
>>  thrpt   25     3373,734 ┬▒   161,989  ops/s
>> BufferedInputStreamTransferToPerformance.transferTo    1048576            10 
>>  thrpt   25     4684,583 ┬▒    87,257  ops/s
>> BufferedInputStreamTransferToPerformance.transferTo    1048576            90 
>>  thrpt   25    44082,232 ┬▒   364,210  ops/s
>> BufferedInputStreamTransferToPerformance.transferTo    1048576           100 
>>  thrpt   25  1690855,778 ┬▒ 53885,439  ops/s
>> 
>> (N.B.: Huge buffers appear to have better numbers than small buffers simply 
>> because the test always transported 1 MB of data, so with a huge buffer this 
>> happens in few steps, while with small buffers this happens in many steps.)
>
>> **TL;DR:** Your optimization is now contained in this PR and outperforms the 
>> previous code. Nevertheless I plead for Alternative C (double buffering) 
>> instead, which is even faster but much simpler.
> 
> You had dismissed this previously so it's useful to have results now to 
> compare. I agree it's much simpler, it can probably use Arrays.copyOfRange 
> too.
> 
> For the results, can you say what the underlying input stream is and what the 
> target output stream is? Also it would be useful to get a baseline. I suspect 
> you have that, it's just not pasted in the table above.

> @AlanBateman Alternative C+ Arrays.copyOfRange() is implemented in 92ed32c. 
> So I think we finally got it? :-)

Yes, I think so, this is a lot simpler, so I think you've got it to a good 
place.

-------------

PR: https://git.openjdk.org/jdk/pull/10525

Reply via email to