Tobias,
Thank you for trying this out and giving such a detailed reply. Comments
inline ...
On 18 Aug 2017, at 18:45, Tobias Thierer <tobi...@google.com> wrote:
Hi Michael & Chris -
apologies for the slow follow-up. I couldn't get my IDE (IntelliJ) to
accept a locally built OpenJDK 9 into which I had patched your
proposed changes ("The selected directory is not a valid home for the
JDK"). Not your problem to solve, but it led me to procrastinate a
little because I had to run command lines to compile & see failures.
Also, note that I had only patched your initial change rather than
Chris's later revision, but from a quick inspection it looks like that
doesn't affect the comments below.
I've run into one limitation with the new List<ByteBuffer> based
approach that I wasn't aware of when I wrote my initial reaction. It's
not necessarily a deal-breaker (BufferingProcessor is still useful),
but I wanted to mention it. I also have an idea that would allow us to
go back to ByteBuffer (rather than List<ByteBuffer>) being the unit
of data that's passed through the subscription, without losing any
flexibility/capability of the API.
===== Limitation of the new API
The two goals that I expected your change to achieve are:
• Give an application control over the size of the data chunks that it
has to process at a time, and
• Give an application control (lower/upper bound) on how many bytes,
as opposed to how many ByteBuffers, are being held in memory.
I only realized today that your change actually achieves only the
first goal, not the second. I also had an idea of how the first goal
could be achieved without changing the unit of data from ByteBuffer to
List<ByteBuffer> (see more below).
I do not agree with this, or with your goals. You seem to be equating
ByteBuffers with data chunks, and that is not always the case. When
reading HTTP/2 frames it is not practical, or even possible, to maintain
a one-to-one association of frames to ByteBuffers. Fixing the size of
ByteBuffers does not translate well to a fixed-size unit of body-data
delivery. At least not without copying, which we want to avoid baking
into the API.
The buffering processor has just one goal: ensure that the downstream
processor is invoked with a predetermined / fixed number of bytes each
time its onNext method is called.
From the HTTP Client’s point of view, there should be no excessive
ByteBuffers being held in memory. From the perspective of Flow, once an
item ( a list of ByteBuffers ) is passed to onNext, control of said item
is also passed. If the processor stuffs the buffers into some container
for later use, then yes they may be held there for some period of time,
but that is up to the processor itself. Clearly the buffering processor
does do this, but then again that’s the point of it.
In the case of HTTP/2, its own flow control ensures an upper bound on
the amount of data that can be sent, hence potentially buffered. The
HTTP/1.1 implementation should ensure that "reasonably" sized
ByteBuffers are used. I don't think that a tuning knob for this needs to
bubble up to the API level. How would a Java developer know how
best to size the internal ByteBuffers used by the client implementation
when making a request to an HTTP/2 server that may send multiple push
promises ( i.e. have many streams )? This is not something that we want
Java developers to have to think about. The implementation should have
some reasonable defaults.
The issue with the second goal is that while the new
BodyProcessor.buffering() API gives the application control over the
size of the ByteBuffers delivered to it, it doesn't give it control
over the number of bytes buffered, because it doesn't know how long
the List<ByteBuffer> delivered to it on each call to onNext() will be.
The new BodyProcessor.buffering() does give control over the number of
bytes. It guarantees that the List<ByteBuffer> will contain exactly N
bytes each time the downstream onNext is invoked. The tests assert this.
Internally, the buffering processor cannot grow more than the size of
one ByteBuffer, whose size is determined by the default client
implementation.
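To make that concrete, here is a rough sketch ( not the actual sandbox
code; the class and method names are illustrative ) of the
accumulate-and-emit step: incoming buffers are queued, and exactly N
bytes are sliced off, without copying, each time enough data is
available.

    import java.nio.ByteBuffer;
    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.List;

    class FixedSizeAccumulator {
        private final int chunkSize;                      // N bytes per downstream onNext
        private final Deque<ByteBuffer> pending = new ArrayDeque<>();
        private long pendingBytes;

        FixedSizeAccumulator(int chunkSize) { this.chunkSize = chunkSize; }

        /** Queues incoming buffers; returns zero-copy chunks of exactly chunkSize bytes. */
        List<List<ByteBuffer>> accumulate(List<ByteBuffer> incoming) {
            for (ByteBuffer b : incoming) {
                pending.addLast(b);
                pendingBytes += b.remaining();
            }
            List<List<ByteBuffer>> chunks = new ArrayList<>();
            while (pendingBytes >= chunkSize) {
                List<ByteBuffer> chunk = new ArrayList<>();
                int need = chunkSize;
                while (need > 0) {
                    ByteBuffer head = pending.peekFirst();
                    if (head.remaining() <= need) {       // consume the whole buffer
                        need -= head.remaining();
                        chunk.add(pending.removeFirst());
                    } else {                              // split: slice off 'need' bytes
                        ByteBuffer slice = head.slice();
                        slice.limit(need);
                        head.position(head.position() + need);
                        need = 0;
                        chunk.add(slice);
                    }
                }
                pendingBytes -= chunkSize;
                chunks.add(chunk);
            }
            return chunks;                                // any remainder stays queued
        }
    }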
Note: List<ByteBuffer> is used as the unit flowing through the
subscription since there is no composite ByteBuffer available ( or
constructible ) as things stand in the Java Platform. In many cases the
BBs are heap BBs, i.e. relatively small wrappers around byte[]s that
offer positional constraints. Using a combination of both allows an
implementation to minimise, or eliminate, the need to copy the data as
it flows through.
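For example, a consumer can already treat the list as one logical
buffer without copying; a minimal sketch ( GatheringByteChannel chosen
just to show the scatter / gather fit, names illustrative ):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.GatheringByteChannel;
    import java.util.List;

    class CompositeOps {
        /** Total bytes remaining across the list, as if it were one buffer. */
        static long remaining(List<ByteBuffer> bufs) {
            long n = 0;
            for (ByteBuffer b : bufs)
                n += b.remaining();
            return n;
        }

        /** Drains the whole list with gathering writes; no intermediate copy. */
        static void writeAll(GatheringByteChannel ch, List<ByteBuffer> bufs)
                throws IOException {
            ByteBuffer[] srcs = bufs.toArray(new ByteBuffer[0]);
            while (remaining(bufs) > 0)
                ch.write(srcs);
        }
    }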
Note that both before and after your proposed change, an application
can achieve the state where a request for more data is issued
immediately when the number of bytes buffered drops below
a lower bound; it just can't stop the system from giving it more data
than it'd like (upper bound).
Not true. The new buffering processor will put an upper bound on the
NUMBER OF BYTES being passed to the downstream processor. As for back
pressure, that can be achieved through the Flow API, by the processor
not requesting more data. I do accept that the buffering processor
cannot limit the max amount of data received, but I do not see that as a
problem given my comments about reasonable defaults above.
[ Note: the internal implementation of some of the convenience request
processors eagerly read ALL data and put it into a queue ( no flow
control ). There are changes coming to address this. ]
For example, I've adjusted my example of a PipedResponseStream to the
new API. The way I've implemented the lower bound is:
• Previously, it started with
subscription.request(initialBuffersToRequest) and followed that up
with subscription.request(1) each time a buffer was cleared from the
queue.
• I now changed it to keep track of whether there is currently a
request outstanding (the time between subscription.request(1) and the
corresponding onNext()). Every time there is (a) no request
outstanding, and (b) buffers.size() < numByteBuffersToBuffer, a new
subscription.request(1) is made. This can happen at three different
times: 1.) during onSubscribe(), 2.) when a request completes in
onNext() but we discover that we still have too little data, and 3.)
during take() when a ByteBuffer is taken out of the internal buffer.
Of course, if the latency between request(1) and onNext() is too
high, it could be that the internal buffer runs out and we don't keep
up refilling it.
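In code, the bookkeeping described above looks roughly like this ( a
stripped-down sketch of my PipedResponseStream; names are illustrative,
error / completion handling omitted ):

    import java.nio.ByteBuffer;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.Flow;
    import java.util.concurrent.LinkedBlockingQueue;

    class LowerBoundSubscriber implements Flow.Subscriber<List<ByteBuffer>> {
        private final int numByteBuffersToBuffer;
        private final BlockingQueue<ByteBuffer> buffers = new LinkedBlockingQueue<>();
        private Flow.Subscription subscription;
        private boolean requestOutstanding;   // true between request(1) and onNext()

        LowerBoundSubscriber(int n) { this.numByteBuffersToBuffer = n; }

        private synchronized void maybeRequest() {
            // (a) no request outstanding and (b) buffered data below the lower bound
            if (!requestOutstanding && buffers.size() < numByteBuffersToBuffer) {
                requestOutstanding = true;
                subscription.request(1);
            }
        }

        @Override public synchronized void onSubscribe(Flow.Subscription s) {
            subscription = s;
            maybeRequest();                   // case 1: during onSubscribe()
        }

        @Override public void onNext(List<ByteBuffer> item) {
            buffers.addAll(item);
            synchronized (this) { requestOutstanding = false; }
            maybeRequest();                   // case 2: still too little data
        }

        public ByteBuffer take() throws InterruptedException {
            ByteBuffer b = buffers.take();
            maybeRequest();                   // case 3: a buffer was taken out
            return b;
        }

        @Override public void onError(Throwable t) { /* omitted */ }
        @Override public void onComplete()         { /* omitted */ }
    }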
I can ask Martin to upload the latest code to his workspace if you
like, but I suspect you get the idea.
Your description is sufficient, no need to upload.
===== How to change back to the old API without losing the new capability
I think it's not actually necessary to change from ByteBuffer to
List<ByteBuffer> in order to achieve goal (1.), i.e. the fixed
ByteBuffer size delivered by BufferingProcessor.
Given my comment above, with HTTP/2 the client cannot guarantee that a
single channel read will result in a ByteBuffer that contains only data
from a single frame. If you accept this, then maybe you can stop
reading here; otherwise, I'll try to reply to your comments / suggestions.
Here's how ( a rough sketch follows the list ):
• BufferingProcessor keeps a Queue<ByteBuffer> internally, instead of
passing the whole List to the delegate BodyProcessor.
• When the delegate calls request(n), BufferingProcessor calls
onNext(ByteBuffer) n times, supplying ByteBuffers from its internal Queue.
So the downstream processor may get fewer than N bytes in each
onNext call. I'm not sure how that helps. But maybe you are not
concerned about this anymore.
• When the internal queue size gets small, BufferingProcessor calls
request(1) (or request() with some value > 1) on its subscription to
get more data to feed into its Queue<ByteBuffer>.
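In outline ( a rough sketch only; the demand handling is simplified to
the bare idea, and the class and method names are illustrative ):

    import java.nio.ByteBuffer;
    import java.util.List;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.Flow;
    import java.util.concurrent.Queue;

    class QueueingBridge implements Flow.Subscriber<List<ByteBuffer>> {
        private final Queue<ByteBuffer> queue = new ConcurrentLinkedQueue<>();
        private final Flow.Subscriber<ByteBuffer> downstream;
        private final int lowWaterMark;
        private Flow.Subscription upstream;

        QueueingBridge(Flow.Subscriber<ByteBuffer> downstream, int lowWaterMark) {
            this.downstream = downstream;
            this.lowWaterMark = lowWaterMark;
        }

        @Override public void onSubscribe(Flow.Subscription s) {
            upstream = s;
            s.request(1);                         // prime the internal queue
        }

        @Override public void onNext(List<ByteBuffer> item) {
            queue.addAll(item);                   // upstream unit: List<ByteBuffer>
        }

        /** Called when the downstream requests n items; it sees plain ByteBuffers. */
        void deliver(long n) {
            for (long i = 0; i < n; i++) {
                ByteBuffer b = queue.poll();
                if (b == null) break;             // queue ran dry (the risk noted below)
                downstream.onNext(b);
            }
            if (queue.size() < lowWaterMark)
                upstream.request(1);              // refill heuristic
        }

        @Override public void onError(Throwable t) { downstream.onError(t); }
        @Override public void onComplete()         { downstream.onComplete(); }
    }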
Notes:
• Obviously, BufferingProcessor will run the risk of running out of
data buffered internally if the delegate processor is requesting the
data too quickly. And obviously, the BufferingProcessor has a hard
time deciding on the correct number n to pass to request(n) on its
subscription. But, because BufferingProcessor is part of the HTTP
Client implementation, it is in a much better position than
the application to know about internal buffer sizes, have some
heuristic to determine n based on the ratio between that internal
buffer size and the chunk size requested by the delegate
BodyProcessor, and potentially (as a further refinement)
measure throughput / latency to guess how much more data
it should request, and how early.
• Also obviously, the application would have no control over how much
extra data will be used up by the BufferingProcessor. But that's not
too different from how the application currently has no control
over how many extra ByteBuffers will be delivered by the system during
onNext(List<ByteBuffer>).
Sure, but without a composite ByteBuffer, if we want to guarantee that
onNext is delivered N bytes, we cannot avoid a sequence of ByteBuffers (
without copying ).
And again, BufferingProcessor is in a better position to deal with this
than the application because it can hard code more knowledge about the
rest of the implementation.
Well, it should just use the processor API, but I see that you are
suggesting that it could have a closer relationship with the client
implementation.
• By going back from List<ByteBuffer> to ByteBuffer, everyone who
implements BodyProcessor is saved the onerous task of doing the "for
(ByteBuffer item : items) { ... }" loop in onNext().
I do not accept that dealing with List<ByteBuffer> is onerous. Java
developers are already familiar with the scatter / gather pattern. This
is not introducing a new concept. The code to deal with ByteBuffer
already needs to handle the case where the data is incomplete; all that
is changing here is a straightforward foreach loop.
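That is, the delta for an implementer is roughly this ( a sketch;
consume() stands in for whatever per-buffer handling the processor
already has ):

    import java.nio.ByteBuffer;
    import java.util.List;

    class LoopDelta {
        void consume(ByteBuffer b) { b.position(b.limit()); } // stand-in handler

        // Before: public void onNext(ByteBuffer item) { consume(item); }
        // After:
        public void onNext(List<ByteBuffer> items) {
            for (ByteBuffer item : items)
                consume(item);   // the only new code is this loop
        }
    }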
Thoughts?
Speaking generally, your concern now seems to have moved to how much
memory will be consumed by the client, maybe per request. This is
different from what was said earlier, but I suggest that in general the
implementation should not use/buffer any more data than can fit in a
single ByteBuffer per HTTP/1.1 request/reply or HTTP/2 stream, and that
the client choose a reasonable default size ( maybe configurable by a
system property or something, but not at the API level ).
-Chris.
Tobias
On Thu, Aug 3, 2017 at 6:02 PM, Michael
McMahon <michael.x.mcma...@oracle.com> wrote:
Hi,
The HTTP client work is continuing in a new branch of the JDK 10
sandbox forest (http-client-branch),
and here is the first of a number of changes we want to make.
This one is to address the feedback we received that
HttpResponse.BodyProcessors would be easier to implement if there were
control over the size of the buffers being supplied.
To that end we have added APIs for creating buffered response
processors (and handlers). So, HttpResponse.BodyProcessor has a new
static method with the following signature:

    public static <T> BodyProcessor<T> buffering(BodyProcessor<T> downstream, long buffersize)
This returns a new processor which delivers data to the supplied
downstream processor, buffered according to the 'buffersize' parameter.
It guarantees that all data is
delivered in chunks of that size
until the final chunk, which may be smaller.
This should allow other BodyProcessor implementations that require
buffering to wrap themselves
in this way, be guaranteed that the data they receive is buffered, and
then return that composite
processor to their user.
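For example ( a sketch assuming the sandbox branch API; asString is
just an arbitrary downstream processor, and the 16 KB size is
illustrative ):

    import java.nio.charset.StandardCharsets;
    import jdk.incubator.http.HttpResponse.BodyProcessor;

    class BufferingExample {
        static BodyProcessor<String> buffered() {
            // An arbitrary downstream processor.
            BodyProcessor<String> downstream =
                    BodyProcessor.asString(StandardCharsets.UTF_8);
            // The downstream now sees 16 KB chunks ( the last may be smaller ).
            return BodyProcessor.buffering(downstream, 16 * 1024);
        }
    }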
A similar method is added to HttpResponse.BodyHandler.
Note also that we have changed HttpResponse.BodyProcessor from being
a Flow.Subscriber<ByteBuffer> to a Flow.Subscriber<List<ByteBuffer>>.
That change is technically orthogonal to this one, but is motivated
by it. Transferring ByteBuffers in lists makes it easier to buffer
them efficiently.
The webrev is at: http://cr.openjdk.java.net/~michaelm/8184285/webrev.1/
Thanks,
Michael