James,

On 07/12/17 00:19, James Roper wrote:
> ...
> Your reading is correct. In my experience, it varies wildly by use case.
> In the technology I work with (Akka), we do exactly this, we have
> ByteStrings (essentially immutable byte buffers), and flow control is
> done on the number of ByteStrings, not the number of bytes in those
> strings. Generally, in reading, the size of ByteStrings is limited by a
> configurable amount, for example, 8kb. And then Akka's flow control
> will, by default, keep up to 8 ByteStrings in flight in its asynchronous
> processing pipeline. So we have a maximum buffer size of 64kb per
> connection. For most HTTP use cases, this is fine; something reading an
> HTTP message body might collect those buffers up to a maximum size of
> 100kb by default, and then parse the buffer (eg, as json). So it's
> within the tolerance of the amount of memory that the user expects to
> use per request. If the data read into the buffers were very small,
> this would be due to the client trickle feeding the server; care must
> be taken on the server to ensure that if 8kb buffers are allocated for
> reads, but only a small amount of data is read, those large buffers
> are released, and the small data is copied to a small buffer.
>
> I think where it can possibly cause a problem is if for some reason
> something sending data is only generating small byte buffer chunks, but
> there's a long (and expensive) pipeline for the chunks to go through
> before they get written out. This is not a use case that we see that
> often, but I have seen it. The solution there is to either increase the
> number of elements in flight in the stream (most reactive streams
> implementations allow this to be done trivially), or to put an
> aggregating buffer in the middle before the expensive processing (again,
> streaming implementations such as RxJava, Reactor or Akka Streams
> provide straightforward stages to do this).

Part of the feedback [1] we received to date has resulted in the
addition of a buffering subscriber [2], which buffers data before
delivering it to a downstream subscriber. The buffering subscriber
guarantees to deliver a given number of bytes of data to each invocation
of the downstream's `onNext`, except for the final invocation, just
before `onComplete` is invoked. This led us to the realization that an
aggregate of byte buffers is more flexible: it allows for, but does not
require, accumulation where it makes sense.
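For illustration, the fixed-chunk guarantee can be sketched roughly like
this (a hypothetical, simplified accumulator; the names and the
synchronous shape are made up for brevity, and this is not code from the
actual buffering subscriber):

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: accumulates incoming buffers and emits
// fixed-size chunks, with at most one final partial chunk on completion.
class FixedChunkAccumulator {
    private final int chunkSize;
    private ByteBuffer pending;
    private final List<ByteBuffer> emitted = new ArrayList<>();

    FixedChunkAccumulator(int chunkSize) {
        this.chunkSize = chunkSize;
        this.pending = ByteBuffer.allocate(chunkSize);
    }

    // Analogous to onNext: copy data in, emitting whenever a chunk fills.
    void onNext(ByteBuffer data) {
        while (data.hasRemaining()) {
            pending.put(data.get());
            if (!pending.hasRemaining()) {
                pending.flip();
                emitted.add(pending);
                pending = ByteBuffer.allocate(chunkSize);
            }
        }
    }

    // Analogous to onComplete: flush the final, possibly partial, chunk.
    List<ByteBuffer> onComplete() {
        if (pending.position() > 0) {
            pending.flip();
            emitted.add(pending);
        }
        return emitted;
    }
}
```

Every emitted chunk but the last has exactly `chunkSize` bytes, which is
the guarantee the buffering subscriber makes to the downstream `onNext`.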

We encountered similar issues with trickling small amounts of data, some
of which are described in 8186750 [3]. We settled on a similar
solution: accumulate below a certain threshold. This can be important in
cases where the data is framed, as in HTTP/2.
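The related concern of a trickle-fed read leaving a large buffer mostly
empty can be sketched as follows (hypothetical names; a toy version of
the idea, not the actual implementation):

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: if a large read buffer holds only a small amount
// of data (below some threshold), copy that data into a right-sized
// buffer so the large buffer can be released or reused, rather than
// being pinned by a few bytes for the lifetime of the request.
class BufferCompactor {
    static ByteBuffer compactIfSmall(ByteBuffer readBuffer, int threshold) {
        int remaining = readBuffer.remaining();
        if (remaining >= threshold) {
            return readBuffer;  // well utilised: keep as-is
        }
        ByteBuffer small = ByteBuffer.allocate(remaining);
        small.put(readBuffer);
        small.flip();
        return small;  // readBuffer can now go back to a pool
    }
}
```

The threshold trades an extra copy for memory footprint: below it, the
copy is cheap and frees the large buffer; above it, the buffer is kept.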

> One issue that I'm not sure about is the consequences of using direct
> buffers with regards to garbage collection. If direct buffers are never
> copied onto the heap, and are never reused, let's say you're just
> implementing a proxy passing buffers through from one connection to
> another, then the heap usage of the application may be very small, and
> this could mean that garbage collection is done very infrequently. As I
> understand it, this can result in direct buffers staying around for a
> long time, and possibly causing the system to run out of memory. Does
> anyone have any experience with that, and how to deal with it? We don't
> generally have this problem in Akka because we always copy our buffers
> onto the heap into an immutable structure, so even if we do use direct
> buffers and don't reuse them, our heap usage grows at least as fast as
> our direct buffer usage grows, which means total memory usage won't
> exceed twice the size of the heap since eventually garbage collection
> will clean both up.

We found that performance / throughput is far more dependent on factors
other than direct buffers; for example, in HTTP/2 the connection / stream
window and frame sizes have a significant impact. Given the prevalence
of HTTPS, the data on and off the wire requires encryption and
decryption, much of which does not appear to benefit from being in
direct buffers. Additionally, buffer management becomes tricky when a
single buffer can contain multiple HTTP/2 frames: slicing, tracking,
etc., is required to avoid copying, if the intent is to support reuse
of the buffer. Much of our experimentation with direct buffers did not
yield any obvious benefit, and as such the implementation we have today
uses only heap buffers. That said, the API does not preclude the use of
direct buffers, or the addition of more advanced buffer management in
the future.
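A toy illustration of the slicing bookkeeping (using a made-up one-byte
length prefix rather than the real 9-byte HTTP/2 frame header, purely to
show why slices complicate buffer reuse):

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of zero-copy framing: a single read buffer may
// contain several length-prefixed frames; slice() exposes each frame
// without copying, but every slice shares the backing buffer, so the
// buffer can only be reused once all slices have been consumed.
class FrameSlicer {
    static List<ByteBuffer> sliceFrames(ByteBuffer buf) {
        List<ByteBuffer> frames = new ArrayList<>();
        while (buf.remaining() >= 1) {
            int len = buf.get() & 0xFF;           // made-up 1-byte length prefix
            if (buf.remaining() < len) break;     // partial frame: await more data
            ByteBuffer frame = buf.slice();       // view over the payload, no copy
            frame.limit(len);
            frames.add(frame);
            buf.position(buf.position() + len);   // advance past the payload
        }
        return frames;
    }
}
```

Tracking when the last slice is released, so the backing buffer can be
recycled, is exactly the bookkeeping that made reuse unattractive in our
experiments relative to simply allocating heap buffers.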

Alan has already covered some of the recent improvements to freeing
direct buffers.

-Chris.

[1] https://bugs.openjdk.java.net/browse/JDK-8184285
[2] http://cr.openjdk.java.net/~chegar/httpclient/javadoc/api/jdk/incubator/http/HttpResponse.BodySubscriber.html#buffering(jdk.incubator.http.HttpResponse.BodySubscriber,int)
[3] https://bugs.openjdk.java.net/browse/JDK-8186750
