On 7 December 2017 at 06:58, David Lloyd <david.ll...@redhat.com> wrote:
> On Wed, Dec 6, 2017 at 6:31 AM, Chris Hegarty <chris.hega...@oracle.com>
> wrote:
> [snip]
> > The primary motivation for the use of byte buffers, as described above,
> > is to provide maximum flexibility to an implementation to avoid copying
> > and buffering of data.
>
> Is my reading of the API correct in that flow control is happening in
> terms of buffers, not of bytes? Could there ever be any odd effects
> from very small or very large buffers passing through the plumbing?

Your reading is correct. In my experience, it varies wildly by use case.
In the technology I work with (Akka), we do exactly this: we have
ByteStrings (essentially immutable byte buffers), and flow control is done
on the number of ByteStrings, not on the number of bytes in those strings.
Generally, when reading, the size of each ByteString is limited to a
configurable amount, for example 8kb, and Akka's flow control will, by
default, keep up to 8 ByteStrings in flight in its asynchronous processing
pipeline. So we have a maximum of 64kb buffered per connection. For most
HTTP use cases this is fine; something reading an HTTP message body might
collect those buffers up to a maximum size of 100kb by default and then
parse the result (e.g. as JSON), so it stays within the amount of memory
that the user expects to use per request.

If the data read into the buffers were very small, this would be due to the
client trickle feeding the server - care must be taken on the server to
ensure that if 8kb buffers are allocated for reads but only a small amount
of data is read, those large buffers are released and the small data is
copied into a small buffer.

I think where it can possibly cause a problem is if, for some reason,
whatever is sending data only generates small byte buffer chunks, but
there's a long (and expensive) pipeline for the chunks to go through before
they get written out. This is not a use case we see very often, but I have
seen it. The solution there is either to increase the number of elements in
flight in the stream (most Reactive Streams implementations allow this to
be done trivially), or to put an aggregating buffer in the middle, before
the expensive processing (again, streaming implementations such as RxJava,
Reactor or Akka Streams provide straightforward stages to do this).

One issue that I'm not sure about is the consequences of using direct
buffers with regard to garbage collection. If direct buffers are never
copied onto the heap and are never reused - let's say you're just
implementing a proxy passing buffers through from one connection to
another - then the heap usage of the application may be very small, and
this could mean that garbage collection runs very infrequently. As I
understand it, this can result in direct buffers staying around for a long
time, and possibly cause the system to run out of memory. Does anyone have
any experience with that, and how to deal with it? We don't generally have
this problem in Akka, because we always copy our buffers onto the heap into
an immutable structure, so even if we do use direct buffers and don't reuse
them, our heap usage grows at least as fast as our direct buffer usage,
which means total memory usage won't exceed roughly twice the size of the
heap, since eventually garbage collection will clean both up.

I've appended a few rough Java sketches below, after my signature,
illustrating the count-based flow control, the aggregating stage, and the
copy-to-heap approach described above.

>
> --
> - DML

--
*James Roper*
*Senior Octonaut*

Lightbend <https://www.lightbend.com/> – Build reactive apps!
Twitter: @jroper <https://twitter.com/jroper>
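
Sketch 1 - count-based flow control. A minimal, hypothetical example using
the java.util.concurrent.Flow API (the class and constant names are mine,
not from the proposed HTTP client API). The point is that demand is
expressed in buffers, so a per-connection bound of maxInFlight *
maxBufferSize only holds if the publisher also bounds the size of each
buffer:

    import java.nio.ByteBuffer;
    import java.util.concurrent.Flow;

    // Hypothetical subscriber: flow control is counted in buffers, not bytes.
    // With an 8kb cap on each buffer and 8 buffers in flight, at most ~64kb
    // of data is outstanding for this subscriber at any time.
    public class CountingSubscriber implements Flow.Subscriber<ByteBuffer> {

        private static final int MAX_IN_FLIGHT = 8;   // illustrative limit
        private Flow.Subscription subscription;

        @Override
        public void onSubscribe(Flow.Subscription subscription) {
            this.subscription = subscription;
            subscription.request(MAX_IN_FLIGHT);      // initial demand: 8 buffers
        }

        @Override
        public void onNext(ByteBuffer buffer) {
            process(buffer);                // buffer size is the publisher's choice
            subscription.request(1);        // replenish demand one buffer at a time
        }

        @Override
        public void onError(Throwable throwable) {
            throwable.printStackTrace();
        }

        @Override
        public void onComplete() {
            // end of stream
        }

        private void process(ByteBuffer buffer) {
            // application-specific handling (placeholder)
        }
    }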
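
Sketch 2 - an aggregating buffer in front of an expensive stage. This is a
plain-Java sketch rather than the RxJava/Reactor/Akka Streams operators
mentioned above; the 8kb threshold and the names are illustrative. Small
incoming chunks are coalesced into one larger buffer before being handed to
the expensive downstream step, so that step runs once per aggregated buffer
rather than once per tiny chunk:

    import java.io.ByteArrayOutputStream;
    import java.nio.ByteBuffer;
    import java.util.function.Consumer;

    // Illustrative aggregating stage: buffers small chunks until a size
    // threshold is reached, then emits one larger buffer downstream.
    final class AggregatingStage {
        private static final int THRESHOLD = 8 * 1024;   // flush at ~8kb
        private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
        private final Consumer<ByteBuffer> downstream;    // the expensive stage

        AggregatingStage(Consumer<ByteBuffer> downstream) {
            this.downstream = downstream;
        }

        void onChunk(ByteBuffer chunk) {
            byte[] copy = new byte[chunk.remaining()];
            chunk.get(copy);                               // copy chunk contents
            pending.write(copy, 0, copy.length);
            if (pending.size() >= THRESHOLD) {
                flush();
            }
        }

        void onComplete() {
            if (pending.size() > 0) {
                flush();                                   // emit whatever is left
            }
        }

        private void flush() {
            downstream.accept(ByteBuffer.wrap(pending.toByteArray()));
            pending.reset();
        }
    }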
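
Sketch 3 - copying a (possibly direct) buffer onto the heap, which is in
spirit what we do in Akka by converting buffers into ByteStrings. Purely
illustrative: the idea is that heap usage then grows at least as fast as
direct allocation, so the collector runs often enough to also reclaim the
direct buffers:

    import java.nio.ByteBuffer;

    // Illustrative helper: copies the readable bytes of a buffer into a
    // heap array, after which the (direct) buffer can be released or reused.
    final class HeapCopy {
        static byte[] copyToHeap(ByteBuffer buffer) {
            byte[] onHeap = new byte[buffer.remaining()];
            buffer.get(onHeap);      // bulk copy out of the (direct) buffer
            return onHeap;
        }
    }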