jun's #1 + rajini's #11 - the new config param is there to enable changing the pool implementation class. as i said in my response to jun, i will make the default pool impl be the simple one, and this param allows a user (more likely a dev) to change it. both the simple pool and the "gc pool" perform little more than an AtomicLong.get() (plus a hashmap.put() for the gc pool) before returning a buffer, so there is absolutely no dependency on GC timing in allocating (or not allocating). the extra background thread in the gc pool stays asleep forever unless there are bugs (== leaks), so the extra cost is basically nothing (backed by benchmarks).

let me reiterate - ANY BUFFER ALLOCATED MUST ALWAYS BE RELEASED - the gc pool does not rely on gc for reclaiming buffers. it's a bug detector, not a feature, and is definitely not intended to hide bugs - the exact opposite - it's meant to expose them sooner. i've cleaned up the docs to avoid this confusion. i also like the fail-on-leak idea. will do.

as for the gap between pool size and heap size - that's a valid argument. maybe also allow sizing the pool as a % of heap size? so queued.max.bytes = 1000000 would mean 1MB, and queued.max.bytes = 0.25 would mean 25% of available heap (rough sketches of both ideas below).
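
to make the % of heap idea concrete, here's roughly how the config value could be interpreted. this is just a sketch of the proposal, not code from the patch, and the helper name is made up:

    // hypothetical helper - not in the patch. interprets queued.max.bytes as
    // an absolute byte count when >= 1, as a fraction of the max heap when
    // strictly between 0 and 1, and as "no bound" when <= 0.
    public static long resolveQueuedMaxBytes(double configured) {
        if (configured <= 0) {
            return -1L;                 // explicitly off
        }
        if (configured < 1) {
            // fraction of whatever -Xmx resolves to at runtime
            return (long) (Runtime.getRuntime().maxMemory() * configured);
        }
        return (long) configured;       // absolute size in bytes
    }

so with -Xmx4g, queued.max.bytes = 0.25 would resolve to a 1GB pool.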
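and since weak refs + ref queues seem to be the less-familiar part, here's the gist of what the gc pool does. this is a simplified illustration only, not the actual GarbageCollectedMemoryPool code (class and method names below are made up):

    import java.lang.ref.Reference;
    import java.lang.ref.ReferenceQueue;
    import java.lang.ref.WeakReference;
    import java.nio.ByteBuffer;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // every buffer handed out is tracked via a weak reference registered with a
    // reference queue. release() un-tracks the buffer. if a buffer is garbage
    // collected while still tracked it was never release()d - i.e. a leak - and
    // the watcher thread (normally blocked forever) reports it.
    public class LeakDetectingPoolSketch {
        private final Map<Reference<?>, Integer> outstanding = new ConcurrentHashMap<>();
        private final ReferenceQueue<ByteBuffer> collected = new ReferenceQueue<>();
        private final boolean failOnLeak;

        public LeakDetectingPoolSketch(boolean failOnLeak) {
            this.failOnLeak = failOnLeak;
            Thread watcher = new Thread(this::watchForLeaks, "pool-leak-detector");
            watcher.setDaemon(true);
            watcher.start();
        }

        public ByteBuffer tryAllocate(int size) {
            ByteBuffer buffer = ByteBuffer.allocate(size); // a real pool also enforces the size bound here
            outstanding.put(new WeakReference<>(buffer, collected), size);
            return buffer;
        }

        public void release(ByteBuffer buffer) {
            // drop the tracking entry for this exact buffer instance. a linear scan
            // keeps the sketch short - a real impl would do an O(1) lookup.
            outstanding.entrySet().removeIf(e -> e.getKey().get() == buffer);
        }

        private void watchForLeaks() {
            while (true) {
                try {
                    Reference<? extends ByteBuffer> ref = collected.remove(); // blocks until something is collected
                    if (outstanding.remove(ref) != null) { // still tracked == never release()d == leak
                        String msg = "buffer was garbage collected without being release()d";
                        if (failOnLeak) {
                            throw new IllegalStateException(msg); // fail-on-leak mode
                        }
                        System.err.println(msg);
                    }
                } catch (InterruptedException e) {
                    return; // pool shutting down
                }
            }
        }
    }

the real pool would also add the leaked buffer's capacity back into its bookkeeping (the "reclaim" mentioned in the KIP) and surface the fail-on-leak case as a fatal broker error rather than just killing the watcher thread - both omitted here for brevity.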
jun's 2.2 - queued.max.bytes + socket.request.max.bytes still holds, assuming the ssl-related buffers are small. the largest weakness in this claim has to do with decompression rather than anything ssl-related. so yes, there is an O(#ssl connections * sslEngine packet size) component, but i think it's small. again - decompression should be the concern.

rajini's #13 - interesting optimization. the problem is there's no way of knowing in advance what the _next_ request to come out of a socket will be, so this would mute only those sockets that are 1. mutable and 2. have a buffer-demanding request for which we could not allocate a buffer. the downside is that, as-is, this would cause the busy-loop on poll() that the mutes were supposed to prevent - or code would need to be added to ad-hoc mute a connection that was so far unmuted but has now generated a memory-demanding request?

On Fri, Nov 11, 2016 at 5:02 AM, Rajini Sivaram <rajinisiva...@googlemail.com> wrote:

> Radai,
>
> 11. The KIP talks about a new server configuration parameter *memory.pool.class.name* which is not in the implementation. Is it still the case that the pool will be configurable?
>
> 12. Personally I would prefer not to have a garbage collected pool that hides bugs as well. Apart from the added code complexity and extra thread to handle collections, I am also concerned about the non-deterministic nature of GC timings. The KIP introduces delays in processing requests based on the configuration parameter *queued.max.bytes*. This is unrelated to the JVM heap size and hence the pool can be full when there is no pressure on the JVM to garbage collect. The KIP does not prevent other timeouts in the broker (e.g. consumer session timeout) because it is relying on the pool to be managed in a deterministic, timely manner. Since a garbage collected pool cannot provide that guarantee, wouldn't it be better to run tests with a GC-pool that perhaps fails with a fatal error if it encounters a buffer that was not released?
>
> 13. The implementation currently mutes all channels that don't have a receive buffer allocated. Would it make sense to mute only the channels that need a buffer (i.e. allow channels to read the 4-byte size that is not read using the pool) so that normal client connection close() is handled even when the pool is full? Since the extra 4 bytes may already be allocated for some connections, the total request memory has to take into account *4*numConnections* bytes anyway.
>
>
> On Thu, Nov 10, 2016 at 11:51 PM, Jun Rao <j...@confluent.io> wrote:
>
> > Hi, Radai,
> >
> > 1. Yes, I am concerned about the trickiness of having to deal with weak refs. I think it's simpler to just have the simple version instrumented with enough debug/trace logging and do enough stress testing. Since we still have queued.max.requests, one can always fall back to that if a memory leak issue is identified. We could also label the feature as beta if we don't think this is production ready.
> >
> > 2.2 I am just wondering after we fix that issue whether the claim that the request memory is bounded by queued.max.bytes + socket.request.max.bytes is still true.
> >
> > 5. Ok, leaving the default as -1 is fine then.
> >
> > Thanks,
> >
> > Jun
> >
> > On Wed, Nov 9, 2016 at 6:01 PM, radai <radai.rosenbl...@gmail.com> wrote:
> >
> > > Hi Jun,
> > >
> > > Thank you for taking the time to review this.
> > >
> > > 1. short version - yes, the concern is bugs, but the cost is tiny and worth it, and it's a common pattern. long version:
> > > 1.1 detecting these types of bugs (leaks) cannot be easily done with simple testing, but requires stress/stability tests that run for a long time (long enough to hit OOM, depending on leak size and available memory). this is why some sort of leak detector is "standard practice" - for example look at netty (http://netty.io/wiki/reference-counted-objects.html#leak-detection-levels) - they have way more complicated built-in leak detection enabled by default. as a concrete example - during development i did not properly dispose of an in-progress KafkaChannel.receive when a connection was abruptly closed, and I only found it because of the log msg printed by the pool.
> > > 1.2 I have a benchmark suite showing the performance cost of the gc pool is absolutely negligible - https://github.com/radai-rosenblatt/kafka-benchmarks/tree/master/memorypool-benchmarks
> > > 1.3 as for the complexity of the impl - it's just ~150 lines and pretty straightforward. i think the main issue is that not many people are familiar with weak refs and ref queues.
> > >
> > > how about making the pool impl class a config param (generally good going forward), making the default the simple pool, and keeping the GC one as a dev/debug/triage aid?
> > >
> > > 2. the KIP itself doesn't specifically treat SSL at all - it's an implementation detail. as for my current patch, it has some minimal treatment of SSL - just enough to not mute SSL sockets mid-handshake - but the code in SslTransportLayer still allocates buffers itself. it is my understanding that netReadBuffer/appReadBuffer shouldn't grow beyond 2 x sslEngine.getSession().getPacketBufferSize(), which i assume to be small. they are also long lived (they live for the duration of the connection), which makes them a poor fit for pooling. the bigger fish to fry, i think, is decompression - you could read a 1MB blob into a pool-provided buffer and then decompress it into 10MB of heap allocated on the spot :-) also, the ssl code is extremely tricky.
> > > 2.2 just to make sure, you're talking about Selector.java: while ((networkReceive = channel.read()) != null) addToStagedReceives(channel, networkReceive); ? if so you're right, and i'll fix that (probably by something similar to immediatelyConnectedKeys, not sure yet)
> > >
> > > 3. isOutOfMemory is self-explanatory (and i'll add javadocs and update the wiki). isLowOnMem is basically the point where I start randomizing the selection key handling order to avoid potential starvation. it's rather arbitrary, and now that i think of it, it should probably not exist and be entirely contained in Selector (where the shuffling takes place). will fix.
> > >
> > > 4. will do.
> > >
> > > 5. I prefer -1 or 0 as an explicit "OFF" (or basically anything <= 0). Long.MAX_VALUE would still create a pool that would still waste time tracking resources. i don't really mind, though, if you have a preferred magic value for off.
> > >
> > > On Wed, Nov 9, 2016 at 9:28 AM, Jun Rao <j...@confluent.io> wrote:
> > >
> > > > Hi, Radai,
> > > >
> > > > Thanks for the KIP. Some comments below.
> > > >
> > > > 1. The KIP says "to facilitate faster implementation (as a safety net) the pool will be implemented in such a way that memory that was not release()ed (but still garbage collected) would be detected and "reclaimed". this is to prevent "leaks" in case of code paths that fail to release() properly.". What are the cases that could cause memory leaks? If we are concerned about bugs, it seems that it's better to just do more testing to make sure the usage of the simple implementation (SimpleMemoryPool) is solid instead of adding more complicated logic (GarbageCollectedMemoryPool) to hide the potential bugs.
> > > >
> > > > 2. I am wondering how much this KIP covers the SSL channel implementation.
> > > > 2.1 SslTransportLayer maintains netReadBuffer, netWriteBuffer, appReadBuffer per socket. Should that memory be accounted for in the memory pool?
> > > > 2.2 One tricky thing with SSL is that during a KafkaChannel.read(), it's possible for multiple NetworkReceives to be returned since multiple requests' data could be encrypted together by SSL. To deal with this, we stash those NetworkReceives in Selector.stagedReceives and give it back to the poll() call one NetworkReceive at a time. What this means is that, if we stop reading from KafkaChannel in the middle because the memory pool is full, this channel's key may never get selected for reads (even after the read interest is turned on), but there is still pending data for the channel, which will never get processed.
> > > >
> > > > 3. The code has the following two methods in MemoryPool, which are not described in the KIP. Could you explain how they are used in the wiki?
> > > > isLowOnMemory()
> > > > isOutOfMemory()
> > > >
> > > > 4. Could you also describe in the KIP, at a high level, how the read interest bit for the socket is turned on/off with respect to MemoryPool?
> > > >
> > > > 5. Should queued.max.bytes default to -1 or Long.MAX_VALUE?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Mon, Nov 7, 2016 at 1:08 PM, radai <radai.rosenbl...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I would like to initiate a vote on KIP-72:
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-72%3A+Allow+putting+a+bound+on+memory+consumed+by+Incoming+requests
> > > > >
> > > > > The kip allows specifying a limit on the amount of memory allocated for reading incoming requests into. This is useful for "sizing" a broker and avoiding OOMEs under heavy load (as actually happens occasionally at linkedin).
> > > > >
> > > > > I believe I've addressed most (all?) concerns brought up during the discussion.
> > > > >
> > > > > To the best of my understanding this vote is about the goal and public-facing changes related to the new proposed behavior, but as for implementation, i have the code up here:
> > > > >
> > > > > https://github.com/radai-rosenblatt/kafka/tree/broker-memory-pool-with-muting
> > > > >
> > > > > and I've stress-tested it to work properly (meaning it chugs along and throttles under loads that would DOS 10.0.1.0 code).
> > > > >
> > > > > I also believe that the primitives and "pattern"s introduced in this KIP (namely the notion of a buffer pool and retrieving from / releasing to said pool instead of allocating memory) are generally useful beyond the scope of this KIP for both performance issues (allocating lots of short-lived large buffers is a performance bottleneck) and other areas where memory limits are a problem (KIP-81)
> > > > >
> > > > > Thank you,
> > > > >
> > > > > Radai.
>
> --
> Regards,
>
> Rajini