Hey Radai,

+1 on deprecating and eventually removing the old config. The intention was
absolutely bounding memory usage. I think having two ways of doing this,
one that gives a crisp bound on memory and one that is hard to reason about,
is pretty confusing. I think people will really appreciate having one
config that lets them directly control the thing they actually care about
(memory).

I also want to second Jun's concern on the complexity of the self-GCing
memory pool. I wrote the memory pool for the producer. In that area the
pooling of messages is the single biggest factor in the performance of the
client, so I believed it was worth some sophistication/complexity if there
was a performance payoff. All the same, the complexity of that code has made
it VERY hard to keep correct (it gets broken roughly every other time
someone makes a change). Over time I came to feel a lot less proud of my
cleverness. I learned something interesting reading your self-GCing memory
pool, but I wonder if the complexity is worth the payoff in this case?

Philosophically we've tried really hard to avoid needlessly "pluggable"
implementations. That is, when there is a temptation to give a config that
plugs in different Java classes at run time for implementation choices, we
should instead think of how to give the user the good behavior
automatically. I think the use case for configuring the GCing pool would
be if you discovered a bug in which memory leaked. But this isn't something
the user should have to think about, right? If there is a bug, we should
find and fix it.

-Jay

On Fri, Nov 11, 2016 at 9:21 AM, radai <radai.rosenbl...@gmail.com> wrote:

> jun's #1 + rajini's #11 - the new config param is to enable changing the
> pool implementation class. as i said in my response to jun, i will make
> the default pool impl be the simple one, and this param is to allow a user
> (more likely a dev) to change it.
> both the simple pool and the "gc pool" do basically just an
> AtomicLong.get() (plus a hashmap.put for the gc pool) before returning a
> buffer. there is absolutely no dependency on GC times in allocating (or
> not). the extra background thread in the gc pool is forever asleep unless
> there are bugs (==leaks), so the extra cost is basically nothing (backed
> by benchmarks). let me re-iterate - ANY BUFFER ALLOCATED MUST ALWAYS BE
> RELEASED - so the gc pool should not rely on gc for reclaiming buffers.
> it's a bug detector, not a feature, and is definitely not intended to hide
> bugs - the exact opposite - it's meant to expose them sooner. i've cleaned
> up the docs to avoid this confusion. i also like the fail-on-leak idea.
> will do.
> as for the gap between pool size and heap size - that's a valid argument.
> maybe also allow sizing the pool as a % of heap size? so queued.max.bytes
> = 1000000 for 1MB and queued.max.bytes = 0.25 for 25% of available heap?
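> to make the "allocation cost is basically an AtomicLong op" point and the
> %-of-heap idea concrete, here's a rough sketch (class and method names are
> illustrative only, not necessarily the final MemoryPool API):
>
> import java.nio.ByteBuffer;
> import java.util.concurrent.atomic.AtomicLong;
>
> // illustrative sketch only - allocation is just AtomicLong bookkeeping
> public class SimplePoolSketch {
>     private final AtomicLong available;
>
>     // a queued.max.bytes value < 1 could be read as a fraction of the heap,
>     // e.g. 0.25 == 25% of -Xmx
>     public SimplePoolSketch(double queuedMaxBytes) {
>         long sizeBytes = queuedMaxBytes < 1
>             ? (long) (Runtime.getRuntime().maxMemory() * queuedMaxBytes)
>             : (long) queuedMaxBytes;
>         this.available = new AtomicLong(sizeBytes);
>     }
>
>     public ByteBuffer tryAllocate(int size) {
>         if (available.addAndGet(-size) < 0) { // over budget - undo and refuse
>             available.addAndGet(size);
>             return null;
>         }
>         return ByteBuffer.allocate(size); // caller MUST release() this buffer
>     }
>
>     public void release(ByteBuffer buffer) {
>         available.addAndGet(buffer.capacity());
>     }
> }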
>
> jun's 2.2 - queued.max.bytes + socket.request.max.bytes still holds,
> assuming the ssl-related buffers are small. the largest weakness in this
> claim has to do with decompression rather than anything ssl-related. so yes
> there is an O(#ssl connections * sslEngine packet size) component, but i
> think it's small. again - decompression should be the bigger concern.
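> (back-of-the-envelope, assuming ~16KB per sslEngine packet buffer and two
> such buffers per connection:
>   total request memory ~= queued.max.bytes + socket.request.max.bytes
>                           + #ssl connections * 2 * ~16KB
> so even 1000 ssl connections would only add ~32MB on top of the configured
> bound)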
>
> rajini's #13 - interesting optimization. the problem is there's no way of
> knowing in advance what the _next_ request to come out of a socket is, so
> this would mute just those sockets that are 1. mutable and 2. have a
> buffer-demanding request for which we could not allocate a buffer. the
> downside is that, as-is, this would cause the busy-loop on poll() that the
> mutes were supposed to prevent - or code would need to be added to ad-hoc
> mute a connection that was so far unmuted but has now generated a
> memory-demanding request?
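> roughly, the "ad-hoc mute" variant could look like this (all names here
> are made up for illustration, this is not actual Selector/KafkaChannel
> code):
>
> // hypothetical sketch - mute a channel only once it actually demands memory
> void maybeMute(KafkaChannel channel, NetworkReceive receive) {
>     if (receive.requiresBuffer() && receive.buffer() == null) {
>         // the 4-byte size header was read, but the pool couldn't provide
>         // the payload buffer
>         channel.mute();                      // stop polling this socket
>         channelsAwaitingMemory.add(channel); // unmute once memory is released
>     }
> }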
>
>
>
> On Fri, Nov 11, 2016 at 5:02 AM, Rajini Sivaram <
> rajinisiva...@googlemail.com> wrote:
>
> > Radai,
> >
> > 11. The KIP talks about a new server configuration parameter
> > *memory.pool.class.name* which is not in the implementation. Is it
> > still the case that the pool will be configurable?
> >
> > 12. Personally I would prefer not to have a garbage collected pool that
> > hides bugs as well. Apart from the added code complexity and the extra
> > thread to handle collections, I am also concerned about the
> > non-deterministic nature of GC timings. The KIP introduces delays in
> > processing requests based on the configuration parameter
> > *queued.max.bytes*. This is unrelated to the JVM heap size, and hence
> > the pool can be full when there is no pressure on the JVM to garbage
> > collect. The KIP does not prevent other timeouts in the broker (e.g.
> > consumer session timeout); it relies on the pool being managed in a
> > deterministic, timely manner. Since a garbage collected pool cannot
> > provide that guarantee, wouldn't it be better to run tests with a
> > GC-pool that perhaps fails with a fatal error if it encounters a buffer
> > that was not released?
> >
> > 13. The implementation currently mutes all channels that don't have a
> > receive buffer allocated. Would it make sense to mute only the channels
> > that need a buffer (i.e. allow channels to read the 4-byte size that is
> > not read using the pool) so that normal client connection close() is
> > handled even when the pool is full? Since the extra 4 bytes may already
> > be allocated for some connections, the total request memory has to take
> > into account *4*numConnections* bytes anyway.
> >
> >
> > On Thu, Nov 10, 2016 at 11:51 PM, Jun Rao <j...@confluent.io> wrote:
> >
> > > Hi, Radai,
> > >
> > > 1. Yes, I am concerned about the trickiness of having to deal with
> > > weak refs. I think it's simpler to just have the simple version
> > > instrumented with enough debug/trace logging and do enough stress
> > > testing. Since we still have queued.max.requests, one can always fall
> > > back to that if a memory leak issue is identified. We could also label
> > > the feature as beta if we don't think this is production ready.
> > >
> > > 2.2 I am just wondering, after we fix that issue, whether the claim
> > > that the request memory is bounded by queued.max.bytes +
> > > socket.request.max.bytes is still true.
> > >
> > > 5. Ok, leaving the default as -1 is fine then.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Wed, Nov 9, 2016 at 6:01 PM, radai <radai.rosenbl...@gmail.com>
> > wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > Thank you for taking the time to review this.
> > > >
> > > > 1. short version - yes, the concern is bugs, but the cost is tiny
> > > > and worth it, and it's a common pattern. long version:
> > > >    1.1 detecting these types of bugs (leaks) cannot be easily done
> > > > with simple testing, but requires stress/stability tests that run
> > > > for a long time (long enough to hit OOM, depending on leak size and
> > > > available memory). this is why some sort of leak detector is
> > > > "standard practice". for example, look at netty
> > > > (http://netty.io/wiki/reference-counted-objects.html#leak-detection-levels)
> > > > - they have way more complicated built-in leak detection enabled by
> > > > default. as a concrete example - during development i did not
> > > > properly dispose of an in-progress KafkaChannel.receive when a
> > > > connection was abruptly closed, and I only found it because of the
> > > > log msg printed by the pool.
> > > >    1.2 I have a benchmark suite showing the performance cost of the
> > > > gc pool is absolutely negligible -
> > > > https://github.com/radai-rosenblatt/kafka-benchmarks/tree/master/memorypool-benchmarks
> > > >    1.3 as for the complexity of the impl - it's just ~150 lines and
> > > > pretty straightforward. i think the main issue is that not many
> > > > people are familiar with weak refs and ref queues.
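> > > >    for anyone not familiar with those, the gist is roughly the
> > > > following (a simplified illustration, not the actual code - imports
> > > > are omitted, and release() would also have to drop the corresponding
> > > > weak ref, details left out):
> > > >
> > > > // every outstanding buffer is tracked via a weak ref. if a buffer is
> > > > // garbage collected without release() having been called, its weak ref
> > > > // shows up on the reference queue and we know a leak happened.
> > > > private final ReferenceQueue<ByteBuffer> garbage = new ReferenceQueue<>();
> > > > private final Map<Reference<? extends ByteBuffer>, Integer> outstanding =
> > > >     new ConcurrentHashMap<>();
> > > >
> > > > public ByteBuffer tryAllocate(int size) {
> > > >     ByteBuffer buffer = ByteBuffer.allocate(size);
> > > >     outstanding.put(new WeakReference<>(buffer, garbage), size);
> > > >     return buffer;
> > > > }
> > > >
> > > > // background thread - blocks forever unless a buffer actually leaked
> > > > private void watchForLeaks() throws InterruptedException {
> > > >     while (true) {
> > > >         Reference<? extends ByteBuffer> leaked = garbage.remove(); // blocks
> > > >         Integer size = outstanding.remove(leaked);
> > > >         log.error("{} bytes were gc'ed without release() - leak!", size);
> > > >         // could also reclaim the budget here, or fail fast as suggested
> > > >     }
> > > > }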
> > > >
> > > >    how about making the pool impl class a config param (generally
> > > > good going forward), making the default be the simple pool, and
> > > > keeping the GC one as a dev/debug/triage aid?
> > > >
> > > > 2. the KIP itself doesn't specifically treat SSL at all - it's an
> > > > implementation detail. as for my current patch, it has some minimal
> > > > treatment of SSL - just enough to not mute SSL sockets mid-handshake
> > > > - but the code in SslTransportLayer still allocates buffers itself.
> > > > it is my understanding that netReadBuffer/appReadBuffer shouldn't
> > > > grow beyond 2 x sslEngine.getSession().getPacketBufferSize(), which
> > > > i assume to be small. they are also long-lived (they live for the
> > > > duration of the connection), which makes them a poor fit for
> > > > pooling. the bigger fish to fry, i think, is decompression - you
> > > > could read a 1MB blob into a pool-provided buffer and then
> > > > decompress it into 10MB of heap allocated on the spot :-) also, the
> > > > ssl code is extremely tricky.
> > > >    2.2 just to make sure, you're talking about Selector.java: while
> > > > ((networkReceive = channel.read()) != null)
> > > > addToStagedReceives(channel, networkReceive); ? if so you're right,
> > > > and i'll fix that (probably by something similar to
> > > > immediatelyConnectedKeys, not sure yet)
> > > >
> > > > 3. isOutOfMemory is self-explanatory (and i'll add javadocs and
> > > > update the wiki). isLowOnMem is basically the point where I start
> > > > randomizing the selection key handling order to avoid potential
> > > > starvation. it's rather arbitrary, and now that i think of it, it
> > > > should probably not exist and be entirely contained in Selector
> > > > (where the shuffling takes place). will fix.
> > > >
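> > > >    for context, the shuffling itself is tiny - roughly something
> > > > like this, with memoryPool and handleKey as placeholder names:
> > > >
> > > > // illustrative only - when the pool runs low, randomize the order in
> > > > // which selected keys are handled, so the same "early" connections
> > > > // don't always win the remaining memory while later ones starve
> > > > List<SelectionKey> keys = new ArrayList<>(selector.selectedKeys());
> > > > if (memoryPool.isLowOnMemory())
> > > >     Collections.shuffle(keys);
> > > > for (SelectionKey key : keys)
> > > >     handleKey(key); // placeholder for the per-key processing
> > > >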
> > > > 4. will do.
> > > >
> > > > 5. I prefer -1 or 0 as an explicit "OFF" (or basically anything
> > > > <=0). Long.MAX_VALUE would still create a pool that would still
> > > > waste time tracking resources. I don't really mind, though, if you
> > > > have a preferred magic value for off.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Wed, Nov 9, 2016 at 9:28 AM, Jun Rao <j...@confluent.io> wrote:
> > > >
> > > > > Hi, Radai,
> > > > >
> > > > > Thanks for the KIP. Some comments below.
> > > > >
> > > > > 1. The KIP says "to facilitate faster implementation (as a safety
> > > > > net) the pool will be implemented in such a way that memory that
> > > > > was not release()ed (but still garbage collected) would be detected
> > > > > and "reclaimed". this is to prevent "leaks" in case of code paths
> > > > > that fail to release() properly.". What are the cases that could
> > > > > cause memory leaks? If we are concerned about bugs, it seems that
> > > > > it's better to just do more testing to make sure the usage of the
> > > > > simple implementation (SimpleMemoryPool) is solid instead of adding
> > > > > more complicated logic (GarbageCollectedMemoryPool) to hide the
> > > > > potential bugs.
> > > > >
> > > > > 2. I am wondering how much this KIP covers the SSL channel
> > > > > implementation.
> > > > > 2.1 SslTransportLayer maintains netReadBuffer, netWriteBuffer, and
> > > > > appReadBuffer per socket. Should that memory be accounted for in
> > > > > the memory pool?
> > > > > 2.2 One tricky thing with SSL is that during a KafkaChannel.read(),
> > > > > it's possible for multiple NetworkReceives to be returned since
> > > > > multiple requests' data could be encrypted together by SSL. To deal
> > > > > with this, we stash those NetworkReceives in Selector.stagedReceives
> > > > > and give them back to the poll() call one NetworkReceive at a time.
> > > > > What this means is that, if we stop reading from KafkaChannel in
> > > > > the middle because the memory pool is full, this channel's key may
> > > > > never get selected for reads (even after the read interest is
> > > > > turned on), but there is still pending data for the channel, which
> > > > > will never get processed.
> > > > >
> > > > > 3. The code has the following two methods in MemoryPool, which are
> > > > > not described in the KIP. Could you explain in the wiki how they
> > > > > are used?
> > > > > isLowOnMemory()
> > > > > isOutOfMemory()
> > > > >
> > > > > 4. Could you also describe in the KIP, at a high level, how the
> > > > > read interest bit for the socket is turned on/off with respect to
> > > > > the MemoryPool?
> > > > >
> > > > > 5. Should queued.max.bytes default to -1 or Long.MAX_VALUE?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Mon, Nov 7, 2016 at 1:08 PM, radai <radai.rosenbl...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I would like to initiate a vote on KIP-72:
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-72%3A+Allow+putting+a+bound+on+memory+consumed+by+Incoming+requests
> > > > > >
> > > > > > The KIP allows specifying a limit on the amount of memory
> > > > > > allocated for reading incoming requests into. This is useful for
> > > > > > "sizing" a broker and avoiding OOMEs under heavy load (as actually
> > > > > > happens occasionally at linkedin).
> > > > > >
> > > > > > I believe I've addressed most (all?) concerns brought up during
> > > > > > the discussion.
> > > > > >
> > > > > > To the best of my understanding this vote is about the goal and
> > > > > > public-facing changes related to the new proposed behavior, but as
> > > > > > for implementation, i have the code up here:
> > > > > >
> > > > > > https://github.com/radai-rosenblatt/kafka/tree/broker-memory-pool-with-muting
> > > > > >
> > > > > > and I've stress-tested it to work properly (meaning it chugs along
> > > > > > and throttles under loads that would DOS 10.0.1.0 code).
> > > > > >
> > > > > > I also believe that the primitives and "pattern"s introduced in
> > > > > > this KIP (namely the notion of a buffer pool and retrieving from /
> > > > > > releasing to said pool instead of allocating memory) are generally
> > > > > > useful beyond the scope of this KIP, both for performance issues
> > > > > > (allocating lots of short-lived large buffers is a performance
> > > > > > bottleneck) and for other areas where memory limits are a problem
> > > > > > (KIP-81).
> > > > > >
> > > > > > Thank you,
> > > > > >
> > > > > > Radai.
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > Rajini
> >
>
