Hi Jun,

3. will (also :-) ) do. do you have ideas for appropriate names/metrics? i'm thinking along the lines of "MemoryAvailable" (current snapshot value from the pool) and "Throttles" (some moving-avg of how often throttling due to no mem kicks in). maybe also "BuffersOutstanding"?
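[To make the proposed metric names concrete, here is a rough sketch of how they could be wired up with the org.apache.kafka.common.metrics classes. The pool accessors (availableMemory(), buffersOutstanding()), the group name, and the sensor name are hypothetical placeholders, not anything from the patch:]

    import org.apache.kafka.common.metrics.Metrics;
    import org.apache.kafka.common.metrics.Sensor;
    import org.apache.kafka.common.metrics.stats.Rate;

    public class MemoryPoolMetrics {
        // stand-in for the pool from the patch; accessor names are hypothetical
        public interface Pool {
            long availableMemory();
            long buffersOutstanding();
        }

        private final Sensor throttleSensor;

        public MemoryPoolMetrics(Metrics metrics, Pool pool) {
            // "MemoryAvailable": a gauge read straight off the pool when sampled
            metrics.addMetric(
                    metrics.metricName("MemoryAvailable", "memory-pool-metrics",
                            "bytes currently available in the request memory pool"),
                    (config, now) -> pool.availableMemory());

            // "BuffersOutstanding": buffers handed out but not yet release()ed
            metrics.addMetric(
                    metrics.metricName("BuffersOutstanding", "memory-pool-metrics",
                            "number of unreleased pool buffers"),
                    (config, now) -> pool.buffersOutstanding());

            // "Throttles": a windowed Rate is the closest thing to a
            // "moving avg" in the existing metrics library
            this.throttleSensor = metrics.sensor("memory-pool-throttles");
            this.throttleSensor.add(
                    metrics.metricName("Throttles", "memory-pool-metrics",
                            "rate of read throttles due to an exhausted pool"),
                    new Rate());
        }

        // to be called whenever an allocation is refused / reading is muted
        public void recordThrottle() {
            throttleSensor.record();
        }
    }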
On Thu, Nov 17, 2016 at 7:01 PM, Jun Rao <j...@confluent.io> wrote:

Hi, Radai,

2. Yes, on the server side, the timeout is hardcoded at 300ms. That's not too bad. We can just leave it as it is.

3. Another thing. Do we plan to expose some JMX metrics so that we can monitor if there is any memory pressure in the pool?

Thanks,

Jun

On Thu, Nov 17, 2016 at 8:57 AM, radai <radai.rosenbl...@gmail.com> wrote:

Hi Jun,

1. will do.

2. true. for several reasons:
2.1. which selector? there's a single pool but 16 selectors (linkedin typical; num.network.threads defaults to 3)
2.2. even if i could figure out which selector (all?), the better thing to do would be to resume reading not when any memory becomes available (because worst case it's not enough for anything) but when some "low watermark" of available memory is hit - so mute when at 100% mem, unmute when back down to 90%?
2.3. on the broker side (which is the current concern for my patch) this max wait time is a hardcoded 300 ms (SocketServer.Processor.poll()), which i think is acceptable - neither arbitrary nor configurable.

if you still think this needs to be addressed (and you are right that in the general case the timeout param could be arbitrary) i can implement the watermark approach + pool.waitForLowWatermark(timeout) or something, and make Selector.poll() wait for the low watermark at the end of poll() if no work has been done (so as not to wait on memory needlessly for requests that don't require it, as rajini suggested earlier).

On Wed, Nov 16, 2016 at 9:04 AM, Jun Rao <j...@confluent.io> wrote:

Hi, Radai,

Thanks for the updated proposal. +1 overall. A couple of comments below.

1. Our current convention is to avoid using getters. Could you change getSize and getAvailableMemory accordingly? Also, size is a bit ambiguous - could we use sth like capacity?

2. This is more on the implementation details. I didn't see any code to wake up the selector when memory is released from the pool. For example, suppose that all socket keys are muted since the pool is full. The selector.poll() call will wait for the timeout, which could be arbitrarily long. Now, if some memory is released, it seems that we should wake up the selector early instead of waiting for the timeout.

Jun

On Mon, Nov 14, 2016 at 11:41 AM, Rajini Sivaram <rajinisiva...@googlemail.com> wrote:

+1

Thank you for the KIP, Radai.

On Mon, Nov 14, 2016 at 6:07 PM, Mickael Maison <mickael.mai...@gmail.com> wrote:

+1. We've also been hit by OOMs on the broker because we were not able to properly bound its memory usage.

On Mon, Nov 14, 2016 at 5:56 PM, radai <radai.rosenbl...@gmail.com> wrote:

@rajini - fixed the hasBytesBuffered() method. also updated poll() so that no latency is added for picking up data stuck in ssl buffers (timeout is set to 0, just like with immediately connected keys and staged receives). thank you for pointing these out. added ssl (re)testing to the KIP testing plan.
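[A minimal sketch of the "low watermark" unmute idea radai floats further up (his point 2.2 in the reply to Jun): mute at 100% memory used, unmute once enough memory is free again. The class, the waitForLowWatermark(timeout) signature, and the fraction parameter are all hypothetical; allocation/accounting paths are elided:]

    import java.util.concurrent.atomic.AtomicLong;

    // toy pool, not the KIP's SimpleMemoryPool. 'lowWatermarkFraction' is the
    // fraction of capacity that must be free before unmuting (0.1 would match
    // the "mute at 100% used, unmute at 90% used" example)
    public class WatermarkedMemoryPool {
        private final long lowWatermark;
        private final AtomicLong available;
        private final Object condition = new Object();

        public WatermarkedMemoryPool(long capacity, double lowWatermarkFraction) {
            this.lowWatermark = (long) (capacity * lowWatermarkFraction);
            this.available = new AtomicLong(capacity);
        }

        public void release(long bytes) {
            if (available.addAndGet(bytes) >= lowWatermark) {
                synchronized (condition) {
                    condition.notifyAll(); // wake a Selector.poll() stuck waiting
                }
            }
        }

        // the hypothetical pool.waitForLowWatermark(timeout): returns true once
        // at least 'lowWatermark' bytes are free, false if the timeout expired
        public boolean waitForLowWatermark(long timeoutMs) throws InterruptedException {
            long deadline = System.currentTimeMillis() + timeoutMs;
            synchronized (condition) {
                while (available.get() < lowWatermark) {
                    long remaining = deadline - System.currentTimeMillis();
                    if (remaining <= 0)
                        return false;
                    condition.wait(remaining);
                }
            }
            return true;
        }
    }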
On Mon, Nov 14, 2016 at 7:24 AM, Rajini Sivaram <rajinisiva...@googlemail.com> wrote:

Open point 1. I would just retain the current long value that specifies queued.max.bytes as long and not as %heap, since it is simple and easy to use. And it keeps it consistent with other ".bytes" configs.

Point 3. ssl buffers - I am not quite sure the implementation looks correct. hasBytesBuffered() is checking position() of buffers == 0. And the code checks this only when poll with a timeout returns (adding a delay when there is nothing else to read). But since this and open point 2 (optimization) are implementation details, they can be looked at during PR review.

It will be good to add SSL testing to the test plan as well, since there is additional code to test for SSL.

On Fri, Nov 11, 2016 at 9:03 PM, radai <radai.rosenbl...@gmail.com> wrote:

ok, i've made the following changes:

1. memory.pool.class.name has been removed
2. the code now only uses SimpleMemoryPool. the gc variant is left (unused) as a development aid and cannot be enabled via configuration.
3. i've resolved the issue of stale data getting stuck in intermediate (ssl) buffers.
4. default value for queued.max.bytes is -1, so off by default. any <=0 value is interpreted as off by the underlying code.

open points:

1. the kafka config framework doesn't allow a value to be either long or double, so in order to pull off the queued.max.bytes = 1000000 or queued.max.bytes = 0.3 thing i'd need to define the config as type string, which is ugly to me. do we want to support setting queued.max.bytes to % of heap? if so, by way of making queued.max.bytes of type string, or by way of a 2nd config param (with the resulting either/all/combination validation)? my personal opinion is string, because i think a single queued.max.bytes with overloaded meaning is more understandable to users (one possible parse is sketched below). i'll await other people's opinions before doing anything.
2. i still need to evaluate rajini's optimization. sounds doable.
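[Regarding open point 1 above, one possible (hypothetical, not in the patch) interpretation of a string-typed queued.max.bytes that accepts both an absolute byte count and a fraction of heap:]

    // "1000000" -> 1000000 bytes absolute, "0.3" -> 30% of the max heap,
    // anything <= 0 -> feature off, matching the -1 default above
    public static long parseQueuedMaxBytes(String value) {
        double parsed = Double.parseDouble(value.trim());
        if (parsed <= 0)
            return -1;                                   // off
        if (parsed < 1)                                  // fraction -> % of heap
            return (long) (parsed * Runtime.getRuntime().maxMemory());
        return (long) parsed;                            // absolute byte count
    }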
asides:

1. i think you guys misunderstood the intent behind the gc pool. it was never meant to be a magic pool that automatically releases buffers (because, just as rajini stated, the performance implications would be horrible). it was meant to catch leaks early. since that is indeed a dev-only concern it won't ever get used in production.
2. i said this on some other kip discussion: i think the nice thing about the pool API is that it "scales" from just keeping a memory bound to actually re-using buffers without changing the calling code. i think actually pooling large buffers will result in a significant performance impact, but that's outside the scope of this kip. at that point i think more pool implementations (that actually pool) would be written. i agree with the ideal of exposing as few knobs as possible, but switching pools (or pool params) for tuning may happen at some later point.

On Fri, Nov 11, 2016 at 11:44 AM, Rajini Sivaram <rajinisiva...@googlemail.com> wrote:

13. At the moment, I think channels are not muted if:

    channel.receive != null && channel.receive.buffer != null

This mutes all channels that aren't holding onto an incomplete buffer. They may or may not have read the 4-byte size.

I was thinking you could avoid muting channels if:

    channel.receive == null || channel.receive.size.remaining() > 0

This will not mute channels that are holding onto a buffer (as above). In addition, it will not mute channels that haven't read the 4-byte size. A client that is closed gracefully while the pool is full will not be muted in this case and the server can process the close without waiting for the pool to free up. Once the 4-byte size is read, the channel will be muted if the pool is still out of memory - for each channel, at most one failed read attempt would be made while the pool is out of memory. I think this would also delay muting of SSL channels, since they can continue to read into their (already allocated) network buffers and unwrap the data, and block only when they need to allocate a buffer from the pool.
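[The two predicates from point 13 above, restated side by side in compilable form; the Receive class is a stand-in for the real KafkaChannel/NetworkReceive fields, not the actual API:]

    import java.nio.ByteBuffer;

    // stand-ins for the fields named in point 13 above
    final class Receive {
        ByteBuffer size;   // the 4-byte length prefix (not pool-allocated)
        ByteBuffer buffer; // the payload buffer (pool-allocated)
    }

    final class MutePredicates {
        // current behaviour: a channel stays unmuted only while it already
        // holds a partially-read payload buffer
        static boolean keepUnmuted(Receive receive) {
            return receive != null && receive.buffer != null;
        }

        // suggested: also keep unmuted any channel still reading the 4-byte
        // size prefix - those reads don't touch the pool, so a graceful
        // client close() can be processed even while the pool is full
        static boolean keepUnmutedProposed(Receive receive) {
            return receive == null || receive.size.remaining() > 0;
        }
    }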
On Fri, Nov 11, 2016 at 6:00 PM, Jay Kreps <j...@confluent.io> wrote:

Hey Radai,

+1 on deprecating and eventually removing the old config. The intention was absolutely bounding memory usage. I think having two ways of doing this, one that gives a crisp bound on memory and one that is hard to reason about, is pretty confusing. I think people will really appreciate having one config which instead lets them directly control the thing they actually care about (memory).

I also want to second Jun's concern on the complexity of the self-GCing memory pool. I wrote the memory pool for the producer. In that area the pooling of messages is the single biggest factor in the performance of the client, so I believed it was worth some sophistication/complexity if there was a performance payoff. All the same, the complexity of that code has made it VERY hard to keep correct (it gets broken roughly every other time someone makes a change). Over time I came to feel a lot less proud of my cleverness. I learned something interesting reading your self-GCing memory pool, but I wonder if the complexity is worth the payoff in this case?

Philosophically we've tried really hard to avoid needlessly "pluggable" implementations. That is, when there is a temptation to give a config that plugs in different Java classes at run time for implementation choices, we should instead think of how to give the user the good behavior automatically. I think the use case for configuring the GCing pool would be if you discovered a bug in which memory leaked. But this isn't something the user should have to think about, right? If there is a bug we should find and fix it.

-Jay

On Fri, Nov 11, 2016 at 9:21 AM, radai <radai.rosenbl...@gmail.com> wrote:

jun's #1 + rajini's #11 - the new config param is to enable changing the pool implementation class. as i said in my response to jun i will make the default pool impl be the simple one, and this param is to allow a user (more likely a dev) to change it. both the simple pool and the "gc pool" make basically just an AtomicLong.get() (+ a hashmap.put for gc) call before returning a buffer. there is absolutely no dependency on GC times in allocating (or not). the extra background thread in the gc pool is forever asleep unless there are bugs (== leaks), so the extra cost is basically nothing (backed by benchmarks). let me reiterate again - ANY BUFFER ALLOCATED MUST ALWAYS BE RELEASED - so the gc pool should not rely on gc for reclaiming buffers. it's a bug detector, not a feature, and is definitely not intended to hide bugs - the exact opposite - it's meant to expose them sooner. i've cleaned up the docs to avoid this confusion. i also like the fail on leak. will do.

as for the gap between pool size and heap size - that's a valid argument. maybe also allow sizing the pool as % of heap size? so queued.max.bytes = 1000000 for 1MB and queued.max.bytes = 0.25 for 25% of available heap?

jun's 2.2 - queued.max.bytes + socket.request.max.bytes still holds, assuming the ssl-related buffers are small. the largest weakness in this claim has to do with decompression rather than anything ssl-related. so yes, there is an O(#ssl connections * sslEngine packet size) component, but i think it's small. again - decompression should be the concern.

rajini's #13 - interesting optimization. the problem is there's no knowing in advance what the _next_ request to come out of a socket is, so this would mute just those sockets that are 1. mutable and 2. have a buffer-demanding request for which we could not allocate a buffer. the downside is that as-is this would cause the busy-loop on poll() that the mutes were supposed to prevent - or code would need to be added to ad-hoc mute a connection that was so-far unmuted but has now generated a memory-demanding request?

On Fri, Nov 11, 2016 at 5:02 AM, Rajini Sivaram <rajinisiva...@googlemail.com> wrote:

Radai,

11. The KIP talks about a new server configuration parameter memory.pool.class.name which is not in the implementation. Is it still the case that the pool will be configurable?

12. Personally I would prefer not to have a garbage collected pool that hides bugs as well. Apart from the added code complexity and the extra thread to handle collections, I am also concerned about the non-deterministic nature of GC timings. The KIP introduces delays in processing requests based on the configuration parameter queued.max.bytes. This is unrelated to the JVM heap size, and hence the pool can be full when there is no pressure on the JVM to garbage collect. The KIP does not prevent other timeouts in the broker (eg. consumer session timeout) because it is relying on the pool to be managed in a deterministic, timely manner. Since a garbage collected pool cannot provide that guarantee, wouldn't it be better to run tests with a GC-pool that perhaps fails with a fatal error if it encounters a buffer that was not released?

13. The implementation currently mutes all channels that don't have a receive buffer allocated. Would it make sense to mute only the channels that need a buffer (i.e. allow channels to read the 4-byte size that is not read using the pool) so that a normal client connection close() is handled even when the pool is full? Since the extra 4 bytes may already be allocated for some connections, the total request memory has to take into account 4*numConnections bytes anyway.

On Thu, Nov 10, 2016 at 11:51 PM, Jun Rao <j...@confluent.io> wrote:

Hi, Radai,

1. Yes, I am concerned about the trickiness of having to deal with weak refs. I think it's simpler to just have the simple version instrumented with enough debug/trace logging and do enough stress testing. Since we still have queued.max.requests, one can always fall back to that if a memory leak issue is identified. We could also label the feature as beta if we don't think this is production ready.

2.2 I am just wondering, after we fix that issue, whether the claim that the request memory is bounded by queued.max.bytes + socket.request.max.bytes is still true.

5. Ok, leaving the default as -1 is fine then.

Thanks,

Jun

On Wed, Nov 9, 2016 at 6:01 PM, radai <radai.rosenbl...@gmail.com> wrote:

Hi Jun,

Thank you for taking the time to review this.
1. short version - yes, the concern is bugs, but the cost is tiny and worth it, and it's a common pattern. long version:
1.1 detecting these types of bugs (leaks) cannot be easily done with simple testing, but requires stress/stability tests that run for a long time (long enough to hit OOM, depending on leak size and available memory). this is why some sort of leak detector is "standard practice" - for example, look at netty (http://netty.io/wiki/reference-counted-objects.html#leak-detection-levels) - they have way more complicated built-in leak detection enabled by default. as a concrete example - during development i did not properly dispose of an in-progress KafkaChannel.receive when a connection was abruptly closed, and i only found it because of the log msg printed by the pool.
1.2 i have a benchmark suite showing the performance cost of the gc pool is absolutely negligible - https://github.com/radai-rosenblatt/kafka-benchmarks/tree/master/memorypool-benchmarks
1.3 as for the complexity of the impl - it's just ~150 lines and pretty straightforward. i think the main issue is that not many people are familiar with weak refs and ref queues.

how about making the pool impl class a config param (generally good going forward), making the default be the simple pool, and keeping the GC one as a dev/debug/triage aid?
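[A minimal sketch of the weak-ref + ref-queue leak detection described in 1.1-1.3; the real GarbageCollectedMemoryPool lives in radai's branch, and this class, its names, and the elided accounting are illustrative only:]

    import java.lang.ref.ReferenceQueue;
    import java.lang.ref.WeakReference;
    import java.nio.ByteBuffer;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class LeakDetectingPool {
        private final ReferenceQueue<ByteBuffer> deadBuffers = new ReferenceQueue<>();
        // tracking ref -> buffer capacity; entries are removed on release()
        private final Map<WeakReference<ByteBuffer>, Integer> inFlight =
                new ConcurrentHashMap<>();

        public ByteBuffer allocate(int bytes) {
            ByteBuffer buffer = ByteBuffer.allocate(bytes); // pool accounting elided
            inFlight.put(new WeakReference<>(buffer, deadBuffers), bytes);
            return buffer;
        }

        public void release(ByteBuffer buffer) {
            // real impl: look up and remove this buffer's tracking ref
            // (identity-based) and return its bytes to the pool
        }

        // background thread: forever asleep unless a tracked buffer was GCed
        // while still in inFlight, i.e. allocated but never release()ed
        void watchForLeaks() throws InterruptedException {
            while (true) {
                WeakReference<?> ref = (WeakReference<?>) deadBuffers.remove(); // blocks
                Integer leakedBytes = inFlight.remove(ref);
                if (leakedBytes != null)
                    // "fail on leak", as agreed above; the original logged instead
                    throw new IllegalStateException("buffer of " + leakedBytes
                            + " bytes was never release()ed");
            }
        }
    }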
2. the KIP itself doesn't specifically treat SSL at all - it's an implementation detail. as for my current patch, it has some minimal treatment of SSL - just enough to not mute SSL sockets mid-handshake - but the code in SslTransportLayer still allocates buffers itself. it is my understanding that netReadBuffer/appReadBuffer shouldn't grow beyond 2 x sslEngine.getSession().getPacketBufferSize(), which i assume to be small. they are also long-lived (they live for the duration of the connection), which makes them a poor fit for pooling. the bigger fish to fry, i think, is decompression - you could read a 1MB blob into a pool-provided buffer and then decompress it into 10MB of heap allocated on the spot :-) also, the ssl code is extremely tricky.
2.2 just to make sure, you're talking about Selector.java: while ((networkReceive = channel.read()) != null) addToStagedReceives(channel, networkReceive); ? if so you're right, and i'll fix that (probably by something similar to immediatelyConnectedKeys, not sure yet)

3. isOutOfMemory is self-explanatory (and i'll add javadocs and update the wiki). isLowOnMem is basically the point where i start randomizing the selection key handling order to avoid potential starvation. it's rather arbitrary, and now that i think of it, it should probably not exist and be entirely contained in Selector (where the shuffling takes place). will fix.

4. will do.

5. i prefer -1 or 0 as an explicit "OFF" (or basically anything <=0). Long.MAX_VALUE would still create a pool that would still waste time tracking resources. i don't really mind, though, if you have a preferred magic value for off.
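[A toy model of the stagedReceives mechanism behind 2.2 above: one encrypted read can yield several complete receives, handed back one per poll(), so a channel with staged receives must be revisited even if its socket never fires again. Names here are illustrative, not the real Selector internals:]

    import java.util.ArrayDeque;
    import java.util.HashMap;
    import java.util.Map;

    public class StagedReceivesModel {
        private final Map<String, ArrayDeque<byte[]>> stagedReceives = new HashMap<>();

        // read path: stash every receive decoded from a single channel.read()
        void stage(String channelId, byte[] receive) {
            stagedReceives.computeIfAbsent(channelId, k -> new ArrayDeque<>()).add(receive);
        }

        // checked once per poll() before blocking in select(): if anything is
        // staged, poll must use a zero timeout (like immediately connected
        // keys) so these are drained without waiting on the socket
        boolean hasStaged() {
            return !stagedReceives.isEmpty();
        }

        // deliver at most one receive per channel per poll, like the selector
        byte[] deliverOne(String channelId) {
            ArrayDeque<byte[]> queue = stagedReceives.get(channelId);
            if (queue == null)
                return null;
            byte[] receive = queue.poll();
            if (queue.isEmpty())
                stagedReceives.remove(channelId);
            return receive;
        }
    }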
On Wed, Nov 9, 2016 at 9:28 AM, Jun Rao <j...@confluent.io> wrote:

Hi, Radai,

Thanks for the KIP. Some comments below.

1. The KIP says "to facilitate faster implementation (as a safety net) the pool will be implemented in such a way that memory that was not release()ed (but still garbage collected) would be detected and "reclaimed". this is to prevent "leaks" in case of code paths that fail to release() properly.". What are the cases that could cause memory leaks? If we are concerned about bugs, it seems that it's better to just do more testing to make sure the usage of the simple implementation (SimpleMemoryPool) is solid, instead of adding more complicated logic (GarbageCollectedMemoryPool) to hide the potential bugs.

2. I am wondering how much this KIP covers the SSL channel implementation.

2.1 SslTransportLayer maintains netReadBuffer, netWriteBuffer, appReadBuffer per socket. Should that memory be accounted for in the memory pool?

2.2 One tricky thing with SSL is that during a KafkaChannel.read(), it's possible for multiple NetworkReceives to be returned since multiple requests' data could be encrypted together by SSL. To deal with this, we stash those NetworkReceives in Selector.stagedReceives and give them back to the poll() call one NetworkReceive at a time. What this means is that, if we stop reading from the KafkaChannel in the middle because the memory pool is full, this channel's key may never get selected for reads (even after the read interest is turned on), but there is still pending data for the channel, which will never get processed.

3. The code has the following two methods in MemoryPool, which are not described in the KIP. Could you explain in the wiki how they are used?

    isLowOnMemory()
    isOutOfMemory()

4. Could you also describe in the KIP, at a high level, how the read interest bit for the socket is turned on/off with respect to the MemoryPool?

5. Should queued.max.bytes default to -1 or Long.MAX_VALUE?

Thanks,

Jun

On Mon, Nov 7, 2016 at 1:08 PM, radai <radai.rosenbl...@gmail.com> wrote:

Hi,

I would like to initiate a vote on KIP-72:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-72%3A+Allow+putting+a+bound+on+memory+consumed+by+Incoming+requests

The kip allows specifying a limit on the amount of memory allocated for reading incoming requests into. This is useful for "sizing" a broker and avoiding OOMEs under heavy load (as actually happens occasionally at linkedin).

I believe I've addressed most (all?) concerns brought up during the discussion.

To the best of my understanding this vote is about the goal and public-facing changes related to the new proposed behavior, but as for implementation, i have the code up here:

https://github.com/radai-rosenblatt/kafka/tree/broker-memory-pool-with-muting

and I've stress-tested it to work properly (meaning it chugs along and throttles under loads that would DOS 10.0.1.0 code).

I also believe that the primitives and "pattern"s introduced in this KIP (namely the notion of a buffer pool and retrieving from / releasing to said pool instead of allocating memory) are generally useful beyond the scope of this KIP, for both performance issues (allocating lots of short-lived large buffers is a performance bottleneck) and other areas where memory limits are a problem (KIP-81).

Thank you,

Radai.