I meant finishing what's described in the following section and then starting a discussion followed by a vote:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-DescribeConfigsRequest

We have only voted on KIP-4 Metadata, KIP-4 Create Topics and KIP-4 Delete Topics so far.

Ismael

On Wed, Mar 15, 2017 at 8:58 PM, Becket Qin <becket....@gmail.com> wrote:

Hi Ismael,

KIP-4 is also the one that I was thinking about. We have introduced a DescribeConfigRequest there so the producer can easily get the configurations. By "another KIP" do you mean a new (or maybe extended) protocol, or using that protocol in the clients?

Thanks,

Jiangjie (Becket) Qin

On Wed, Mar 15, 2017 at 1:21 PM, Ismael Juma <ism...@juma.me.uk> wrote:

Hi Becket,

How were you thinking of retrieving the configuration items you mentioned? I am asking because I was planning to post a KIP for Describe Configs (one of the protocols in KIP-4), which would expose such information. But maybe you are thinking of extending the Metadata request?

Ismael

On Wed, Mar 15, 2017 at 7:33 PM, Becket Qin <becket....@gmail.com> wrote:

Hi Jason,

Good point. I was thinking about that too, but I was not sure if it is the right thing to do by default.

If we assume people always set the batch size to the max message size, splitting the oversized batch makes a lot of sense. But it seems possible that users want to control the memory footprint, so they would set the batch size to smaller than the max message size so that the producer can hold batches for more partitions. In this case, splitting the batch might not be the desired behavior.

I think the most intuitive approach is to allow the producer to get the max message size configuration (as well as some other configurations, such as the timestamp type) from the broker side and use that to decide whether a batch should be split or not. I probably should add this to the KIP wiki.

Thanks,

Jiangjie (Becket) Qin

On Wed, Mar 15, 2017 at 9:47 AM, Jason Gustafson <ja...@confluent.io> wrote:

Hey Becket,

Thanks for the KIP! The approach seems reasonable. One clarification: is the intent to do the splitting after the broker rejects the request with MESSAGE_TOO_LARGE, or prior to sending if the configured batch size is exceeded?

-Jason

On Mon, Mar 13, 2017 at 8:10 PM, Becket Qin <becket....@gmail.com> wrote:

Bumping up the thread for further comments. If there are no more comments on the KIP I will start the voting thread on Wednesday.

Thanks,

Jiangjie (Becket) Qin

On Tue, Mar 7, 2017 at 9:48 AM, Becket Qin <becket....@gmail.com> wrote:

Hi Dong,

Thanks for the comments.

The patch is mostly a proof of concept in case there is any concern about the implementation, which is indeed a little tricky.

The new metric has already been mentioned in the Public Interface Change section.

I added to the wiki the reasoning about how the compression ratio improvement/deterioration steps are determined.
Thanks,

Jiangjie (Becket) Qin

On Mon, Mar 6, 2017 at 4:42 PM, Dong Lin <lindon...@gmail.com> wrote:

Hey Becket,

I am wondering if we should first vote for the KIP before reviewing the patch. I have two comments below:

- Should we specify the new sensors as part of the interface change in the KIP?
- The KIP proposes to increase the estimated compression ratio by 0.05 for each underestimation and decrement the estimation by 0.005 for each overestimation. Why are these two values chosen? I think there is some tradeoff in selecting the values. Can the KIP be more explicit about the tradeoff and explain how these two values would impact the producer's performance?

Thanks,
Dong

On Sat, Mar 4, 2017 at 11:42 AM, Becket Qin <becket....@gmail.com> wrote:

I have updated the KIP based on the latest discussion. Please check and let me know if there is any further concern.

Thanks,

Jiangjie (Becket) Qin

On Sat, Mar 4, 2017 at 10:56 AM, Becket Qin <becket....@gmail.com> wrote:

Actually, on second thought, rate might be better for two reasons:
1. Most of the metrics we already have in the producer use rate instead of count.
2. If a service is bounced, a count is reset to 0, but that does not affect a rate.

I'll make the change.

Thanks,

Jiangjie (Becket) Qin
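For illustration, a minimal sketch of registering such a sensor with the client metrics library. The sensor and metric names here are hypothetical, not necessarily the ones in the patch; recording each split into a Rate stat reports splits per second, and unlike a raw count it is not skewed by a producer bounce.

    import org.apache.kafka.common.metrics.Metrics;
    import org.apache.kafka.common.metrics.Sensor;
    import org.apache.kafka.common.metrics.stats.Rate;

    public class BatchSplitMetricSketch {
        public static void main(String[] args) {
            Metrics metrics = new Metrics();
            // One sensor that records every batch split.
            Sensor batchSplitSensor = metrics.sensor("batch-split");
            batchSplitSensor.add(
                metrics.metricName("batch-split-rate", "producer-metrics",
                    "The average number of batch splits per second"),
                new Rate());
            batchSplitSensor.record(); // called once per split
            metrics.close();
        }
    }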
On Sat, Mar 4, 2017 at 10:27 AM, Becket Qin <becket....@gmail.com> wrote:

Hi Dong,

Yes, there is a sensor in the patch for the split occurrences.

Currently it is a count instead of a rate. In practice, count seems easier to use in this case, but I am open to changing it.

Thanks,

Jiangjie (Becket) Qin

On Fri, Mar 3, 2017 at 7:43 PM, Dong Lin <lindon...@gmail.com> wrote:

Hey Becket,

I haven't looked at the patch yet. But since we are going to try the split-on-oversize solution, should the KIP also add a sensor that shows the rate of splits per second and the probability of a split?

Thanks,
Dong

On Fri, Mar 3, 2017 at 6:39 PM, Becket Qin <becket....@gmail.com> wrote:

Just to clarify, the implementation is basically what I mentioned above (split/resend plus the adjusted-estimation evolving algorithm) and changing the compression ratio estimation to be per topic.

Thanks,

Jiangjie (Becket) Qin

On Fri, Mar 3, 2017 at 6:36 PM, Becket Qin <becket....@gmail.com> wrote:

I went ahead and have a patch submitted here: https://github.com/apache/kafka/pull/2638

Per Joel's suggestion, I changed the compression ratio to be per topic as well. It seems to work well. Since there is an important behavior change and a new sensor is added, I'll keep the KIP and update it accordingly.

Thanks,

Jiangjie (Becket) Qin

On Mon, Feb 27, 2017 at 3:50 PM, Joel Koshy <jjkosh...@gmail.com> wrote:

> Let's say we sent the batch over the wire and received a RecordTooLargeException. How do we split it, since once we add a message to the batch we lose the message-level granularity? We will have to decompress, do a deep iteration, split, and compress again, right? This looks like a performance bottleneck in the case of multi-topic producers like mirror maker.

Yes, but these should be outliers if we do estimation on a per-topic basis and if we target a conservative-enough compression ratio. The producer should also avoid sending over the wire if it can be made aware of the max message size limit on the broker, and split if it determines that a record exceeds the broker's config. Ideally this should be part of the topic metadata but is not - so it could be off a periodic describe-configs <https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-DescribeConfigsRequest> (which isn't available yet). This doesn't remove the need to split and recompress, though.
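As a sketch of what such a periodic lookup could look like: the code below assumes the DescribeConfigs admin API that KIP-4 later shipped (it was not available when this was written), along with a hypothetical bootstrap address and topic name.

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.Config;
    import org.apache.kafka.common.config.ConfigResource;

    public class MaxMessageSizeLookupSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (Admin admin = Admin.create(props)) {
                ConfigResource topic =
                    new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
                Config config =
                    admin.describeConfigs(Collections.singleton(topic))
                         .all().get().get(topic);
                // The broker-enforced per-topic limit that the producer would
                // compare batches against before deciding to split.
                int maxMessageBytes =
                    Integer.parseInt(config.get("max.message.bytes").value());
                System.out.println("max.message.bytes = " + maxMessageBytes);
            }
        }
    }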
On Mon, Feb 27, 2017 at 10:51 AM, Becket Qin <becket....@gmail.com> wrote:

Hey Mayuresh,

1) The batch would be split when a RecordTooLargeException is received.
2) Not lower the actual compression ratio, but lower the estimated compression ratio according to the Actual Compression Ratio (ACR).

An example: let's start with Estimated Compression Ratio (ECR) = 1.0. Say the ACR is ~0.8. Instead of letting the ECR drop to 0.8 very quickly, we only drop it by 0.001 every time the ACR < ECR. However, once we see an ACR > ECR, we increment the ECR by 0.05. If a RecordTooLargeException is received, we reset the ECR back to 1.0 and split the batch.

Thanks,

Jiangjie (Becket) Qin
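A minimal sketch of the scheme described above, using the step sizes from this message. The class shape, names, and the floor at the observed ratio are illustrative, not the patch's actual structure.

    public class CompressionRatioEstimatorSketch {
        private static final float MAX_RATIO = 1.0f;       // most conservative estimate
        private static final float DECREASE_STEP = 0.001f; // per observed overestimation
        private static final float INCREASE_STEP = 0.05f;  // per observed underestimation

        private float estimatedRatio = MAX_RATIO;          // start conservative

        // Called with the actual compression ratio (ACR) of each completed batch.
        public void observe(float actualRatio) {
            if (actualRatio < estimatedRatio) {
                // Overestimated: creep down slowly toward the observed ratio.
                estimatedRatio = Math.max(actualRatio, estimatedRatio - DECREASE_STEP);
            } else {
                // Underestimated: back off quickly, TCP-style.
                estimatedRatio = Math.min(MAX_RATIO, estimatedRatio + INCREASE_STEP);
            }
        }

        // Called when the broker rejects a batch with RecordTooLargeException;
        // the producer resets the estimate and splits the rejected batch.
        public void onRecordTooLarge() {
            estimatedRatio = MAX_RATIO;
        }

        public float estimate() {
            return estimatedRatio;
        }
    }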
On Mon, Feb 27, 2017 at 10:30 AM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:

Hi Becket,

Seems like an interesting idea. I had a couple of questions:
1) How do we decide when the batch should be split?
2) What do you mean by slowly lowering the "actual" compression ratio? An example would really help here.

Thanks,

Mayuresh

On Fri, Feb 24, 2017 at 3:17 PM, Becket Qin <becket....@gmail.com> wrote:

Hi Jay,

Yeah, I got your point.

I think there might be a solution that does not require adding a new configuration. We can start from a very conservative compression ratio, say 1.0, and lower it very slowly according to the actual compression ratio until we hit a point where we have to split a batch. At that point, we exponentially back off on the compression ratio. The idea is somewhat like TCP. This should help avoid frequent splits.

The upper bound of the batch size is also a little awkward today, because we say the batch size is based on the compressed size, yet users cannot set it to the max message size because that would result in oversized messages. With this change we will be able to let users set the batch size close to the max message size.

However, the downside is that there could be latency spikes in the system in this case due to the splitting, especially when there are many messages that need to be split at the same time. That could potentially be an issue for some users.

What do you think about this approach?
Thanks,

Jiangjie (Becket) Qin

On Thu, Feb 23, 2017 at 1:31 PM, Jay Kreps <j...@confluent.io> wrote:

Hey Becket,

Yeah, that makes sense.

I agree that you'd really have to both fix the estimation (i.e. make it per topic or make it better estimate the high percentiles) AND have the recovery mechanism. If you are underestimating often and then paying a high recovery price, that won't fly.

I think you take my main point though, which is just that I hate to expose these super low-level options to users, because it is so hard to explain to people what they mean and how they should be set. So if it is possible to make some combination of better estimation and splitting, or better tolerance of overage, work, that would be preferable.

-Jay

On Thu, Feb 23, 2017 at 11:51 AM, Becket Qin <becket....@gmail.com> wrote:

@Dong,

Thanks for the comments. The default behavior of the producer won't change. If users want to use the uncompressed message size, they probably will also bump up the batch size to somewhere close to the max message size. This would be in the documentation.
BTW, the default batch size is 16K, which is pretty small.

@Jay,

Yeah, we actually debated quite a bit internally what the best solution to this is.

I completely agree it is a bug. In practice we usually leave some headroom to allow the compressed size to grow a little in case the original messages are not compressible, for example 1000 KB instead of exactly 1 MB. That is likely safe enough.

The major concern with the rejected alternative is performance. It largely depends on how frequently we need to split a batch, i.e. how likely the estimation is to go off. If we only need to do the split work occasionally, the cost would be amortized, so we wouldn't need to worry about it too much. However, it looks like for a producer with shared topics, the estimation is always off. As an example, consider two topics, one with compression ratio 0.6 and the other 0.2. Assuming exactly the same traffic, the average compression ratio would be roughly 0.4, which is not right for either of the topics. So almost half of the batches (those of the topic with the 0.6 compression ratio) will end up larger than the configured batch size.
When it comes to more topics, such as with mirror maker, this becomes even more unpredictable. To avoid frequent rejection/splitting of batches, we need to configure the batch size pretty conservatively. This could actually hurt performance, because we are shoehorning the messages that are highly compressible into a small batch so that the other topics that are not that compressible will not become too large with the same batch size. At LinkedIn, our batch size is configured to 64 KB because of this. I think we may actually get better batching if we just use the uncompressed message size and an 800 KB batch size.

We did not think about loosening the message size restriction, but that sounds like a viable solution given that the consumer can now fetch oversized messages. One concern would be that on the broker side, oversized messages will bring more memory pressure. With KIP-92 we may mitigate that, but the memory allocation for large messages may not be very GC friendly. I need to think about this a little more.

Thanks,

Jiangjie (Becket) Qin
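To make the sizing trade-off concrete, a hypothetical producer configuration along the lines discussed above. batch.size and compression.type are real producer configs, but the values are purely illustrative.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;

    public class UncompressedBatchingConfigSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArraySerializer");
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArraySerializer");
            props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "gzip");
            // Today batch.size bounds the *compressed* batch, so mixed-topic
            // producers hold it down (e.g. 64 KB at LinkedIn). If batching were
            // based on uncompressed size, it could be pushed toward the broker's
            // max message size, e.g. 800 KB as suggested above.
            props.put(ProducerConfig.BATCH_SIZE_CONFIG, 800 * 1024);
            try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
                // ... send records as usual
            }
        }
    }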
On Wed, Feb 22, 2017 at 8:57 PM, Jay Kreps <j...@confluent.io> wrote:

Hey Becket,

I get the problem we want to solve with this, but I don't think this is something that makes sense as a user-controlled knob that everyone sending data to Kafka has to think about. It is basically a bug, right?

First, as a technical question: is it true that using the uncompressed size for batching actually guarantees that you observe the limit? I think that implies that compression always makes the messages smaller, which I think is usually true but is not guaranteed, right? E.g. if someone encrypts their data, which tends to randomize it, and then enables compression, it could get slightly bigger?

I also wonder if the rejected alternatives you describe couldn't be made to work: basically try to be a bit better at estimation and recover when we guess wrong. I don't think the memory usage should be a problem: isn't it the same memory usage the consumer of that topic would need? And can't you do the splitting and recompression in a streaming fashion?
If we can make the estimation miss rate low and the recovery cost is just ~2x the normal cost for that batch, that should be totally fine, right? (It's technically true you might have to split more than once, but since you halve it each time, you should get a number of halvings that is logarithmic in the miss size, which, with better estimation, you'd hope would be super duper small.)

Alternatively, maybe we could work on the other side of the problem and try to make it so that a small miss on message size isn't a big problem. I think the original issue was that max size and fetch size were tightly coupled, and the way memory in the consumer worked, you really wanted the fetch size to be as small as possible, because you'd use that much memory per fetched partition and the consumer would get stuck if its fetch size wasn't big enough. I think we made some progress on that issue, and maybe more could be done there so that a small bit of fuzziness around the size would not be an issue?

-Jay
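A sketch of the halving idea above, with hypothetical estimateCompressedSize and compression-ratio stand-ins; real code would operate on the accumulator's batch representation. Because each level halves the batch, the number of levels is logarithmic in the size of the miss.

    import java.util.ArrayList;
    import java.util.List;

    public class BatchSplitSketch {
        // Hypothetical stand-in: estimate the compressed size of a set of
        // serialized records, assuming a ~0.5 compression ratio.
        static long estimateCompressedSize(List<byte[]> records) {
            long total = 0;
            for (byte[] r : records) total += r.length;
            return (long) (total * 0.5);
        }

        // Recursively halve an oversized batch until every piece is expected
        // to fit under the broker's limit.
        static List<List<byte[]>> split(List<byte[]> batch, long maxBytes) {
            if (batch.size() <= 1 || estimateCompressedSize(batch) <= maxBytes) {
                return List.of(batch);
            }
            int mid = batch.size() / 2;
            List<List<byte[]>> pieces = new ArrayList<>();
            pieces.addAll(split(batch.subList(0, mid), maxBytes));
            pieces.addAll(split(batch.subList(mid, batch.size()), maxBytes));
            return pieces;
        }
    }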
On Tue, Feb 21, 2017 at 12:30 PM, Becket Qin <becket....@gmail.com> wrote:

Hi folks,

I would like to start the discussion thread on KIP-126. The KIP proposes adding a new configuration to KafkaProducer to allow batching based on uncompressed message size.

Comments are welcome.

The KIP wiki is here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-126+-+Allow+KafkaProducer+to+batch+based+on+uncompressed+size

Thanks,

Jiangjie (Becket) Qin