Thanks for the comment, Dong. I think batch-split-ratio makes sense, but it seems somewhat redundant with batch-split-rate.

Also, batch-split-ratio may be a little more involved to get right:
1. An all-time batch split ratio is easy to compute but not that useful.
2. A time-windowed batch-split-ratio is more complicated to make accurate, because it is a "stateful" metric: it relies on the number of batches sent in a time window and the number of batches split in the same window, but the sends and the splits do not necessarily fall into the same window.

Besides, a rough estimate of the batch split ratio can be derived from the existing metrics, and batch-split-rate is already a good indication of whether batch splitting has caused a performance problem. So I am not sure it is worth having an explicit batch-split-ratio metric in this case.
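For example, a rough ratio can be derived from the client metrics along the following lines. This is only a sketch: batch-split-rate is the metric proposed in this KIP, request-rate is the existing producer metric (so the quotient is "splits per produce request", an approximation rather than an exact windowed ratio), and older clients expose Metric.value() instead of metricValue().

    import java.util.Map;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.common.Metric;
    import org.apache.kafka.common.MetricName;

    public final class SplitRatioEstimate {
        // Approximate the fraction of produce requests that triggered a batch
        // split by dividing two independently windowed rates.
        public static double approxBatchSplitRatio(Producer<?, ?> producer) {
            double splitRate = 0.0;
            double requestRate = 0.0;
            for (Map.Entry<MetricName, ? extends Metric> e : producer.metrics().entrySet()) {
                if (!"producer-metrics".equals(e.getKey().group()))
                    continue; // skip the per-node and per-topic variants
                if ("batch-split-rate".equals(e.getKey().name()))
                    splitRate = (Double) e.getValue().metricValue();
                else if ("request-rate".equals(e.getKey().name()))
                    requestRate = (Double) e.getValue().metricValue();
            }
            return requestRate > 0 ? splitRate / requestRate : 0.0;
        }
    }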
Thanks,
Jiangjie (Becket) Qin

On Wed, Mar 22, 2017 at 10:54 AM, Dong Lin <lindon...@gmail.com> wrote:

> Never mind about my second comment. I misunderstood the semantics of the producer's batch.size.
>
> On Wed, Mar 22, 2017 at 10:20 AM, Dong Lin <lindon...@gmail.com> wrote:
>
> > Hey Becket,
> >
> > In addition to batch-split-rate, should we also add a batch-split-ratio sensor to gauge the probability that we have to split a batch?
> >
> > Also, in the case that the batch size configured for the producer is smaller than the max message size configured for the broker, why can't we just split the batch if its size exceeds the configured batch size? The benefit of this approach is that the semantics of the producer are straightforward, because we enforce the batch size the user has configured. The implementation would also be simpler, because we don't have to rely on KIP-4 to fetch the max message size from the broker. I guess you are worrying about the overhead of "unnecessary" splits when a batch's size is between the user-configured batch size and the broker's max message size. But is that overhead really a concern? If the overhead is too large because the user has configured a very low batch size for the producer, shouldn't the user adjust the producer config?
> >
> > Thanks,
> > Dong
> >
> > On Wed, Mar 15, 2017 at 2:50 PM, Becket Qin <becket....@gmail.com> wrote:
> >
> > > I see, then we are thinking about the same thing :)
> > >
> > > On Wed, Mar 15, 2017 at 2:26 PM, Ismael Juma <ism...@juma.me.uk> wrote:
> > >
> > > > I meant finishing what's described in the following section and then starting a discussion followed by a vote:
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-DescribeConfigsRequest
> > > >
> > > > We have only voted on KIP-4 Metadata, KIP-4 Create Topics, and KIP-4 Delete Topics so far.
> > > >
> > > > Ismael
> > > >
> > > > On Wed, Mar 15, 2017 at 8:58 PM, Becket Qin <becket....@gmail.com> wrote:
> > > >
> > > > > Hi Ismael,
> > > > >
> > > > > KIP-4 is also the one that I was thinking about. We have introduced a DescribeConfigsRequest there so the producer can easily get the configurations. By "another KIP" do you mean a new (or maybe extended) protocol, or using that protocol in the clients?
> > > > >
> > > > > Thanks,
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > > On Wed, Mar 15, 2017 at 1:21 PM, Ismael Juma <ism...@juma.me.uk> wrote:
> > > > >
> > > > > > Hi Becket,
> > > > > >
> > > > > > How were you thinking of retrieving the configuration items you mentioned?
> > > > > > I am asking because I was planning to post a KIP for Describe Configs (one of the protocols in KIP-4), which would expose such information. But maybe you are thinking of extending the Metadata request?
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Wed, Mar 15, 2017 at 7:33 PM, Becket Qin <becket....@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Jason,
> > > > > > >
> > > > > > > Good point. I was thinking about that, too. I was not sure if that is the right thing to do by default.
> > > > > > >
> > > > > > > If we assume people always set the batch size to the max message size, splitting the oversized batch makes a lot of sense. But it seems possible that users want to control the memory footprint, so they would set the batch size smaller than the max message size so that the producer can hold batches for more partitions. In this case, splitting the batch might not be the desired behavior.
> > > > > > >
> > > > > > > I think the most intuitive approach to this is to allow the producer to get the max message size configuration (as well as some other configurations, such as the timestamp type) from the broker side and use that to decide whether a batch should be split or not. I probably should add this to the KIP wiki.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Jiangjie (Becket) Qin
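(For illustration, the decision described above could be as simple as the following sketch. It assumes the producer has somehow learned the broker's max.message.bytes, e.g. via the KIP-4 DescribeConfigs protocol, which is not available yet; the names are hypothetical, not actual producer internals.)

    final class BatchSizeCheck {
        // Split before sending only when the broker cannot possibly accept the
        // batch, rather than whenever the producer's own batch.size is exceeded.
        static boolean shouldSplit(int compressedBatchSizeBytes, int brokerMaxMessageBytes) {
            return compressedBatchSizeBytes > brokerMaxMessageBytes;
        }
    }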
> > > > > > >
> > > > > > > On Wed, Mar 15, 2017 at 9:47 AM, Jason Gustafson <ja...@confluent.io> wrote:
> > > > > > >
> > > > > > > > Hey Becket,
> > > > > > > >
> > > > > > > > Thanks for the KIP! The approach seems reasonable. One clarification: is the intent to do the splitting after the broker rejects the request with MESSAGE_TOO_LARGE, or prior to sending if the configured batch size is exceeded?
> > > > > > > >
> > > > > > > > -Jason
> > > > > > > >
> > > > > > > > On Mon, Mar 13, 2017 at 8:10 PM, Becket Qin <becket....@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Bump up the thread for further comments. If there are no more comments on the KIP, I will start the voting thread on Wednesday.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > >
> > > > > > > > > On Tue, Mar 7, 2017 at 9:48 AM, Becket Qin <becket....@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi Dong,
> > > > > > > > > >
> > > > > > > > > > Thanks for the comments.
> > > > > > > > > >
> > > > > > > > > > The patch is mostly a proof of concept, in case there is any concern about the implementation, which is indeed a little tricky.
> > > > > > > > > >
> > > > > > > > > > The new metric has already been mentioned in the Public Interface Change section.
> > > > > > > > > >
> > > > > > > > > > I added the reasoning about how the compression-ratio improvement/deterioration steps are determined to the wiki.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > >
> > > > > > > > > > On Mon, Mar 6, 2017 at 4:42 PM, Dong Lin <lindon...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hey Becket,
> > > > > > > > > > >
> > > > > > > > > > > I am wondering if we should first vote for the KIP before reviewing the patch. I have two comments below:
> > > > > > > > > > >
> > > > > > > > > > > - Should we specify the new sensors as part of the interface change in the KIP?
> > > > > > > > > > > - The KIP proposes to increase the estimated compression ratio by 0.05 for each underestimation and decrement the estimation by 0.005 for each overestimation. Why are these two values chosen? I think there is some tradeoff in selecting the values. Can the KIP be more explicit about the tradeoff and explain how these two values would impact the producer's performance?
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Dong
> > > > > > > > > > >
> > > > > > > > > > > On Sat, Mar 4, 2017 at 11:42 AM, Becket Qin <becket....@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > I have updated the KIP based on the latest discussion. Please check and let me know if there is any further concern.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > >
> > > > > > > > > > > > On Sat, Mar 4, 2017 at 10:56 AM, Becket Qin <becket....@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Actually, on second thought, rate might be better for two reasons:
> > > > > > > > > > > > > 1. Most of the metrics we already have in the producer use rate instead of count.
> > > > > > > > > > > > > 2. If a service is bounced, the count will be reset to 0, but that does not affect rate.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'll make the change.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Sat, Mar 4, 2017 at 10:27 AM, Becket Qin <becket....@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Dong,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Yes, there is a sensor in the patch for the split occurrences.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Currently it is a count instead of a rate. In practice, count seems easier to use in this case. But I am open to change.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Mar 3, 2017 at 7:43 PM, Dong Lin <lindon...@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hey Becket,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I haven't looked at the patch yet. But since we are going to try the split-on-oversize solution, should the KIP also add a sensor that shows the rate of splits per second and the probability of a split?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > Dong
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Fri, Mar 3, 2017 at 6:39 PM, Becket Qin <becket....@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Just to clarify, the implementation is basically what I mentioned above (split/resend plus the adjusted estimation-evolving algorithm), and changing the compression ratio estimation to be per topic.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Fri, Mar 3, 2017 at 6:36 PM, Becket Qin <becket....@gmail.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I went ahead and submitted a patch here: https://github.com/apache/kafka/pull/2638
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Per Joel's suggestion, I changed the compression ratio estimation to be per topic as well. It seems to work well. Since there is an important behavior change and a new sensor is added, I'll keep the KIP and update it accordingly.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Mon, Feb 27, 2017 at 3:50 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Let's say we sent the batch over the wire and received a RecordTooLargeException. How do we split it, since once we add the messages to the batch we lose the message-level granularity? We would have to decompress, do a deep iteration, split, and compress again, right? This looks like a performance bottleneck in the case of multi-topic producers like mirror maker.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Yes, but these should be outliers if we do estimation on a per-topic basis and if we target a conservative-enough compression ratio. The producer should also avoid sending over the wire if it can be made aware of the max message size limit on the broker, and split if it determines that a record exceeds the broker's config. Ideally this should be part of the topic metadata but is not - so it could be off a periodic describe-configs <https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-DescribeConfigsRequest> (which isn't available yet). This doesn't remove the need to split and recompress, though.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Mon, Feb 27, 2017 at 10:51 AM, Becket Qin <becket....@gmail.com> wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Hey Mayuresh,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > 1) The batch would be split when a RecordTooLargeException is received.
> > > > > > > > > > > > > > > > > > > 2) Not lower the actual compression ratio, but lower the estimated compression ratio "according to" the Actual Compression Ratio (ACR).
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > An example: let's start with Estimated Compression Ratio (ECR) = 1.0. Say the ACR is ~0.8. Instead of letting the ECR drop to 0.8 very quickly, we only drop it by 0.001 every time ACR < ECR. However, once we see an ACR > ECR, we increment the ECR by 0.05. If a RecordTooLargeException is received, we reset the ECR back to 1.0 and split the batch.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
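(For reference, the adjustment scheme described above amounts to something like the following sketch. The names are made up, the constants are the ones from the example, and the actual patch additionally tracks one estimate per topic.)

    // Start conservative and drift down slowly; back off quickly on an
    // underestimate; reset entirely if the broker rejects a batch.
    class CompressionRatioEstimator {
        private volatile float ecr = 1.0f; // estimated compression ratio

        // Called with the observed (actual) ratio when a batch completes.
        void observe(float acr) {
            if (acr < ecr)
                ecr = Math.max(acr, ecr - 0.001f);  // overestimated: drop slowly
            else
                ecr = Math.min(1.0f, ecr + 0.05f);  // underestimated: back off fast
        }

        // Called when the broker returns RecordTooLargeException; the caller
        // also splits and resends the rejected batch.
        void onRecordTooLarge() {
            ecr = 1.0f;
        }

        float estimate() {
            return ecr;
        }
    }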
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Mon, Feb 27, 2017 at 10:30 AM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Hi Becket,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Seems like an interesting idea. I had a couple of questions:
> > > > > > > > > > > > > > > > > > > > 1) How do we decide when the batch should be split?
> > > > > > > > > > > > > > > > > > > > 2) What do you mean by slowly lowering the "actual" compression ratio? An example would really help here.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > Mayuresh
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Fri, Feb 24, 2017 at 3:17 PM, Becket Qin <becket....@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Hi Jay,
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Yeah, I got your point.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > I think there might be a solution which does not require adding a new configuration. We can start from a very conservative compression ratio, say 1.0, and lower it very slowly according to the actual compression ratio until we hit a point where we have to split a batch. At that point, we exponentially back off on the compression ratio. The idea is somewhat like TCP. This should help avoid frequent splits.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > The upper bound of the batch size is also a little awkward today, because we say the batch size is based on the compressed size, yet users cannot set it to the max message size because that would result in oversized messages. With this change, we will be able to let users set the batch size close to the max message size.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > However, the downside is that there could be latency spikes in the system in this case due to the splitting, especially when there are many messages that need to be split at the same time. That could potentially be an issue for some users.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > What do you think about this approach?
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Thu, Feb 23, 2017 at 1:31 PM, Jay Kreps <j...@confluent.io> wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Hey Becket,
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Yeah, that makes sense.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > I agree that you'd really have to both fix the estimation (i.e. make it per topic or make it better estimate the high percentiles) AND have the recovery mechanism. If you are underestimating often and then paying a high recovery price, that won't fly.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > I think you take my main point though, which is just that I hate to expose these super low level options to users, because it is so hard to explain to people what they mean and how they should be set. So if it is possible to make some combination of better estimation and splitting, or better tolerance of overage, that would be preferable.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > -Jay
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > On Thu, Feb 23, 2017 at 11:51 AM, Becket Qin <becket....@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > @Dong,
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Thanks for the comments. The default behavior of the producer won't change. If users want to use the uncompressed message size, they probably will also bump up the batch size to somewhere close to the max message size. This will be in the documentation. BTW, the default batch size is 16K, which is pretty small.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > @Jay,
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Yeah, we actually debated quite a bit internally what the best solution to this is.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > I completely agree it is a bug. In practice, we usually leave some headroom to allow the compressed size to grow a little if the original messages are not compressible, for example 1000 KB instead of exactly 1 MB. That is likely safe enough.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > The major concern with the rejected alternative is performance. It largely depends on how frequently we need to split a batch, i.e. how likely the estimation goes off. If we only need to do the split work occasionally, the cost is amortized, so we don't need to worry about it too much. However, it looks like for a producer with shared topics the estimation is always off. As an example, consider two topics, one with compression ratio 0.6 and the other 0.2. Assuming exactly the same traffic, the average compression ratio would be roughly 0.4, which is not right for either of the topics. So almost half of the batches (those of the topic with the 0.6 compression ratio) will end up larger than the configured batch size. When it comes to more topics, such as with mirror maker, this becomes even more unpredictable. To avoid frequent rejection/splitting of batches, we would need to configure the batch size pretty conservatively. This could actually hurt performance, because we are shoehorning the messages that are highly compressible into a small batch so that the other topics that are not that compressible will not become too large with the same batch size. At LinkedIn, our batch size is configured to 64 KB because of this. I think we may actually get better batching if we just use the uncompressed message size and an 800 KB batch size.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > We did not think about loosening the message size restriction, but that sounds like a viable solution given that the consumer can now fetch oversized messages. One concern would be that on the broker side oversized messages will bring more memory pressure. With KIP-92, we may mitigate that, but the memory allocation for large messages may not be very GC friendly. I need to think about this a little more.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > On Wed, Feb 22, 2017 at 8:57 PM, Jay Kreps <j...@confluent.io> wrote:
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Hey Becket,
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > I get the problem we want to solve with this, but I don't think it makes sense as a user-controlled knob that everyone sending data to Kafka has to think about. It is basically a bug, right?
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > First, as a technical question: is it true that using the uncompressed size for batching actually guarantees that you observe the limit? I think that implies that compression always makes the messages smaller, which I think is usually true but is not guaranteed, right? E.g. if someone encrypts their data, which tends to randomize it, and then enables compression, it could get slightly bigger?
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > I also wonder if the rejected alternatives you describe couldn't be made to work: basically try to be a bit better at estimation and recover when we guess wrong. I don't think the memory usage should be a problem: isn't it the same memory usage the consumer of that topic would need? And can't you do the splitting and recompression in a streaming fashion? If we can make the estimation miss rate low and the recovery cost is just ~2x the normal cost for that batch, that should be totally fine, right? (It's technically true that you might have to split more than once, but since you halve it each time, you should get a number of halvings that is logarithmic in the miss size, which, with better estimation, you'd hope would be super duper small.)
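(Sketching the recovery described above, with hypothetical types: each round halves the batch, so the number of splits is logarithmic in how far the estimate was off.)

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    interface Batch {
        int compressedSizeInBytes();
        List<Batch> splitInHalf(); // redistribute the records into two batches and recompress
    }

    final class BatchSplitter {
        static List<Batch> splitToFit(Batch batch, int maxBytes) {
            if (batch.compressedSizeInBytes() <= maxBytes)
                return Collections.singletonList(batch);
            List<Batch> fitting = new ArrayList<>();
            for (Batch half : batch.splitInHalf())
                fitting.addAll(splitToFit(half, maxBytes)); // may need another round
            return fitting;
        }
    }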
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Alternatively, maybe we could work on the other side of the problem and try to make it so that a small miss on message size isn't a big problem. I think the original issue was that max size and fetch size were tightly coupled, and the way memory worked in the consumer you really wanted the fetch size to be as small as possible, because you'd use that much memory per fetched partition, and the consumer would get stuck if its fetch size wasn't big enough. I think we made some progress on that issue, and maybe more could be done there, so that a small bit of fuzziness around the size would not be an issue?
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > -Jay
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > On Tue, Feb 21, 2017 at 12:30 PM, Becket Qin <becket....@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Hi folks,
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > I would like to start the discussion thread on KIP-126. The KIP proposes adding a new configuration to KafkaProducer to allow batching based on uncompressed message size.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Comments are welcome.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > The KIP wiki is here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-126+-+Allow+KafkaProducer+to+batch+based+on+uncompressed+size
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > > > > > -Regards,
> > > > > > > > > > > > > > > > > > > > Mayuresh R. Gharat
> > > > > > > > > > > > > > > > > > > > (862) 250-7125
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > > > > -Regards,
> > > > > > > > > > > > > > > > > > > Mayuresh R. Gharat
> > > > > > > > > > > > > > > > > > > (862) 250-7125