Never mind about my second comment. I misunderstood the semantics of the
producer's batch.size.

On Wed, Mar 22, 2017 at 10:20 AM, Dong Lin <lindon...@gmail.com> wrote:

> Hey Becket,
>
> In addition to the batch-split-rate, should we also add a batch-split-ratio
> sensor to gauge the probability that we have to split a batch?
>
> Also, in the case that the batch size configured for the producer is
> smaller than the max message size configured for the broker, why can't we
> just split the batch if its size exceeds the configured batch size? The
> benefit of this approach is that the producer's semantics are
> straightforward, because we enforce the batch size the user has configured.
> The implementation would also be simpler, because we don't have to rely on
> KIP-4 to fetch the max message size from the broker. I guess you are
> worried about the overhead of an "unnecessary" split when a batch's size is
> between the user-configured batch size and the broker's max message size.
> But is that overhead really a concern? If the overhead is too large because
> the user has configured a very low batch size for the producer, shouldn't
> the user adjust the producer config?
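>
> For illustration, a minimal sketch of the check being suggested here
> (illustrative names, not code from the actual patch):
>
>     import java.util.ArrayList;
>     import java.util.List;
>
>     class BatchSplitSketch {
>         // Recursively halve a batch's record list until every piece fits
>         // the user-configured batch.size. Per-record overhead is ignored
>         // for simplicity; the real producer would re-compress and
>         // re-enqueue each piece.
>         static List<List<byte[]>> split(List<byte[]> records, int maxBytes) {
>             List<List<byte[]>> out = new ArrayList<>();
>             int size = 0;
>             for (byte[] r : records) size += r.length;
>             if (size <= maxBytes || records.size() == 1) {
>                 out.add(records);
>                 return out;
>             }
>             int mid = records.size() / 2;
>             out.addAll(split(records.subList(0, mid), maxBytes));
>             out.addAll(split(records.subList(mid, records.size()), maxBytes));
>             return out;
>         }
>     }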
>
> Thanks,
> Dong
>
> On Wed, Mar 15, 2017 at 2:50 PM, Becket Qin <becket....@gmail.com> wrote:
>
>> I see, then we are thinking about the same thing :)
>>
>> On Wed, Mar 15, 2017 at 2:26 PM, Ismael Juma <ism...@juma.me.uk> wrote:
>>
>> > I meant finishing what's described in the following section and then
>> > starting a discussion followed by a vote:
>> >
>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-DescribeConfigsRequest
>> >
>> > We have only voted on KIP-4 Metadata, KIP-4 Create Topics, and KIP-4
>> > Delete Topics so far.
>> >
>> > Ismael
>> >
>> > On Wed, Mar 15, 2017 at 8:58 PM, Becket Qin <becket....@gmail.com> wrote:
>> >
>> > > Hi Ismael,
>> > >
>> > > KIP-4 is also the one that I was thinking about. We have introduced a
>> > > DescribeConfigRequest there so the producer can easily get the
>> > > configurations. By "another KIP" do you mean a new (or maybe extended)
>> > > protocol or using that protocol in clients?
>> > >
>> > > Thanks,
>> > >
>> > > Jiangjie (Becket) Qin
>> > >
>> > > On Wed, Mar 15, 2017 at 1:21 PM, Ismael Juma <ism...@juma.me.uk> wrote:
>> > >
>> > > > Hi Becket,
>> > > >
>> > > > How were you thinking of retrieving the configuration items you
>> > > > mentioned? I am asking because I was planning to post a KIP for
>> > > > Describe Configs (one of the protocols in KIP-4), which would expose
>> > > > such information. But maybe you are thinking of extending the
>> > > > Metadata request?
>> > > >
>> > > > Ismael
>> > > >
>> > > > On Wed, Mar 15, 2017 at 7:33 PM, Becket Qin <becket....@gmail.com> wrote:
>> > > >
>> > > > > Hi Jason,
>> > > > >
>> > > > > Good point. I was thinking about that, too. I was not sure if that
>> > > > > is the right thing to do by default.
>> > > > >
>> > > > > If we assume people always set the batch size to the max message
>> > > > > size, splitting the oversized batch makes a lot of sense. But it
>> > > > > seems possible that users want to control the memory footprint, so
>> > > > > they would set the batch size smaller than the max message size so
>> > > > > that the producer can hold batches for more partitions. In this
>> > > > > case, splitting the batch might not be the desired behavior.
>> > > > >
>> > > > > I think the most intuitive approach to this is to allow the
>> > > > > producer to get the max message size configuration (as well as some
>> > > > > other configurations such as the timestamp type) from the broker
>> > > > > side, and use that to decide whether a batch should be split or
>> > > > > not. I probably should add this to the KIP wiki.
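>> > > > >
>> > > > > As a rough sketch of that decision (hypothetical, since the
>> > > > > describe-configs protocol is not available yet):
>> > > > >
>> > > > >     // maxMessageBytes would come from the broker's
>> > > > >     // message.max.bytes config, fetched via a future KIP-4
>> > > > >     // DescribeConfigs request rather than hard-coded here.
>> > > > >     static boolean shouldSplit(int compressedBatchBytes,
>> > > > >                                int maxMessageBytes) {
>> > > > >         return compressedBatchBytes > maxMessageBytes;
>> > > > >     }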
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Jiangjie (Becket) Qin
>> > > > >
>> > > > > On Wed, Mar 15, 2017 at 9:47 AM, Jason Gustafson <ja...@confluent.io> wrote:
>> > > > >
>> > > > > > Hey Becket,
>> > > > > >
>> > > > > > Thanks for the KIP! The approach seems reasonable. One
>> > > > > > clarification: is the intent to do the splitting after the broker
>> > > > > > rejects the request with MESSAGE_TOO_LARGE, or prior to sending
>> > > > > > if the configured batch size is exceeded?
>> > > > > >
>> > > > > > -Jason
>> > > > > >
>> > > > > > On Mon, Mar 13, 2017 at 8:10 PM, Becket Qin <becket....@gmail.com> wrote:
>> > > > > >
>> > > > > > > Bumping up the thread for further comments. If there are no
>> > > > > > > more comments on the KIP, I will start the voting thread on
>> > > > > > > Wed.
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > >
>> > > > > > > Jiangjie (Becket) Qin
>> > > > > > >
>> > > > > > > On Tue, Mar 7, 2017 at 9:48 AM, Becket Qin <becket....@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > Hi Dong,
>> > > > > > > >
>> > > > > > > > Thanks for the comments.
>> > > > > > > >
>> > > > > > > > The patch is mostly a proof of concept, in case there is any
>> > > > > > > > concern about the implementation, which is indeed a little
>> > > > > > > > tricky.
>> > > > > > > >
>> > > > > > > > The new metric has already been mentioned in the Public
>> > > > > > > > Interface Change section.
>> > > > > > > >
>> > > > > > > > I added the reasoning about how the compression ratio
>> > > > > > > > improvement/deterioration steps are determined to the wiki.
>> > > > > > > >
>> > > > > > > > Thanks,
>> > > > > > > >
>> > > > > > > > Jiangjie (Becket) Qin
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Mon, Mar 6, 2017 at 4:42 PM, Dong Lin <lindon...@gmail.com> wrote:
>> > > > > > > >
>> > > > > > > >> Hey Becket,
>> > > > > > > >>
>> > > > > > > >> I am wondering if we should first vote on the KIP before
>> > > > > > > >> reviewing the patch. I have two comments below:
>> > > > > > > >>
>> > > > > > > >> - Should we specify the new sensors as part of the interface
>> > > > > > > >> change in the KIP?
>> > > > > > > >> - The KIP proposes to increase the estimated compression
>> > > > > > > >> ratio by 0.05 for each underestimation and to decrease the
>> > > > > > > >> estimate by 0.005 for each overestimation. Why were these
>> > > > > > > >> two values chosen? I think there is some tradeoff in
>> > > > > > > >> selecting the values. Can the KIP be more explicit about the
>> > > > > > > >> tradeoff and explain how these two values would impact the
>> > > > > > > >> producer's performance?
>> > > > > > > >>
>> > > > > > > >> Thanks,
>> > > > > > > >> Dong
>> > > > > > > >>
>> > > > > > > >>
>> > > > > > > >> On Sat, Mar 4, 2017 at 11:42 AM, Becket Qin <becket....@gmail.com> wrote:
>> > > > > > > >>
>> > > > > > > >> > I have updated the KIP based on the latest discussion.
>> > > > > > > >> > Please check and let me know if there is any further
>> > > > > > > >> > concern.
>> > > > > > > >> >
>> > > > > > > >> > Thanks,
>> > > > > > > >> >
>> > > > > > > >> > Jiangjie (Becket) Qin
>> > > > > > > >> >
>> > > > > > > >> > On Sat, Mar 4, 2017 at 10:56 AM, Becket Qin <becket....@gmail.com> wrote:
>> > > > > > > >> >
>> > > > > > > >> > > Actually, on second thought, a rate might be better for
>> > > > > > > >> > > two reasons:
>> > > > > > > >> > > 1. Most of the metrics we already have in the producer
>> > > > > > > >> > > use a rate instead of a count.
>> > > > > > > >> > > 2. If a service is bounced, a count is reset to 0, but a
>> > > > > > > >> > > rate is not affected.
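>> > > > > > > >> > >
>> > > > > > > >> > > For reference, a minimal sketch of registering such a
>> > > > > > > >> > > rate sensor with the client metrics API (metric names
>> > > > > > > >> > > here are illustrative):
>> > > > > > > >> > >
>> > > > > > > >> > >     import org.apache.kafka.common.MetricName;
>> > > > > > > >> > >     import org.apache.kafka.common.metrics.Metrics;
>> > > > > > > >> > >     import org.apache.kafka.common.metrics.Sensor;
>> > > > > > > >> > >     import org.apache.kafka.common.metrics.stats.Rate;
>> > > > > > > >> > >
>> > > > > > > >> > >     public class SplitRateExample {
>> > > > > > > >> > >         public static void main(String[] args) {
>> > > > > > > >> > >             Metrics metrics = new Metrics();
>> > > > > > > >> > >             Sensor sensor = metrics.sensor("batch-split");
>> > > > > > > >> > >             MetricName name = metrics.metricName(
>> > > > > > > >> > >                 "batch-split-rate", "producer-metrics",
>> > > > > > > >> > >                 "Batch splits per second");
>> > > > > > > >> > >             // Rate turns recorded events into a windowed
>> > > > > > > >> > >             // per-second rate, so a bounce does not
>> > > > > > > >> > >             // reset a running total.
>> > > > > > > >> > >             sensor.add(name, new Rate());
>> > > > > > > >> > >             sensor.record(); // one call per split
>> > > > > > > >> > >         }
>> > > > > > > >> > >     }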
>> > > > > > > >> > >
>> > > > > > > >> > > I'll make the change.
>> > > > > > > >> > >
>> > > > > > > >> > > Thanks,
>> > > > > > > >> > >
>> > > > > > > >> > > Jiangjie (Becket) Qin
>> > > > > > > >> > >
>> > > > > > > >> > > On Sat, Mar 4, 2017 at 10:27 AM, Becket Qin <becket....@gmail.com> wrote:
>> > > > > > > >> > >
>> > > > > > > >> > >> Hi Dong,
>> > > > > > > >> > >>
>> > > > > > > >> > >> Yes, there is a sensor in the patch about the split
>> > > > > > > >> > >> occurrence.
>> > > > > > > >> > >>
>> > > > > > > >> > >> Currently it is a count instead of a rate. In practice,
>> > > > > > > >> > >> a count seems easier to use in this case, but I am open
>> > > > > > > >> > >> to change.
>> > > > > > > >> > >>
>> > > > > > > >> > >> Thanks,
>> > > > > > > >> > >>
>> > > > > > > >> > >> Jiangjie (Becket) Qin
>> > > > > > > >> > >>
>> > > > > > > >> > >> On Fri, Mar 3, 2017 at 7:43 PM, Dong Lin <lindon...@gmail.com> wrote:
>> > > > > > > >> > >>
>> > > > > > > >> > >>> Hey Becket,
>> > > > > > > >> > >>>
>> > > > > > > >> > >>> I haven't looked at the patch yet. But since we are
>> > > > > > > >> > >>> going to try the split-on-oversize solution, should
>> > > > > > > >> > >>> the KIP also add a sensor that shows the rate of
>> > > > > > > >> > >>> splits per second and the probability of a split?
>> > > > > > > >> > >>>
>> > > > > > > >> > >>> Thanks,
>> > > > > > > >> > >>> Dong
>> > > > > > > >> > >>>
>> > > > > > > >> > >>>
>> > > > > > > >> > >>> On Fri, Mar 3, 2017 at 6:39 PM, Becket Qin <becket....@gmail.com> wrote:
>> > > > > > > >> > >>>
>> > > > > > > >> > >>> > Just to clarify, the implementation is basically
>> > > > > > > >> > >>> > what I mentioned above (split/resend + the adjusted
>> > > > > > > >> > >>> > estimation-evolution algorithm), plus changing the
>> > > > > > > >> > >>> > compression ratio estimation to be per topic.
>> > > > > > > >> > >>> >
>> > > > > > > >> > >>> > Thanks,
>> > > > > > > >> > >>> >
>> > > > > > > >> > >>> > Jiangjie (Becket) Qin
>> > > > > > > >> > >>> >
>> > > > > > > >> > >>> > On Fri, Mar 3, 2017 at 6:36 PM, Becket Qin <becket....@gmail.com> wrote:
>> > > > > > > >> > >>> >
>> > > > > > > >> > >>> > > I went ahead and submitted a patch here:
>> > > > > > > >> > >>> > > https://github.com/apache/kafka/pull/2638
>> > > > > > > >> > >>> > >
>> > > > > > > >> > >>> > > Per Joel's suggestion, I changed the compression
>> > > > > > > >> > >>> > > ratio estimation to be per topic as well. It
>> > > > > > > >> > >>> > > seems to work well. Since there is an important
>> > > > > > > >> > >>> > > behavior change and a new sensor is added, I'll
>> > > > > > > >> > >>> > > keep the KIP and update it accordingly.
>> > > > > > > >> > >>> > >
>> > > > > > > >> > >>> > > Thanks,
>> > > > > > > >> > >>> > >
>> > > > > > > >> > >>> > > Jiangjie (Becket) Qin
>> > > > > > > >> > >>> > >
>> > > > > > > >> > >>> > > On Mon, Feb 27, 2017 at 3:50 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
>> > > > > > > >> > >>> > >
>> > > > > > > >> > >>> > >> >
>> > > > > > > >> > >>> > >> > Let's say we sent the batch over the wire and
>> > > > > > > >> > >>> > >> > received a RecordTooLargeException. How do we
>> > > > > > > >> > >>> > >> > split it, given that once we add the messages
>> > > > > > > >> > >>> > >> > to the batch we lose the message-level
>> > > > > > > >> > >>> > >> > granularity? We will have to decompress, do a
>> > > > > > > >> > >>> > >> > deep iteration, split, and compress again,
>> > > > > > > >> > >>> > >> > right? This looks like a performance
>> > > > > > > >> > >>> > >> > bottleneck in the case of multi-topic
>> > > > > > > >> > >>> > >> > producers like mirror maker.
>> > > > > > > >> > >>> > >> >
>> > > > > > > >> > >>> > >>
>> > > > > > > >> > >>> > >> Yes, but these should be outliers if we do
>> > > > > > > >> > >>> > >> estimation on a per-topic basis and if we target
>> > > > > > > >> > >>> > >> a conservative-enough compression ratio. The
>> > > > > > > >> > >>> > >> producer should also avoid sending over the wire
>> > > > > > > >> > >>> > >> if it can be made aware of the max message size
>> > > > > > > >> > >>> > >> limit on the broker, and split if it determines
>> > > > > > > >> > >>> > >> that a record exceeds the broker's config.
>> > > > > > > >> > >>> > >> Ideally this would be part of the topic
>> > > > > > > >> > >>> > >> metadata, but it is not - so it could be driven
>> > > > > > > >> > >>> > >> off a periodic describe-configs request
>> > > > > > > >> > >>> > >> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-DescribeConfigsRequest>
>> > > > > > > >> > >>> > >> (which isn't available yet). This doesn't remove
>> > > > > > > >> > >>> > >> the need to split and recompress, though.
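>> > > > > > > >> > >>> > >>
>> > > > > > > >> > >>> > >> A minimal sketch of what per-topic estimation
>> > > > > > > >> > >>> > >> could look like (illustrative only; the
>> > > > > > > >> > >>> > >> exponentially weighted update is just one
>> > > > > > > >> > >>> > >> possible rule):
>> > > > > > > >> > >>> > >>
>> > > > > > > >> > >>> > >>     import java.util.Map;
>> > > > > > > >> > >>> > >>     import java.util.concurrent.ConcurrentHashMap;
>> > > > > > > >> > >>> > >>
>> > > > > > > >> > >>> > >>     // One estimate per topic instead of a single
>> > > > > > > >> > >>> > >>     // global number, so mixed workloads (e.g.
>> > > > > > > >> > >>> > >>     // mirror maker) don't blend very different
>> > > > > > > >> > >>> > >>     // ratios together.
>> > > > > > > >> > >>> > >>     class PerTopicRatioEstimator {
>> > > > > > > >> > >>> > >>         private final Map<String, Double> ratios =
>> > > > > > > >> > >>> > >>             new ConcurrentHashMap<>();
>> > > > > > > >> > >>> > >>
>> > > > > > > >> > >>> > >>         double estimate(String topic) {
>> > > > > > > >> > >>> > >>             // conservative default for new topics
>> > > > > > > >> > >>> > >>             return ratios.getOrDefault(topic, 1.0);
>> > > > > > > >> > >>> > >>         }
>> > > > > > > >> > >>> > >>
>> > > > > > > >> > >>> > >>         void observe(String topic, double acr) {
>> > > > > > > >> > >>> > >>             ratios.merge(topic, acr,
>> > > > > > > >> > >>> > >>                 (old, cur) -> 0.9 * old + 0.1 * cur);
>> > > > > > > >> > >>> > >>         }
>> > > > > > > >> > >>> > >>     }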
>> > > > > > > >> > >>> > >>
>> > > > > > > >> > >>> > >>
>> > > > > > > >> > >>> > >> > On Mon, Feb 27, 2017 at 10:51 AM, Becket Qin <becket....@gmail.com> wrote:
>> > > > > > > >> > >>> > >> >
>> > > > > > > >> > >>> > >> > > Hey Mayuresh,
>> > > > > > > >> > >>> > >> > >
>> > > > > > > >> > >>> > >> > > 1) The batch would be split when a
>> > > > > > > >> > >>> > >> > > RecordTooLargeException is received.
>> > > > > > > >> > >>> > >> > > 2) We do not lower the actual compression
>> > > > > > > >> > >>> > >> > > ratio; we lower the estimated compression
>> > > > > > > >> > >>> > >> > > ratio "according to" the actual compression
>> > > > > > > >> > >>> > >> > > ratio (ACR).
>> > > > > > > >> > >>> > >> > >
>> > > > > > > >> > >>> > >> > > As an example, let's start with Estimated
>> > > > > > > >> > >>> > >> > > Compression Ratio (ECR) = 1.0. Say the
>> > > > > > > >> > >>> > >> > > actual compression ratio (ACR) is ~0.8.
>> > > > > > > >> > >>> > >> > > Instead of letting the ECR drop to 0.8 very
>> > > > > > > >> > >>> > >> > > quickly, we only drop it by 0.001 each time
>> > > > > > > >> > >>> > >> > > ACR < ECR. However, once we see an ACR >
>> > > > > > > >> > >>> > >> > > ECR, we increment the ECR by 0.05. If a
>> > > > > > > >> > >>> > >> > > RecordTooLargeException is received, we
>> > > > > > > >> > >>> > >> > > reset the ECR back to 1.0 and split the
>> > > > > > > >> > >>> > >> > > batch.
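>> > > > > > > >> > >>> > >> > >
>> > > > > > > >> > >>> > >> > > For illustration only, a sketch of that
>> > > > > > > >> > >>> > >> > > adjustment scheme (the names are made up,
>> > > > > > > >> > >>> > >> > > not taken from the actual patch):
>> > > > > > > >> > >>> > >> > >
>> > > > > > > >> > >>> > >> > >     class CompressionRatioEstimator {
>> > > > > > > >> > >>> > >> > >         private double ecr = 1.0; // start conservative
>> > > > > > > >> > >>> > >> > >
>> > > > > > > >> > >>> > >> > >         // Observed ratio after a batch is
>> > > > > > > >> > >>> > >> > >         // compressed: back off fast on an
>> > > > > > > >> > >>> > >> > >         // underestimate, converge slowly otherwise.
>> > > > > > > >> > >>> > >> > >         void observe(double acr) {
>> > > > > > > >> > >>> > >> > >             if (acr > ecr)
>> > > > > > > >> > >>> > >> > >                 ecr = Math.min(1.0, ecr + 0.05);
>> > > > > > > >> > >>> > >> > >             else
>> > > > > > > >> > >>> > >> > >                 ecr = Math.max(acr, ecr - 0.001);
>> > > > > > > >> > >>> > >> > >         }
>> > > > > > > >> > >>> > >> > >
>> > > > > > > >> > >>> > >> > >         // A RecordTooLargeException resets the
>> > > > > > > >> > >>> > >> > >         // estimate; the caller also splits the
>> > > > > > > >> > >>> > >> > >         // rejected batch.
>> > > > > > > >> > >>> > >> > >         void onRecordTooLarge() { ecr = 1.0; }
>> > > > > > > >> > >>> > >> > >
>> > > > > > > >> > >>> > >> > >         double estimate() { return ecr; }
>> > > > > > > >> > >>> > >> > >     }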
>> > > > > > > >> > >>> > >> > >
>> > > > > > > >> > >>> > >> > > Thanks,
>> > > > > > > >> > >>> > >> > >
>> > > > > > > >> > >>> > >> > > Jiangjie (Becket) Qin
>> > > > > > > >> > >>> > >> > >
>> > > > > > > >> > >>> > >> > >
>> > > > > > > >> > >>> > >> > >
>> > > > > > > >> > >>> > >> > > On Mon, Feb 27, 2017 at 10:30 AM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
>> > > > > > > >> > >>> > >> > >
>> > > > > > > >> > >>> > >> > > > Hi Becket,
>> > > > > > > >> > >>> > >> > > >
>> > > > > > > >> > >>> > >> > > > Seems like an interesting idea.
>> > > > > > > >> > >>> > >> > > > I had a couple of questions:
>> > > > > > > >> > >>> > >> > > > 1) How do we decide when the batch should
>> > > > > > > >> > >>> > >> > > > be split?
>> > > > > > > >> > >>> > >> > > > 2) What do you mean by slowly lowering the
>> > > > > > > >> > >>> > >> > > > "actual" compression ratio? An example
>> > > > > > > >> > >>> > >> > > > would really help here.
>> > > > > > > >> > >>> > >> > > >
>> > > > > > > >> > >>> > >> > > > Thanks,
>> > > > > > > >> > >>> > >> > > >
>> > > > > > > >> > >>> > >> > > > Mayuresh
>> > > > > > > >> > >>> > >> > > >
>> > > > > > > >> > >>> > >> > > > On Fri, Feb 24, 2017 at 3:17 PM, Becket Qin <becket....@gmail.com> wrote:
>> > > > > > > >> > >>> > >> > > >
>> > > > > > > >> > >>> > >> > > > > Hi Jay,
>> > > > > > > >> > >>> > >> > > > >
>> > > > > > > >> > >>> > >> > > > > Yeah, I got your point.
>> > > > > > > >> > >>> > >> > > > >
>> > > > > > > >> > >>> > >> > > > > I think there might be a solution which
>> > > > > > > >> > >>> > >> > > > > does not require adding a new
>> > > > > > > >> > >>> > >> > > > > configuration. We can start from a very
>> > > > > > > >> > >>> > >> > > > > conservative compression ratio, say 1.0,
>> > > > > > > >> > >>> > >> > > > > and lower it very slowly according to
>> > > > > > > >> > >>> > >> > > > > the actual compression ratio until we
>> > > > > > > >> > >>> > >> > > > > hit a point where we have to split a
>> > > > > > > >> > >>> > >> > > > > batch. At that point, we exponentially
>> > > > > > > >> > >>> > >> > > > > back off on the compression ratio. The
>> > > > > > > >> > >>> > >> > > > > idea is somewhat like TCP. This should
>> > > > > > > >> > >>> > >> > > > > help avoid frequent splits.
>> > > > > > > >> > >>> > >> > > > >
>> > > > > > > >> > >>> > >> > > > > The upper bound of the batch size is
>> > > > > > > >> > >>> > >> > > > > also a little awkward today, because we
>> > > > > > > >> > >>> > >> > > > > say the batch size is based on the
>> > > > > > > >> > >>> > >> > > > > compressed size, but users cannot set it
>> > > > > > > >> > >>> > >> > > > > to the max message size because that
>> > > > > > > >> > >>> > >> > > > > would result in oversized messages. With
>> > > > > > > >> > >>> > >> > > > > this change we will be able to allow
>> > > > > > > >> > >>> > >> > > > > users to set the batch size close to the
>> > > > > > > >> > >>> > >> > > > > max message size.
>> > > > > > > >> > >>> > >> > > > >
>> > > > > > > >> > >>> > >> > > > > However, the downside is that there
>> > > > > > > >> > >>> > >> > > > > could be latency spikes in the system
>> > > > > > > >> > >>> > >> > > > > due to the splitting, especially when
>> > > > > > > >> > >>> > >> > > > > many messages need to be split at the
>> > > > > > > >> > >>> > >> > > > > same time. That could potentially be an
>> > > > > > > >> > >>> > >> > > > > issue for some users.
>> > > > > > > >> > >>> > >> > > > >
>> > > > > > > >> > >>> > >> > > > > What do you think about this approach?
>> > > > > > > >> > >>> > >> > > > >
>> > > > > > > >> > >>> > >> > > > > Thanks,
>> > > > > > > >> > >>> > >> > > > >
>> > > > > > > >> > >>> > >> > > > > Jiangjie (Becket) Qin
>> > > > > > > >> > >>> > >> > > > >
>> > > > > > > >> > >>> > >> > > > >
>> > > > > > > >> > >>> > >> > > > >
>> > > > > > > >> > >>> > >> > > > > On Thu, Feb 23, 2017 at 1:31 PM, Jay Kreps <j...@confluent.io> wrote:
>> > > > > > > >> > >>> > >> > > > >
>> > > > > > > >> > >>> > >> > > > > > Hey Becket,
>> > > > > > > >> > >>> > >> > > > > >
>> > > > > > > >> > >>> > >> > > > > > Yeah that makes sense.
>> > > > > > > >> > >>> > >> > > > > >
>> > > > > > > >> > >>> > >> > > > > > I agree that you'd really have to both
>> > > > > > > >> > >>> > >> > > > > > fix the estimation (i.e. make it per
>> > > > > > > >> > >>> > >> > > > > > topic or make it better estimate the
>> > > > > > > >> > >>> > >> > > > > > high percentiles) AND have the
>> > > > > > > >> > >>> > >> > > > > > recovery mechanism. If you are
>> > > > > > > >> > >>> > >> > > > > > underestimating often and then paying
>> > > > > > > >> > >>> > >> > > > > > a high recovery price, that won't fly.
>> > > > > > > >> > >>> > >> > > > > >
>> > > > > > > >> > >>> > >> > > > > > I think you take my main point though,
>> > > > > > > >> > >>> > >> > > > > > which is just that I hate to expose
>> > > > > > > >> > >>> > >> > > > > > these super low-level options to
>> > > > > > > >> > >>> > >> > > > > > users, because it is so hard to
>> > > > > > > >> > >>> > >> > > > > > explain to people what they mean and
>> > > > > > > >> > >>> > >> > > > > > how they should be set. So if it is
>> > > > > > > >> > >>> > >> > > > > > possible to make either some
>> > > > > > > >> > >>> > >> > > > > > combination of better estimation and
>> > > > > > > >> > >>> > >> > > > > > splitting, or better tolerance of
>> > > > > > > >> > >>> > >> > > > > > overage, work, that would be
>> > > > > > > >> > >>> > >> > > > > > preferable.
>> > > > > > > >> > >>> > >> > > > > >
>> > > > > > > >> > >>> > >> > > > > > -Jay
>> > > > > > > >> > >>> > >> > > > > >
>> > > > > > > >> > >>> > >> > > > > > On Thu, Feb 23, 2017 at 11:51 AM, Becket Qin <becket....@gmail.com> wrote:
>> > > > > > > >> > >>> > >> > > > > >
>> > > > > > > >> > >>> > >> > > > > > > @Dong,
>> > > > > > > >> > >>> > >> > > > > > >
>> > > > > > > >> > >>> > >> > > > > > > Thanks for the comments. The
>> > > > > > > >> > >>> > >> > > > > > > default behavior of the producer
>> > > > > > > >> > >>> > >> > > > > > > won't change. If users want to use
>> > > > > > > >> > >>> > >> > > > > > > the uncompressed message size, they
>> > > > > > > >> > >>> > >> > > > > > > will probably also bump up the batch
>> > > > > > > >> > >>> > >> > > > > > > size to somewhere close to the max
>> > > > > > > >> > >>> > >> > > > > > > message size. This would be in the
>> > > > > > > >> > >>> > >> > > > > > > documentation. BTW, the default
>> > > > > > > >> > >>> > >> > > > > > > batch size is 16K, which is pretty
>> > > > > > > >> > >>> > >> > > > > > > small.
>> > > > > > > >> > >>> > >> > > > > > >
>> > > > > > > >> > >>> > >> > > > > > > @Jay,
>> > > > > > > >> > >>> > >> > > > > > >
>> > > > > > > >> > >>> > >> > > > > > > Yeah, we actually debated quite a
>> > > > > > > >> > >>> > >> > > > > > > bit internally what the best
>> > > > > > > >> > >>> > >> > > > > > > solution to this is.
>> > > > > > > >> > >>> > >> > > > > > >
>> > > > > > > >> > >>> > >> > > > > > > I completely agree it is a bug. In
>> > > > > > > >> > >>> > >> > > > > > > practice we usually leave some
>> > > > > > > >> > >>> > >> > > > > > > headroom to allow the compressed
>> > > > > > > >> > >>> > >> > > > > > > size to grow a little if the
>> > > > > > > >> > >>> > >> > > > > > > original messages are not
>> > > > > > > >> > >>> > >> > > > > > > compressible, for example 1000 KB
>> > > > > > > >> > >>> > >> > > > > > > instead of exactly 1 MB. That is
>> > > > > > > >> > >>> > >> > > > > > > likely safe enough.
>> > > > > > > >> > >>> > >> > > > > > >
>> > > > > > > >> > >>> > >> > > > > > > The major concern for the rejected
>> > > > > > > >> > >>> > >> > > > > > > alternative is performance. It
>> > > > > > > >> > >>> > >> > > > > > > largely depends on how frequently we
>> > > > > > > >> > >>> > >> > > > > > > need to split a batch, i.e. how
>> > > > > > > >> > >>> > >> > > > > > > likely the estimation is to be off.
>> > > > > > > >> > >>> > >> > > > > > > If we only need to do the split work
>> > > > > > > >> > >>> > >> > > > > > > occasionally, the cost would be
>> > > > > > > >> > >>> > >> > > > > > > amortized, so we wouldn't need to
>> > > > > > > >> > >>> > >> > > > > > > worry about it too much. However, it
>> > > > > > > >> > >>> > >> > > > > > > looks like for a producer with
>> > > > > > > >> > >>> > >> > > > > > > shared topics, the estimation is
>> > > > > > > >> > >>> > >> > > > > > > always off. As an example, consider
>> > > > > > > >> > >>> > >> > > > > > > two topics, one with compression
>> > > > > > > >> > >>> > >> > > > > > > ratio 0.6 and the other 0.2.
>> > > > > > > >> > >>> > >> > > > > > > Assuming exactly the same traffic,
>> > > > > > > >> > >>> > >> > > > > > > the average compression ratio would
>> > > > > > > >> > >>> > >> > > > > > > be roughly 0.4, which is not right
>> > > > > > > >> > >>> > >> > > > > > > for either of the topics. So almost
>> > > > > > > >> > >>> > >> > > > > > > half of the batches (those of the
>> > > > > > > >> > >>> > >> > > > > > > topic with the 0.6 compression
>> > > > > > > >> > >>> > >> > > > > > > ratio) will end up larger than the
>> > > > > > > >> > >>> > >> > > > > > > configured batch size. When it comes
>> > > > > > > >> > >>> > >> > > > > > > to more topics, as with mirror
>> > > > > > > >> > >>> > >> > > > > > > maker, this becomes more
>> > > > > > > >> > >>> > >> > > > > > > unpredictable. To avoid frequent
>> > > > > > > >> > >>> > >> > > > > > > rejection/splitting of the batches,
>> > > > > > > >> > >>> > >> > > > > > > we need to configure the batch size
>> > > > > > > >> > >>> > >> > > > > > > pretty conservatively. This could
>> > > > > > > >> > >>> > >> > > > > > > actually hurt performance, because
>> > > > > > > >> > >>> > >> > > > > > > we are shoehorning the messages that
>> > > > > > > >> > >>> > >> > > > > > > are highly compressible into a small
>> > > > > > > >> > >>> > >> > > > > > > batch so that the other topics that
>> > > > > > > >> > >>> > >> > > > > > > are not that compressible will not
>> > > > > > > >> > >>> > >> > > > > > > become too large with the same batch
>> > > > > > > >> > >>> > >> > > > > > > size. At LinkedIn, our batch size is
>> > > > > > > >> > >>> > >> > > > > > > configured to 64 KB because of this.
>> > > > > > > >> > >>> > >> > > > > > > I think we may actually get better
>> > > > > > > >> > >>> > >> > > > > > > batching if we just use the
>> > > > > > > >> > >>> > >> > > > > > > uncompressed message size and an
>> > > > > > > >> > >>> > >> > > > > > > 800 KB batch size.
>> > > > > > > >> > >>> > >> > > > > > >
>> > > > > > > >> > >>> > >> > > > > > > We did not think about loosening the
>> > > > > > > >> > >>> > >> > > > > > > message size restriction, but that
>> > > > > > > >> > >>> > >> > > > > > > sounds like a viable solution, given
>> > > > > > > >> > >>> > >> > > > > > > that the consumer can now fetch
>> > > > > > > >> > >>> > >> > > > > > > oversized messages. One concern
>> > > > > > > >> > >>> > >> > > > > > > would be that on the broker side,
>> > > > > > > >> > >>> > >> > > > > > > oversized messages will bring more
>> > > > > > > >> > >>> > >> > > > > > > memory pressure. With KIP-92 we may
>> > > > > > > >> > >>> > >> > > > > > > mitigate that, but the memory
>> > > > > > > >> > >>> > >> > > > > > > allocation for large messages may
>> > > > > > > >> > >>> > >> > > > > > > not be very GC friendly. I need to
>> > > > > > > >> > >>> > >> > > > > > > think about this a little more.
>> > > > > > > >> > >>> > >> > > > > > >
>> > > > > > > >> > >>> > >> > > > > > > Thanks,
>> > > > > > > >> > >>> > >> > > > > > >
>> > > > > > > >> > >>> > >> > > > > > > Jiangjie (Becket) Qin
>> > > > > > > >> > >>> > >> > > > > > >
>> > > > > > > >> > >>> > >> > > > > > >
>> > > > > > > >> > >>> > >> > > > > > > On Wed, Feb 22, 2017 at 8:57 PM, Jay Kreps <j...@confluent.io> wrote:
>> > > > > > > >> > >>> > >> > > > > > >
>> > > > > > > >> > >>> > >> > > > > > > > Hey Becket,
>> > > > > > > >> > >>> > >> > > > > > > >
>> > > > > > > >> > >>> > >> > > > > > > > I get the problem we want to solve
>> > > > > > > >> > >>> > >> > > > > > > > with this, but I don't think this
>> > > > > > > >> > >>> > >> > > > > > > > is something that makes sense as a
>> > > > > > > >> > >>> > >> > > > > > > > user-controlled knob that everyone
>> > > > > > > >> > >>> > >> > > > > > > > sending data to Kafka has to think
>> > > > > > > >> > >>> > >> > > > > > > > about. It is basically a bug,
>> > > > > > > >> > >>> > >> > > > > > > > right?
>> > > > > > > >> > >>> > >> > > > > > > >
>> > > > > > > >> > >>> > >> > > > > > > > First, as a technical question: is
>> > > > > > > >> > >>> > >> > > > > > > > it true that using the
>> > > > > > > >> > >>> > >> > > > > > > > uncompressed size for batching
>> > > > > > > >> > >>> > >> > > > > > > > actually guarantees that you
>> > > > > > > >> > >>> > >> > > > > > > > observe the limit? I think that
>> > > > > > > >> > >>> > >> > > > > > > > implies that compression always
>> > > > > > > >> > >>> > >> > > > > > > > makes the messages smaller, which
>> > > > > > > >> > >>> > >> > > > > > > > I think is usually true but is not
>> > > > > > > >> > >>> > >> > > > > > > > guaranteed, right? E.g. if someone
>> > > > > > > >> > >>> > >> > > > > > > > encrypts their data, which tends
>> > > > > > > >> > >>> > >> > > > > > > > to randomize it, and then enables
>> > > > > > > >> > >>> > >> > > > > > > > compression, it could get slightly
>> > > > > > > >> > >>> > >> > > > > > > > bigger?
>> > > > > > > >> > >>> > >> > > > > > > >
>> > > > > > > >> > >>> > >> > > > > > > > I also wonder if the rejected
>> > > > > > > >> > >>> > >> > > > > > > > alternatives you describe couldn't
>> > > > > > > >> > >>> > >> > > > > > > > be made to work: basically try to
>> > > > > > > >> > >>> > >> > > > > > > > be a bit better at estimation and
>> > > > > > > >> > >>> > >> > > > > > > > recover when we guess wrong. I
>> > > > > > > >> > >>> > >> > > > > > > > don't think the memory usage
>> > > > > > > >> > >>> > >> > > > > > > > should be a problem: isn't it the
>> > > > > > > >> > >>> > >> > > > > > > > same memory usage the consumer of
>> > > > > > > >> > >>> > >> > > > > > > > that topic would need? And can't
>> > > > > > > >> > >>> > >> > > > > > > > you do the splitting and
>> > > > > > > >> > >>> > >> > > > > > > > recompression in a streaming
>> > > > > > > >> > >>> > >> > > > > > > > fashion? If we can make the
>> > > > > > > >> > >>> > >> > > > > > > > estimation miss rate low and the
>> > > > > > > >> > >>> > >> > > > > > > > recovery cost is just ~2x the
>> > > > > > > >> > >>> > >> > > > > > > > normal cost for that batch, that
>> > > > > > > >> > >>> > >> > > > > > > > should be totally fine, right?
>> > > > > > > >> > >>> > >> > > > > > > > (It's technically true you might
>> > > > > > > >> > >>> > >> > > > > > > > have to split more than once, but
>> > > > > > > >> > >>> > >> > > > > > > > since you halve it each time, you
>> > > > > > > >> > >>> > >> > > > > > > > should get a number of halvings
>> > > > > > > >> > >>> > >> > > > > > > > that is logarithmic in the miss
>> > > > > > > >> > >>> > >> > > > > > > > size, which, with better
>> > > > > > > >> > >>> > >> > > > > > > > estimation, you'd hope would be
>> > > > > > > >> > >>> > >> > > > > > > > super duper small.)
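>> > > > > > > >> > >>> > >> > > > > > > >
>> > > > > > > >> > >>> > >> > > > > > > > For concreteness, a toy version of
>> > > > > > > >> > >>> > >> > > > > > > > that bound (purely illustrative):
>> > > > > > > >> > >>> > >> > > > > > > >
>> > > > > > > >> > >>> > >> > > > > > > >     // Halve until the batch fits: the number
>> > > > > > > >> > >>> > >> > > > > > > >     // of halvings grows like
>> > > > > > > >> > >>> > >> > > > > > > >     // log2(batchBytes / limitBytes), so even
>> > > > > > > >> > >>> > >> > > > > > > >     // a 4x miss needs only two of them.
>> > > > > > > >> > >>> > >> > > > > > > >     static int halvingsNeeded(long batchBytes, long limitBytes) {
>> > > > > > > >> > >>> > >> > > > > > > >         int n = 0;
>> > > > > > > >> > >>> > >> > > > > > > >         while (batchBytes > limitBytes) {
>> > > > > > > >> > >>> > >> > > > > > > >             batchBytes = (batchBytes + 1) / 2;
>> > > > > > > >> > >>> > >> > > > > > > >             n++;
>> > > > > > > >> > >>> > >> > > > > > > >         }
>> > > > > > > >> > >>> > >> > > > > > > >         return n;
>> > > > > > > >> > >>> > >> > > > > > > >     }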
>> > > > > > > >> > >>> > >> > > > > > > >
>> > > > > > > >> > >>> > >> > > > > > > > Alternatively, maybe we could work
>> > > > > > > >> > >>> > >> > > > > > > > on the other side of the problem
>> > > > > > > >> > >>> > >> > > > > > > > and try to make it so that a small
>> > > > > > > >> > >>> > >> > > > > > > > miss on message size isn't a big
>> > > > > > > >> > >>> > >> > > > > > > > problem. I think the original
>> > > > > > > >> > >>> > >> > > > > > > > issue was that max size and fetch
>> > > > > > > >> > >>> > >> > > > > > > > size were tightly coupled, and the
>> > > > > > > >> > >>> > >> > > > > > > > way memory in the consumer worked,
>> > > > > > > >> > >>> > >> > > > > > > > you really wanted the fetch size
>> > > > > > > >> > >>> > >> > > > > > > > to be as small as possible,
>> > > > > > > >> > >>> > >> > > > > > > > because you'd use that much memory
>> > > > > > > >> > >>> > >> > > > > > > > per fetched partition and the
>> > > > > > > >> > >>> > >> > > > > > > > consumer would get stuck if its
>> > > > > > > >> > >>> > >> > > > > > > > fetch size wasn't big enough. I
>> > > > > > > >> > >>> > >> > > > > > > > think we made some progress on
>> > > > > > > >> > >>> > >> > > > > > > > that issue, and maybe more could
>> > > > > > > >> > >>> > >> > > > > > > > be done there so that a small bit
>> > > > > > > >> > >>> > >> > > > > > > > of fuzziness around the size would
>> > > > > > > >> > >>> > >> > > > > > > > not be an issue?
>> > > > > > > >> > >>> > >> > > > > > > >
>> > > > > > > >> > >>> > >> > > > > > > > -Jay
>> > > > > > > >> > >>> > >> > > > > > > >
>> > > > > > > >> > >>> > >> > > > > > > >
>> > > > > > > >> > >>> > >> > > > > > > >
>> > > > > > > >> > >>> > >> > > > > > > > On Tue, Feb 21, 2017 at 12:30 PM, Becket Qin <becket....@gmail.com> wrote:
>> > > > > > > >> > >>> > >> > > > > > > >
>> > > > > > > >> > >>> > >> > > > > > > > > Hi folks,
>> > > > > > > >> > >>> > >> > > > > > > > >
>> > > > > > > >> > >>> > >> > > > > > > > > I would like to start the
>> > > > > > > >> > >>> > >> > > > > > > > > discussion thread on KIP-126.
>> > > > > > > >> > >>> > >> > > > > > > > > The KIP proposes adding a new
>> > > > > > > >> > >>> > >> > > > > > > > > configuration to KafkaProducer
>> > > > > > > >> > >>> > >> > > > > > > > > to allow batching based on the
>> > > > > > > >> > >>> > >> > > > > > > > > uncompressed message size.
>> > > > > > > >> > >>> > >> > > > > > > > >
>> > > > > > > >> > >>> > >> > > > > > > > > Comments are welcome.
>> > > > > > > >> > >>> > >> > > > > > > > >
>> > > > > > > >> > >>> > >> > > > > > > > > The KIP wiki is the following:
>> > > > > > > >> > >>> > >> > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-126+-+Allow+KafkaProducer+to+batch+based+on+uncompressed+size
>> > > > > > > >> > >>> > >> > > > > > > > >
>> > > > > > > >> > >>> > >> > > > > > > > > Thanks,
>> > > > > > > >> > >>> > >> > > > > > > > >
>> > > > > > > >> > >>> > >> > > > > > > > > Jiangjie (Becket) Qin
>> > > > > > > >> > >>> > >> > > > > > > > >
>> > > > > > > >> > >>> > >> > > > > > > >
>> > > > > > > >> > >>> > >> > > > > > >
>> > > > > > > >> > >>> > >> > > > > >
>> > > > > > > >> > >>> > >> > > > >
>> > > > > > > >> > >>> > >> > > >
>> > > > > > > >> > >>> > >> > > >
>> > > > > > > >> > >>> > >> > > >
>> > > > > > > >> > >>> > >> > > > --
>> > > > > > > >> > >>> > >> > > > -Regards,
>> > > > > > > >> > >>> > >> > > > Mayuresh R. Gharat
>> > > > > > > >> > >>> > >> > > > (862) 250-7125
>> > > > > > > >> > >>> > >> > > >
>> > > > > > > >> > >>> > >> > >
>> > > > > > > >> > >>> > >> >
>> > > > > > > >> > >>> > >> >
>> > > > > > > >> > >>> > >> >
>> > > > > > > >> > >>> > >> > --
>> > > > > > > >> > >>> > >> > -Regards,
>> > > > > > > >> > >>> > >> > Mayuresh R. Gharat
>> > > > > > > >> > >>> > >> > (862) 250-7125
>> > > > > > > >> > >>> > >> >
>> > > > > > > >> > >>> > >>
>> > > > > > > >> > >>> > >
>> > > > > > > >> > >>> > >
>> > > > > > > >> > >>> >
>> > > > > > > >> > >>>
>> > > > > > > >> > >>
>> > > > > > > >> > >>
>> > > > > > > >> > >
>> > > > > > > >> >
>> > > > > > > >>
>> > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
