I meant finishing what's described in the following section and then starting a discussion followed by a vote:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-DescribeConfigsRequest

We have only voted on KIP-4 Metadata, KIP-4 Create Topics and KIP-4 Delete Topics so far.

Ismael

On Wed, Mar 15, 2017 at 8:58 PM, Becket Qin <becket....@gmail.com> wrote:

Hi Ismael,

KIP-4 is also the one that I was thinking about. We have introduced a DescribeConfigRequest there so the producer can easily get the configurations. By "another KIP" do you mean a new (or maybe extended) protocol, or using that protocol in the clients?

Thanks,

Jiangjie (Becket) Qin

On Wed, Mar 15, 2017 at 1:21 PM, Ismael Juma <ism...@juma.me.uk> wrote:

Hi Becket,

How were you thinking of retrieving the configuration items you mentioned? I am asking because I was planning to post a KIP for Describe Configs (one of the protocols in KIP-4), which would expose such information. But maybe you are thinking of extending the Metadata request?

Ismael

On Wed, Mar 15, 2017 at 7:33 PM, Becket Qin <becket....@gmail.com> wrote:

Hi Jason,

Good point. I was thinking about that too, but I was not sure if it is the right thing to do by default.

If we assume people always set the batch size to the max message size, splitting the oversized batch makes a lot of sense. But it seems possible that users want to control the memory footprint, so they would set the batch size to smaller than the max message size so that the producer can hold batches for more partitions. In this case, splitting the batch might not be the desired behavior.

I think the most intuitive approach is to allow the producer to get the max message size configuration (as well as some other configurations, such as the timestamp type) from the broker side and use that to decide whether a batch should be split or not. I probably should add this to the KIP wiki.

Thanks,

Jiangjie (Becket) Qin

On Wed, Mar 15, 2017 at 9:47 AM, Jason Gustafson <ja...@confluent.io> wrote:

Hey Becket,

Thanks for the KIP! The approach seems reasonable. One clarification: is the intent to do the splitting after the broker rejects the request with MESSAGE_TOO_LARGE, or prior to sending if the configured batch size is exceeded?

-Jason

On Mon, Mar 13, 2017 at 8:10 PM, Becket Qin <becket....@gmail.com> wrote:

Bumping up the thread for further comments. If there are no more comments on the KIP I will start the voting thread on Wednesday.

Thanks,

Jiangjie (Becket) Qin

On Tue, Mar 7, 2017 at 9:48 AM, Becket Qin <becket....@gmail.com> wrote:

Hi Dong,

Thanks for the comments.

The patch is mostly a proof of concept in case there is any concern about the implementation, which is indeed a little tricky.

The new metric has already been mentioned in the Public Interface Change section.

I added to the wiki the reasoning about how the compression ratio improvement/deterioration steps are determined.
Thanks,

Jiangjie (Becket) Qin

On Mon, Mar 6, 2017 at 4:42 PM, Dong Lin <lindon...@gmail.com> wrote:

Hey Becket,

I am wondering if we should first vote for the KIP before reviewing the patch. I have two comments below:

- Should we specify the new sensors as part of the interface change in the KIP?
- The KIP proposes to increase the estimated compression ratio by 0.05 for each underestimation and decrement the estimation by 0.005 for each overestimation. Why are these two values chosen? I think there is some tradeoff in selecting the values. Can the KIP be more explicit about the tradeoff and explain how these two values would impact the producer's performance?

Thanks,
Dong

On Sat, Mar 4, 2017 at 11:42 AM, Becket Qin <becket....@gmail.com> wrote:

I have updated the KIP based on the latest discussion. Please check and let me know if there is any further concern.

Thanks,

Jiangjie (Becket) Qin

On Sat, Mar 4, 2017 at 10:56 AM, Becket Qin <becket....@gmail.com> wrote:

Actually, on second thought, rate might be better for two reasons:
1. Most of the metrics we already have in the producer use rate instead of count.
2. If a service is bounced, a count is reset to 0, but that does not affect a rate.

I'll make the change.

Thanks,

Jiangjie (Becket) Qin
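For illustration, a minimal sketch of registering such a sensor with the client metrics library. The sensor and metric names here are hypothetical, not necessarily the ones in the patch; recording each split into a Rate stat reports splits per second, and unlike a raw count it is not skewed by a producer bounce.

    import org.apache.kafka.common.metrics.Metrics;
    import org.apache.kafka.common.metrics.Sensor;
    import org.apache.kafka.common.metrics.stats.Rate;

    public class BatchSplitMetricSketch {
        public static void main(String[] args) {
            Metrics metrics = new Metrics();
            // One sensor that records every batch split.
            Sensor batchSplitSensor = metrics.sensor("batch-split");
            batchSplitSensor.add(
                metrics.metricName("batch-split-rate", "producer-metrics",
                    "The average number of batch splits per second"),
                new Rate());
            batchSplitSensor.record(); // called once per split
            metrics.close();
        }
    }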
On Sat, Mar 4, 2017 at 10:27 AM, Becket Qin <becket....@gmail.com> wrote:

Hi Dong,

Yes, there is a sensor in the patch for the split occurrences.

Currently it is a count instead of a rate. In practice, count seems easier to use in this case, but I am open to changing it.

Thanks,

Jiangjie (Becket) Qin

On Fri, Mar 3, 2017 at 7:43 PM, Dong Lin <lindon...@gmail.com> wrote:

Hey Becket,

I haven't looked at the patch yet. But since we are going to try the split-on-oversize solution, should the KIP also add a sensor that shows the rate of splits per second and the probability of a split?

Thanks,
Dong

On Fri, Mar 3, 2017 at 6:39 PM, Becket Qin <becket....@gmail.com> wrote:

Just to clarify, the implementation is basically what I mentioned above (split/resend plus the adjusted-estimation evolving algorithm) and changing the compression ratio estimation to be per topic.

Thanks,

Jiangjie (Becket) Qin

On Fri, Mar 3, 2017 at 6:36 PM, Becket Qin <becket....@gmail.com> wrote:

I went ahead and have a patch submitted here: https://github.com/apache/kafka/pull/2638

Per Joel's suggestion, I changed the compression ratio to be per topic as well. It seems to work well. Since there is an important behavior change and a new sensor is added, I'll keep the KIP and update it accordingly.

Thanks,

Jiangjie (Becket) Qin

On Mon, Feb 27, 2017 at 3:50 PM, Joel Koshy <jjkosh...@gmail.com> wrote:

> Let's say we sent the batch over the wire and received a RecordTooLargeException. How do we split it, since once we add a message to the batch we lose the message-level granularity? We will have to decompress, do a deep iteration, split, and compress again, right? This looks like a performance bottleneck in the case of multi-topic producers like mirror maker.

Yes, but these should be outliers if we do estimation on a per-topic basis and if we target a conservative-enough compression ratio. The producer should also avoid sending over the wire if it can be made aware of the max message size limit on the broker, and split if it determines that a record exceeds the broker's config. Ideally this should be part of the topic metadata but is not - so it could be off a periodic describe-configs <https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-DescribeConfigsRequest> (which isn't available yet). This doesn't remove the need to split and recompress, though.
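As a sketch of what such a periodic lookup could look like: the code below assumes the DescribeConfigs admin API that KIP-4 later shipped (it was not available when this was written), along with a hypothetical bootstrap address and topic name.

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.Config;
    import org.apache.kafka.common.config.ConfigResource;

    public class MaxMessageSizeLookupSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (Admin admin = Admin.create(props)) {
                ConfigResource topic =
                    new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
                Config config =
                    admin.describeConfigs(Collections.singleton(topic))
                         .all().get().get(topic);
                // The broker-enforced per-topic limit that the producer would
                // compare batches against before deciding to split.
                int maxMessageBytes =
                    Integer.parseInt(config.get("max.message.bytes").value());
                System.out.println("max.message.bytes = " + maxMessageBytes);
            }
        }
    }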
On Mon, Feb 27, 2017 at 10:51 AM, Becket Qin <becket....@gmail.com> wrote:

Hey Mayuresh,

1) The batch would be split when a RecordTooLargeException is received.
2) Not lower the actual compression ratio, but lower the estimated compression ratio according to the Actual Compression Ratio (ACR).

An example: let's start with Estimated Compression Ratio (ECR) = 1.0. Say the ACR is ~0.8. Instead of letting the ECR drop to 0.8 very quickly, we only drop it by 0.001 every time the ACR < ECR. However, once we see an ACR > ECR, we increment the ECR by 0.05. If a RecordTooLargeException is received, we reset the ECR back to 1.0 and split the batch.

Thanks,

Jiangjie (Becket) Qin
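A minimal sketch of the scheme described above, using the step sizes from this message. The class shape, names, and the floor at the observed ratio are illustrative, not the patch's actual structure.

    public class CompressionRatioEstimatorSketch {
        private static final float MAX_RATIO = 1.0f;       // most conservative estimate
        private static final float DECREASE_STEP = 0.001f; // per observed overestimation
        private static final float INCREASE_STEP = 0.05f;  // per observed underestimation

        private float estimatedRatio = MAX_RATIO;          // start conservative

        // Called with the actual compression ratio (ACR) of each completed batch.
        public void observe(float actualRatio) {
            if (actualRatio < estimatedRatio) {
                // Overestimated: creep down slowly toward the observed ratio.
                estimatedRatio = Math.max(actualRatio, estimatedRatio - DECREASE_STEP);
            } else {
                // Underestimated: back off quickly, TCP-style.
                estimatedRatio = Math.min(MAX_RATIO, estimatedRatio + INCREASE_STEP);
            }
        }

        // Called when the broker rejects a batch with RecordTooLargeException;
        // the producer resets the estimate and splits the rejected batch.
        public void onRecordTooLarge() {
            estimatedRatio = MAX_RATIO;
        }

        public float estimate() {
            return estimatedRatio;
        }
    }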
On Mon, Feb 27, 2017 at 10:30 AM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:

Hi Becket,

Seems like an interesting idea. I had a couple of questions:
1) How do we decide when the batch should be split?
2) What do you mean by slowly lowering the "actual" compression ratio? An example would really help here.

Thanks,

Mayuresh

On Fri, Feb 24, 2017 at 3:17 PM, Becket Qin <becket....@gmail.com> wrote:

Hi Jay,

Yeah, I got your point.

I think there might be a solution that does not require adding a new configuration. We can start from a very conservative compression ratio, say 1.0, and lower it very slowly according to the actual compression ratio until we hit a point where we have to split a batch. At that point, we exponentially back off on the compression ratio. The idea is somewhat like TCP. This should help avoid frequent splits.

The upper bound of the batch size is also a little awkward today, because we say the batch size is based on the compressed size, yet users cannot set it to the max message size because that would result in oversized messages. With this change we will be able to let users set the batch size close to the max message size.

However, the downside is that there could be latency spikes in the system in this case due to the splitting, especially when there are many messages that need to be split at the same time. That could potentially be an issue for some users.

What do you think about this approach?
Thanks,

Jiangjie (Becket) Qin

On Thu, Feb 23, 2017 at 1:31 PM, Jay Kreps <j...@confluent.io> wrote:

Hey Becket,

Yeah, that makes sense.

I agree that you'd really have to both fix the estimation (i.e. make it per topic or make it better estimate the high percentiles) AND have the recovery mechanism. If you are underestimating often and then paying a high recovery price, that won't fly.

I think you take my main point though, which is just that I hate to expose these super low-level options to users, because it is so hard to explain to people what they mean and how they should be set. So if it is possible to make some combination of better estimation and splitting, or better tolerance of overage, work, that would be preferable.

-Jay

On Thu, Feb 23, 2017 at 11:51 AM, Becket Qin <becket....@gmail.com> wrote:

@Dong,

Thanks for the comments. The default behavior of the producer won't change. If users want to use the uncompressed message size, they probably will also bump up the batch size to somewhere close to the max message size. This would be in the documentation.
BTW, the default batch size is 16K, which is pretty small.

@Jay,

Yeah, we actually debated quite a bit internally what the best solution to this is.

I completely agree it is a bug. In practice we usually leave some headroom to allow the compressed size to grow a little in case the original messages are not compressible, for example 1000 KB instead of exactly 1 MB. That is likely safe enough.

The major concern with the rejected alternative is performance. It largely depends on how frequently we need to split a batch, i.e. how likely the estimation is to go off. If we only need to do the split work occasionally, the cost would be amortized, so we wouldn't need to worry about it too much. However, it looks like for a producer with shared topics, the estimation is always off. As an example, consider two topics, one with compression ratio 0.6 and the other 0.2. Assuming exactly the same traffic, the average compression ratio would be roughly 0.4, which is not right for either of the topics. So almost half of the batches (those of the topic with the 0.6 compression ratio) will end up larger than the configured batch size.
When it comes to more topics, such as with mirror maker, this becomes even more unpredictable. To avoid frequent rejection/splitting of batches, we need to configure the batch size pretty conservatively. This could actually hurt performance, because we are shoehorning the messages that are highly compressible into a small batch so that the other topics that are not that compressible will not become too large with the same batch size. At LinkedIn, our batch size is configured to 64 KB because of this. I think we may actually get better batching if we just use the uncompressed message size and an 800 KB batch size.

We did not think about loosening the message size restriction, but that sounds like a viable solution given that the consumer can now fetch oversized messages. One concern would be that on the broker side, oversized messages will bring more memory pressure. With KIP-92 we may mitigate that, but the memory allocation for large messages may not be very GC friendly. I need to think about this a little more.

Thanks,

Jiangjie (Becket) Qin
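To make the sizing trade-off concrete, a hypothetical producer configuration along the lines discussed above. batch.size and compression.type are real producer configs, but the values are purely illustrative.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;

    public class UncompressedBatchingConfigSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArraySerializer");
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArraySerializer");
            props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "gzip");
            // Today batch.size bounds the *compressed* batch, so mixed-topic
            // producers hold it down (e.g. 64 KB at LinkedIn). If batching were
            // based on uncompressed size, it could be pushed toward the broker's
            // max message size, e.g. 800 KB as suggested above.
            props.put(ProducerConfig.BATCH_SIZE_CONFIG, 800 * 1024);
            try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
                // ... send records as usual
            }
        }
    }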
On Wed, Feb 22, 2017 at 8:57 PM, Jay Kreps <j...@confluent.io> wrote:

Hey Becket,

I get the problem we want to solve with this, but I don't think this is something that makes sense as a user-controlled knob that everyone sending data to Kafka has to think about. It is basically a bug, right?

First, as a technical question: is it true that using the uncompressed size for batching actually guarantees that you observe the limit? I think that implies that compression always makes the messages smaller, which I think is usually true but is not guaranteed, right? E.g. if someone encrypts their data, which tends to randomize it, and then enables compression, it could get slightly bigger?

I also wonder if the rejected alternatives you describe couldn't be made to work: basically try to be a bit better at estimation and recover when we guess wrong. I don't think the memory usage should be a problem: isn't it the same memory usage the consumer of that topic would need? And can't you do the splitting and recompression in a streaming fashion?
If we can make the estimation miss rate low and the recovery cost is just ~2x the normal cost for that batch, that should be totally fine, right? (It's technically true you might have to split more than once, but since you halve it each time, you should get a number of halvings that is logarithmic in the miss size, which, with better estimation, you'd hope would be super duper small.)

Alternatively, maybe we could work on the other side of the problem and try to make it so that a small miss on message size isn't a big problem. I think the original issue was that max size and fetch size were tightly coupled, and the way memory in the consumer worked, you really wanted the fetch size to be as small as possible, because you'd use that much memory per fetched partition and the consumer would get stuck if its fetch size wasn't big enough. I think we made some progress on that issue, and maybe more could be done there so that a small bit of fuzziness around the size would not be an issue?

-Jay
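A sketch of the halving idea above, with hypothetical estimateCompressedSize and compression-ratio stand-ins; real code would operate on the accumulator's batch representation. Because each level halves the batch, the number of levels is logarithmic in the size of the miss.

    import java.util.ArrayList;
    import java.util.List;

    public class BatchSplitSketch {
        // Hypothetical stand-in: estimate the compressed size of a set of
        // serialized records, assuming a ~0.5 compression ratio.
        static long estimateCompressedSize(List<byte[]> records) {
            long total = 0;
            for (byte[] r : records) total += r.length;
            return (long) (total * 0.5);
        }

        // Recursively halve an oversized batch until every piece is expected
        // to fit under the broker's limit.
        static List<List<byte[]>> split(List<byte[]> batch, long maxBytes) {
            if (batch.size() <= 1 || estimateCompressedSize(batch) <= maxBytes) {
                return List.of(batch);
            }
            int mid = batch.size() / 2;
            List<List<byte[]>> pieces = new ArrayList<>();
            pieces.addAll(split(batch.subList(0, mid), maxBytes));
            pieces.addAll(split(batch.subList(mid, batch.size()), maxBytes));
            return pieces;
        }
    }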
On Tue, Feb 21, 2017 at 12:30 PM, Becket Qin <becket....@gmail.com> wrote:

Hi folks,

I would like to start the discussion thread on KIP-126. The KIP proposes adding a new configuration to KafkaProducer to allow batching based on uncompressed message size.

Comments are welcome.

The KIP wiki is here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-126+-+Allow+KafkaProducer+to+batch+based+on+uncompressed+size

Thanks,

Jiangjie (Becket) Qin