Thanks for the update, Lucas. I think the motivation section is intuitive. It will be good to learn more about the comments from the other reviewers.
On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <lucasatu...@gmail.com> wrote:
> Hi Dong,
>
> I've updated the motivation section of the KIP by explaining the cases that would have user impacts.
> Please take a look and let me know your comments.
>
> Thanks,
> Lucas
>
> On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <lucasatu...@gmail.com> wrote:
> >
> > Hi Dong,
> >
> > The simulation of the disk being slow is merely for me to easily construct a testing scenario with a backlog of produce requests. In production, other than the disk being slow, a backlog of produce requests may also be caused by high produce QPS. In that case, we may not want to kill the broker, and that's when this KIP can be useful, for both JBOD and non-JBOD setups.
> >
> > Going back to your previous question about each ProduceRequest covering 20 partitions that are randomly distributed: let's say a LeaderAndIsr request is enqueued that tries to switch the current broker, say broker0, from leader to follower *for one of the partitions*, say *test-0*. For the sake of argument, let's also assume the other brokers, say broker1, have *stopped* fetching from the current broker, i.e. broker0.
> > 1. If the enqueued produce requests have acks = -1 (ALL)
> >    1.1 without this KIP, the ProduceRequests ahead of the LeaderAndISR will be put into the purgatory, and since they'll never be replicated to other brokers (because of the assumption made above), they will be completed either when the LeaderAndISR request is processed or when the timeout happens.
> >    1.2 With this KIP, broker0 will immediately transition the partition test-0 to become a follower; after the current broker sees the replication of the remaining 19 partitions, it can send a response indicating that it's no longer the leader for "test-0".
> > To see the latency difference between 1.1 and 1.2, let's say there are 24K produce requests ahead of the LeaderAndISR, and there are 8 io threads, so each io thread will process approximately 3000 produce requests. Now let's investigate the io thread that finally processed the LeaderAndISR. For the 3000 produce requests, suppose we model the times when their remaining 19 partitions catch up as t0, t1, ..., t2999, and the time when the LeaderAndISR request is processed as t3000. Without this KIP, the 1st produce request would have waited an extra t3000 - t0 in the purgatory, the 2nd an extra t3000 - t1, etc. Roughly speaking, the latency difference is bigger for the earlier produce requests than for the later ones. For the same reason, the more ProduceRequests queued before the LeaderAndISR, the bigger the benefit we get (capped by the produce timeout).
> > 2. If the enqueued produce requests have acks=0 or acks=1
> >    There will be no latency differences in this case, but
> >    2.1 without this KIP, the records of partition test-0 in the ProduceRequests ahead of the LeaderAndISR will be appended to the local log, and eventually be truncated after processing the LeaderAndISR. This is what's referred to as "some unofficial definition of data loss in terms of messages beyond the high watermark".
> > 2.2 with this KIP, we can mitigate the effect since if the LeaderAndISR > > is immediately processed, the response to producers will have > > the NotLeaderForPartition error, causing producers to retry > > > > This explanation above is the benefit for reducing the latency of a > broker > > becoming the follower, > > closely related is reducing the latency of a broker becoming the leader. > > In this case, the benefit is even more obvious, if other brokers have > > resigned leadership, and the > > current broker should take leadership. Any delay in processing the > > LeaderAndISR will be perceived > > by clients as unavailability. In extreme cases, this can cause failed > > produce requests if the retries are > > exhausted. > > > > Another two types of controller requests are UpdateMetadata and > > StopReplica, which I'll briefly discuss as follows: > > For UpdateMetadata requests, delayed processing means clients receiving > > stale metadata, e.g. with the wrong leadership info > > for certain partitions, and the effect is more retries or even fatal > > failure if the retries are exhausted. > > > > For StopReplica requests, a long queuing time may degrade the performance > > of topic deletion. > > > > Regarding your last question of the delay for DescribeLogDirsRequest, you > > are right > > that this KIP cannot help with the latency in getting the log dirs info, > > and it's only relevant > > when controller requests are involved. > > > > Regards, > > Lucas > > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <lindon...@gmail.com> wrote: > > > >> Hey Jun, > >> > >> Thanks much for the comments. It is good point. So the feature may be > >> useful for JBOD use-case. I have one question below. > >> > >> Hey Lucas, > >> > >> Do you think this feature is also useful for non-JBOD setup or it is > only > >> useful for the JBOD setup? It may be useful to understand this. > >> > >> When the broker is setup using JBOD, in order to move leaders on the > >> failed > >> disk to other disks, the system operator first needs to get the list of > >> partitions on the failed disk. This is currently achieved using > >> AdminClient.describeLogDirs(), which sends DescribeLogDirsRequest to the > >> broker. If we only prioritize the controller requests, then the > >> DescribeLogDirsRequest > >> may still take a long time to be processed by the broker. So the overall > >> time to move leaders away from the failed disk may still be long even > with > >> this KIP. What do you think? > >> > >> Thanks, > >> Dong > >> > >> > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <lucasatu...@gmail.com> > wrote: > >> > >> > Thanks for the insightful comment, Jun. > >> > > >> > @Dong, > >> > Since both of the two comments in your previous email are about the > >> > benefits of this KIP and whether it's useful, > >> > in light of Jun's last comment, do you agree that this KIP can be > >> > beneficial in the case mentioned by Jun? > >> > Please let me know, thanks! > >> > > >> > Regards, > >> > Lucas > >> > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <j...@confluent.io> wrote: > >> > > >> > > Hi, Lucas, Dong, > >> > > > >> > > If all disks on a broker are slow, one probably should just kill the > >> > > broker. In that case, this KIP may not help. If only one of the > disks > >> on > >> > a > >> > > broker is slow, one may want to fail that disk and move the leaders > on > >> > that > >> > > disk to other brokers. 
In that case, being able to process the > >> > LeaderAndIsr > >> > > requests faster will potentially help the producers recover quicker. > >> > > > >> > > Thanks, > >> > > > >> > > Jun > >> > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <lindon...@gmail.com> > wrote: > >> > > > >> > > > Hey Lucas, > >> > > > > >> > > > Thanks for the reply. Some follow up questions below. > >> > > > > >> > > > Regarding 1, if each ProduceRequest covers 20 partitions that are > >> > > randomly > >> > > > distributed across all partitions, then each ProduceRequest will > >> likely > >> > > > cover some partitions for which the broker is still leader after > it > >> > > quickly > >> > > > processes the > >> > > > LeaderAndIsrRequest. Then broker will still be slow in processing > >> these > >> > > > ProduceRequest and request will still be very high with this KIP. > It > >> > > seems > >> > > > that most ProduceRequest will still timeout after 30 seconds. Is > >> this > >> > > > understanding correct? > >> > > > > >> > > > Regarding 2, if most ProduceRequest will still timeout after 30 > >> > seconds, > >> > > > then it is less clear how this KIP reduces average produce > latency. > >> Can > >> > > you > >> > > > clarify what metrics can be improved by this KIP? > >> > > > > >> > > > Not sure why system operator directly cares number of truncated > >> > messages. > >> > > > Do you mean this KIP can improve average throughput or reduce > >> message > >> > > > duplication? It will be good to understand this. > >> > > > > >> > > > Thanks, > >> > > > Dong > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lucasatu...@gmail.com> > >> > wrote: > >> > > > > >> > > > > Hi Dong, > >> > > > > > >> > > > > Thanks for your valuable comments. Please see my reply below. > >> > > > > > >> > > > > 1. The Google doc showed only 1 partition. Now let's consider a > >> more > >> > > > common > >> > > > > scenario > >> > > > > where broker0 is the leader of many partitions. And let's say > for > >> > some > >> > > > > reason its IO becomes slow. > >> > > > > The number of leader partitions on broker0 is so large, say 10K, > >> that > >> > > the > >> > > > > cluster is skewed, > >> > > > > and the operator would like to shift the leadership for a lot of > >> > > > > partitions, say 9K, to other brokers, > >> > > > > either manually or through some service like cruise control. > >> > > > > With this KIP, not only will the leadership transitions finish > >> more > >> > > > > quickly, helping the cluster itself becoming more balanced, > >> > > > > but all existing producers corresponding to the 9K partitions > will > >> > get > >> > > > the > >> > > > > errors relatively quickly > >> > > > > rather than relying on their timeout, thanks to the batched > async > >> ZK > >> > > > > operations. > >> > > > > To me it's a useful feature to have during such troublesome > times. > >> > > > > > >> > > > > > >> > > > > 2. The experiments in the Google Doc have shown that with this > KIP > >> > many > >> > > > > producers > >> > > > > receive an explicit error NotLeaderForPartition, based on which > >> they > >> > > > retry > >> > > > > immediately. > >> > > > > Therefore the latency (~14 seconds+quick retry) for their single > >> > > message > >> > > > is > >> > > > > much smaller > >> > > > > compared with the case of timing out without the KIP (30 seconds > >> for > >> > > > timing > >> > > > > out + quick retry). 
> >> > > > > One might argue that reducing the timing out on the producer > side > >> can > >> > > > > achieve the same result, > >> > > > > yet reducing the timeout has its own drawbacks[1]. > >> > > > > > >> > > > > Also *IF* there were a metric to show the number of truncated > >> > messages > >> > > on > >> > > > > brokers, > >> > > > > with the experiments done in the Google Doc, it should be easy > to > >> see > >> > > > that > >> > > > > a lot fewer messages need > >> > > > > to be truncated on broker0 since the up-to-date metadata avoids > >> > > appending > >> > > > > of messages > >> > > > > in subsequent PRODUCE requests. If we talk to a system operator > >> and > >> > ask > >> > > > > whether > >> > > > > they prefer fewer wasteful IOs, I bet most likely the answer is > >> yes. > >> > > > > > >> > > > > 3. To answer your question, I think it might be helpful to > >> construct > >> > > some > >> > > > > formulas. > >> > > > > To simplify the modeling, I'm going back to the case where there > >> is > >> > > only > >> > > > > ONE partition involved. > >> > > > > Following the experiments in the Google Doc, let's say broker0 > >> > becomes > >> > > > the > >> > > > > follower at time t0, > >> > > > > and after t0 there were still N produce requests in its request > >> > queue. > >> > > > > With the up-to-date metadata brought by this KIP, broker0 can > >> reply > >> > > with > >> > > > an > >> > > > > NotLeaderForPartition exception, > >> > > > > let's use M1 to denote the average processing time of replying > >> with > >> > > such > >> > > > an > >> > > > > error message. > >> > > > > Without this KIP, the broker will need to append messages to > >> > segments, > >> > > > > which may trigger a flush to disk, > >> > > > > let's use M2 to denote the average processing time for such > logic. > >> > > > > Then the average extra latency incurred without this KIP is N * > >> (M2 - > >> > > > M1) / > >> > > > > 2. > >> > > > > > >> > > > > In practice, M2 should always be larger than M1, which means as > >> long > >> > > as N > >> > > > > is positive, > >> > > > > we would see improvements on the average latency. > >> > > > > There does not need to be significant backlog of requests in the > >> > > request > >> > > > > queue, > >> > > > > or severe degradation of disk performance to have the > improvement. > >> > > > > > >> > > > > Regards, > >> > > > > Lucas > >> > > > > > >> > > > > > >> > > > > [1] For instance, reducing the timeout on the producer side can > >> > trigger > >> > > > > unnecessary duplicate requests > >> > > > > when the corresponding leader broker is overloaded, exacerbating > >> the > >> > > > > situation. > >> > > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <lindon...@gmail.com> > >> > wrote: > >> > > > > > >> > > > > > Hey Lucas, > >> > > > > > > >> > > > > > Thanks much for the detailed documentation of the experiment. > >> > > > > > > >> > > > > > Initially I also think having a separate queue for controller > >> > > requests > >> > > > is > >> > > > > > useful because, as you mentioned in the summary section of the > >> > Google > >> > > > > doc, > >> > > > > > controller requests are generally more important than data > >> requests > >> > > and > >> > > > > we > >> > > > > > probably want controller requests to be processed sooner. But > >> then > >> > > Eno > >> > > > > has > >> > > > > > two very good questions which I am not sure the Google doc has > >> > > answered > >> > > > > > explicitly. Could you help with the following questions? 
> >> > > > > > > >> > > > > > 1) It is not very clear what is the actual benefit of KIP-291 > to > >> > > users. > >> > > > > The > >> > > > > > experiment setup in the Google doc simulates the scenario that > >> > broker > >> > > > is > >> > > > > > very slow handling ProduceRequest due to e.g. slow disk. It > >> > currently > >> > > > > > assumes that there is only 1 partition. But in the common > >> scenario, > >> > > it > >> > > > is > >> > > > > > probably reasonable to assume that there are many other > >> partitions > >> > > that > >> > > > > are > >> > > > > > also actively produced to and ProduceRequest to these > partition > >> > also > >> > > > > takes > >> > > > > > e.g. 2 seconds to be processed. So even if broker0 can become > >> > > follower > >> > > > > for > >> > > > > > the partition 0 soon, it probably still needs to process the > >> > > > > ProduceRequest > >> > > > > > slowly t in the queue because these ProduceRequests cover > other > >> > > > > partitions. > >> > > > > > Thus most ProduceRequest will still timeout after 30 seconds > and > >> > most > >> > > > > > clients will still likely timeout after 30 seconds. Then it is > >> not > >> > > > > > obviously what is the benefit to client since client will > >> timeout > >> > > after > >> > > > > 30 > >> > > > > > seconds before possibly re-connecting to broker1, with or > >> without > >> > > > > KIP-291. > >> > > > > > Did I miss something here? > >> > > > > > > >> > > > > > 2) I guess Eno's is asking for the specific benefits of this > >> KIP to > >> > > > user > >> > > > > or > >> > > > > > system administrator, e.g. whether this KIP decreases average > >> > > latency, > >> > > > > > 999th percentile latency, probably of exception exposed to > >> client > >> > > etc. > >> > > > It > >> > > > > > is probably useful to clarify this. > >> > > > > > > >> > > > > > 3) Does this KIP help improve user experience only when there > is > >> > > issue > >> > > > > with > >> > > > > > broker, e.g. significant backlog in the request queue due to > >> slow > >> > > disk > >> > > > as > >> > > > > > described in the Google doc? Or is this KIP also useful when > >> there > >> > is > >> > > > no > >> > > > > > ongoing issue in the cluster? It might be helpful to clarify > >> this > >> > to > >> > > > > > understand the benefit of this KIP. > >> > > > > > > >> > > > > > > >> > > > > > Thanks much, > >> > > > > > Dong > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang < > >> lucasatu...@gmail.com > >> > > > >> > > > > wrote: > >> > > > > > > >> > > > > > > Hi Eno, > >> > > > > > > > >> > > > > > > Sorry for the delay in getting the experiment results. > >> > > > > > > Here is a link to the positive impact achieved by > implementing > >> > the > >> > > > > > proposed > >> > > > > > > change: > >> > > > > > > https://docs.google.com/document/d/ > 1ge2jjp5aPTBber6zaIT9AdhW > >> > > > > > > FWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing > >> > > > > > > Please take a look when you have time and let me know your > >> > > feedback. > >> > > > > > > > >> > > > > > > Regards, > >> > > > > > > Lucas > >> > > > > > > > >> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <ka...@harsha.io> > >> wrote: > >> > > > > > > > >> > > > > > > > Thanks for the pointer. Will take a look might suit our > >> > > > requirements > >> > > > > > > > better. 
> >> > > > > > > > > >> > > > > > > > Thanks, > >> > > > > > > > Harsha > >> > > > > > > > > >> > > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang < > >> > > > lucasatu...@gmail.com > >> > > > > > > >> > > > > > > > wrote: > >> > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > Hi Harsha, > >> > > > > > > > > > >> > > > > > > > > If I understand correctly, the replication quota > mechanism > >> > > > proposed > >> > > > > > in > >> > > > > > > > > KIP-73 can be helpful in that scenario. > >> > > > > > > > > Have you tried it out? > >> > > > > > > > > > >> > > > > > > > > Thanks, > >> > > > > > > > > Lucas > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > On Sun, Jun 24, 2018 at 8:28 AM, Harsha < > ka...@harsha.io > >> > > >> > > > wrote: > >> > > > > > > > > > >> > > > > > > > > > Hi Lucas, > >> > > > > > > > > > One more question, any thoughts on making this > >> configurable > >> > > > > > > > > > and also allowing subset of data requests to be > >> > prioritized. > >> > > > For > >> > > > > > > > example > >> > > > > > > > > > >> > > > > > > > > > ,we notice in our cluster when we take out a broker > and > >> > bring > >> > > > new > >> > > > > > one > >> > > > > > > > it > >> > > > > > > > > > >> > > > > > > > > > will try to become follower and have lot of fetch > >> requests > >> > to > >> > > > > other > >> > > > > > > > > leaders > >> > > > > > > > > > in clusters. This will negatively effect the > >> > > application/client > >> > > > > > > > > requests. > >> > > > > > > > > > We are also exploring the similar solution to > >> de-prioritize > >> > > if > >> > > > a > >> > > > > > new > >> > > > > > > > > > replica comes in for fetch requests, we are ok with > the > >> > > replica > >> > > > > to > >> > > > > > be > >> > > > > > > > > > taking time but the leaders should prioritize the > client > >> > > > > requests. > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > Thanks, > >> > > > > > > > > > Harsha > >> > > > > > > > > > > >> > > > > > > > > > On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang wrote: > >> > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > Hi Eno, > >> > > > > > > > > > > > >> > > > > > > > > > > Sorry for the delayed response. > >> > > > > > > > > > > - I haven't implemented the feature yet, so no > >> > experimental > >> > > > > > results > >> > > > > > > > so > >> > > > > > > > > > >> > > > > > > > > > > far. > >> > > > > > > > > > > And I plan to test in out in the following days. > >> > > > > > > > > > > > >> > > > > > > > > > > - You are absolutely right that the priority queue > >> does > >> > not > >> > > > > > > > completely > >> > > > > > > > > > >> > > > > > > > > > > prevent > >> > > > > > > > > > > data requests being processed ahead of controller > >> > requests. > >> > > > > > > > > > > That being said, I expect it to greatly mitigate the > >> > effect > >> > > > of > >> > > > > > > stable > >> > > > > > > > > > > metadata. > >> > > > > > > > > > > In any case, I'll try it out and post the results > >> when I > >> > > have > >> > > > > it. 
> >> > > > > > > > > > > > >> > > > > > > > > > > Regards, > >> > > > > > > > > > > Lucas > >> > > > > > > > > > > > >> > > > > > > > > > > On Wed, Jun 20, 2018 at 5:44 AM, Eno Thereska < > >> > > > > > > > eno.there...@gmail.com > >> > > > > > > > > > > >> > > > > > > > > > > wrote: > >> > > > > > > > > > > > >> > > > > > > > > > > > Hi Lucas, > >> > > > > > > > > > > > > >> > > > > > > > > > > > Sorry for the delay, just had a look at this. A > >> couple > >> > of > >> > > > > > > > questions: > >> > > > > > > > > > >> > > > > > > > > > > > - did you notice any positive change after > >> implementing > >> > > > this > >> > > > > > KIP? > >> > > > > > > > > I'm > >> > > > > > > > > > > > wondering if you have any experimental results > that > >> > show > >> > > > the > >> > > > > > > > benefit > >> > > > > > > > > of > >> > > > > > > > > > > the > >> > > > > > > > > > > > two queues. > >> > > > > > > > > > > > > >> > > > > > > > > > > > - priority is usually not sufficient in addressing > >> the > >> > > > > problem > >> > > > > > > the > >> > > > > > > > > KIP > >> > > > > > > > > > > > identifies. Even with priority queues, you will > >> > sometimes > >> > > > > > > (often?) > >> > > > > > > > > have > >> > > > > > > > > > > the > >> > > > > > > > > > > > case that data plane requests will be ahead of the > >> > > control > >> > > > > > plane > >> > > > > > > > > > > requests. > >> > > > > > > > > > > > This happens because the system might have already > >> > > started > >> > > > > > > > > processing > >> > > > > > > > > > > the > >> > > > > > > > > > > > data plane requests before the control plane ones > >> > > arrived. > >> > > > So > >> > > > > > it > >> > > > > > > > > would > >> > > > > > > > > > > be > >> > > > > > > > > > > > good to know what % of the problem this KIP > >> addresses. > >> > > > > > > > > > > > > >> > > > > > > > > > > > Thanks > >> > > > > > > > > > > > Eno > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu < > >> > > > > yuzhih...@gmail.com > >> > > > > > > > >> > > > > > > > > wrote: > >> > > > > > > > > > > > > >> > > > > > > > > > > > > Change looks good. > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > Thanks > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > On Fri, Jun 15, 2018 at 8:42 AM, Lucas Wang < > >> > > > > > > > lucasatu...@gmail.com > >> > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > wrote: > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > Hi Ted, > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > Thanks for the suggestion. I've updated the > KIP. > >> > > Please > >> > > > > > take > >> > > > > > > > > > another > >> > > > > > > > > > > > >> > > > > > > > > > > > > look. 
> >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > Lucas > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu < > >> > > > > > > yuzhih...@gmail.com > >> > > > > > > > > > >> > > > > > > > > > > wrote: > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > Currently in KafkaConfig.scala : > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > val QueuedMaxRequests = 500 > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > It would be good if you can include the > >> default > >> > > value > >> > > > > for > >> > > > > > > > this > >> > > > > > > > > > >> > > > > > > > > > new > >> > > > > > > > > > > > >> > > > > > > > > > > > > config > >> > > > > > > > > > > > > > > in the KIP. > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > Thanks > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 4:28 PM, Lucas Wang > < > >> > > > > > > > > > lucasatu...@gmail.com > >> > > > > > > > > > > > > >> > > > > > > > > > > > > > wrote: > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > Hi Ted, Dong > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > I've updated the KIP by adding a new > config, > >> > > > instead > >> > > > > of > >> > > > > > > > > reusing > >> > > > > > > > > > > the > >> > > > > > > > > > > > > > > > existing one. > >> > > > > > > > > > > > > > > > Please take another look when you have > time. > >> > > > Thanks a > >> > > > > > > lot! > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > Lucas > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 2:33 PM, Ted Yu < > >> > > > > > > > yuzhih...@gmail.com > >> > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > wrote: > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > bq. that's a waste of resource if > control > >> > > request > >> > > > > > rate > >> > > > > > > is > >> > > > > > > > > low > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > I don't know if control request rate can > >> get > >> > to > >> > > > > > > 100,000, > >> > > > > > > > > > > likely > >> > > > > > > > > > > > > not. > >> > > > > > > > > > > > > > > Then > >> > > > > > > > > > > > > > > > > using the same bound as that for data > >> > requests > >> > > > > seems > >> > > > > > > > high. > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 10:13 PM, Lucas > >> Wang > >> > < > >> > > > > > > > > > > > > lucasatu...@gmail.com > > >> > > > > > > > > > > > > > > > > wrote: > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > Hi Ted, > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > Thanks for taking a look at this KIP. > >> > > > > > > > > > > > > > > > > > Let's say today the setting of > >> > > > > > "queued.max.requests" > >> > > > > > > in > >> > > > > > > > > > > > cluster A > >> > > > > > > > > > > > > > is > >> > > > > > > > > > > > > > > > > 1000, > >> > > > > > > > > > > > > > > > > > while the setting in cluster B is > >> 100,000. > >> > > > > > > > > > > > > > > > > > The 100 times difference might have > >> > indicated > >> > > > > that > >> > > > > > > > > machines > >> > > > > > > > > > > in > >> > > > > > > > > > > > > > > cluster > >> > > > > > > > > > > > > > > > B > >> > > > > > > > > > > > > > > > > > have larger memory. 
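(For context, a minimal sketch of the two bounds being weighed here, written in the style of KafkaConfig.scala: only queued.max.requests and its default of 500 come from the thread; the control-queue property name and default below are hypothetical placeholders, since the thread does not settle on them.)

    // Sketch only: the existing data-request bound next to a hypothetical,
    // separately tunable bound for the controller-request queue.
    object QueueBoundsSketch {
      // Existing config cited earlier in this thread (KafkaConfig.scala).
      val QueuedMaxRequestsProp = "queued.max.requests"
      val QueuedMaxRequests = 500

      // Hypothetical separate bound for controller requests; the name and
      // value are placeholders, not what the KIP specifies.
      val QueuedMaxControlRequestsProp = "queued.max.control.requests"
      val QueuedMaxControlRequests = 20
    }
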
> >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > By reusing the "queued.max.requests", > >> the > >> > > > > > > > > > > controlRequestQueue > >> > > > > > > > > > > > in > >> > > > > > > > > > > > > > > > cluster > >> > > > > > > > > > > > > > > > > B > >> > > > > > > > > > > > > > > > > > automatically > >> > > > > > > > > > > > > > > > > > gets a 100x capacity without > explicitly > >> > > > bothering > >> > > > > > the > >> > > > > > > > > > > > operators. > >> > > > > > > > > > > > > > > > > > I understand the counter argument can > be > >> > that > >> > > > > maybe > >> > > > > > > > > that's > >> > > > > > > > > > a > >> > > > > > > > > > > > >> > > > > > > > > > > > > waste > >> > > > > > > > > > > > > > of > >> > > > > > > > > > > > > > > > > > resource if control request > >> > > > > > > > > > > > > > > > > > rate is low and operators may want to > >> fine > >> > > tune > >> > > > > the > >> > > > > > > > > > capacity > >> > > > > > > > > > > of > >> > > > > > > > > > > > > the > >> > > > > > > > > > > > > > > > > > controlRequestQueue. > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > I'm ok with either approach, and can > >> change > >> > > it > >> > > > if > >> > > > > > you > >> > > > > > > > or > >> > > > > > > > > > >> > > > > > > > > > > anyone > >> > > > > > > > > > > > > > else > >> > > > > > > > > > > > > > > > > feels > >> > > > > > > > > > > > > > > > > > strong about adding the extra config. > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > Thanks, > >> > > > > > > > > > > > > > > > > > Lucas > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 3:11 PM, Ted > Yu > >> < > >> > > > > > > > > > yuzhih...@gmail.com > >> > > > > > > > > > > > > >> > > > > > > > > > > > > > wrote: > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > Lucas: > >> > > > > > > > > > > > > > > > > > > Under Rejected Alternatives, #2, can > >> you > >> > > > > > elaborate > >> > > > > > > a > >> > > > > > > > > bit > >> > > > > > > > > > > more > >> > > > > > > > > > > > > on > >> > > > > > > > > > > > > > > why > >> > > > > > > > > > > > > > > > > the > >> > > > > > > > > > > > > > > > > > > separate config has bigger impact ? > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > Thanks > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 2:00 PM, > Dong > >> > Lin < > >> > > > > > > > > > > > lindon...@gmail.com > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > wrote: > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > Hey Luca, > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > Thanks for the KIP. Looks good > >> overall. > >> > > > Some > >> > > > > > > > > comments > >> > > > > > > > > > > > below: > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > - We usually specify the full > mbean > >> for > >> > > the > >> > > > > new > >> > > > > > > > > metrics > >> > > > > > > > > > > in > >> > > > > > > > > > > > > the > >> > > > > > > > > > > > > > > KIP. 
> >> > > > > > > > > > > > > > > > > Can > >> > > > > > > > > > > > > > > > > > > you > >> > > > > > > > > > > > > > > > > > > > specify it in the Public Interface > >> > > section > >> > > > > > > similar > >> > > > > > > > > to > >> > > > > > > > > > > > KIP-237 > >> > > > > > > > > > > > > > > > > > > > < https://cwiki.apache.org/ > >> > > > > > > > > > confluence/display/KAFKA/KIP- > >> > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > 237%3A+More+Controller+Health+ > >> Metrics> > >> > > > > > > > > > > > > > > > > > > > ? > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > - Maybe we could follow the same > >> > pattern > >> > > as > >> > > > > > > KIP-153 > >> > > > > > > > > > > > > > > > > > > > < https://cwiki.apache.org/ > >> > > > > > > > > > confluence/display/KAFKA/KIP- > >> > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > 153%3A+Include+only+client+traffic+in+BytesOutPerSec+ > >> > > > > > > > > > > > > metric>, > >> > > > > > > > > > > > > > > > > > > > where we keep the existing sensor > >> name > >> > > > > > > > > "BytesInPerSec" > >> > > > > > > > > > > and > >> > > > > > > > > > > > > add > >> > > > > > > > > > > > > > a > >> > > > > > > > > > > > > > > > new > >> > > > > > > > > > > > > > > > > > > sensor > >> > > > > > > > > > > > > > > > > > > > "ReplicationBytesInPerSec", rather > >> than > >> > > > > > replacing > >> > > > > > > > > the > >> > > > > > > > > > > > sensor > >> > > > > > > > > > > > > > > name " > >> > > > > > > > > > > > > > > > > > > > BytesInPerSec" with e.g. > >> > > > > "ClientBytesInPerSec". > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > - It seems that the KIP changes > the > >> > > > semantics > >> > > > > > of > >> > > > > > > > the > >> > > > > > > > > > >> > > > > > > > > > > broker > >> > > > > > > > > > > > > > > config > >> > > > > > > > > > > > > > > > > > > > "queued.max.requests" because the > >> > number > >> > > of > >> > > > > > total > >> > > > > > > > > > > requests > >> > > > > > > > > > > > > > queued > >> > > > > > > > > > > > > > > > in > >> > > > > > > > > > > > > > > > > > the > >> > > > > > > > > > > > > > > > > > > > broker will be no longer bounded > by > >> > > > > > > > > > > "queued.max.requests". > >> > > > > > > > > > > > > This > >> > > > > > > > > > > > > > > > > > probably > >> > > > > > > > > > > > > > > > > > > > needs to be specified in the > Public > >> > > > > Interfaces > >> > > > > > > > > section > >> > > > > > > > > > > for > >> > > > > > > > > > > > > > > > > discussion. 
> >> > > > > > > > > > > > > > > > > > > > discussion. > >> > > > > > > > > > > > > > > > > > > > Thanks, Dong > >> > > > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 12:45 PM, Lucas Wang < lucasatu...@gmail.com > wrote: > >> > > > > > > > > > > > > > > > > > > > > Hi Kafka experts, I created KIP-291 to add a separate queue for controller requests: https://cwiki.apache.org/confluence/display/KAFKA/KIP-291%3A+Have+separate+queues+for+control+requests+and+data+requests > >> > > > > > > > > > > > > > > > > > > > > Can you please take a look and let me know your feedback? Thanks a lot for your time! Regards, Lucas
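(To make the idea under discussion concrete for readers of this thread, here is a minimal, self-contained Scala sketch of a request channel with separate queues for control requests and data requests, where handler threads drain the control queue first. This is illustrative only and not the KIP's actual implementation; all class, method, and capacity values are invented for the example.)

    import java.util.concurrent.{LinkedBlockingQueue, TimeUnit}

    // Toy stand-ins for broker requests; names are illustrative only.
    sealed trait Request
    case class ControlRequest(name: String) extends Request // e.g. LeaderAndIsr, UpdateMetadata, StopReplica
    case class DataRequest(name: String) extends Request    // e.g. Produce, Fetch

    // Two bounded queues: a small one for controller requests and a
    // queued.max.requests-sized one for data requests.
    class TwoQueueRequestChannel(controlCapacity: Int, dataCapacity: Int) {
      private val controlQueue = new LinkedBlockingQueue[ControlRequest](controlCapacity)
      private val dataQueue = new LinkedBlockingQueue[DataRequest](dataCapacity)

      def sendControlRequest(r: ControlRequest): Unit = controlQueue.put(r)
      def sendDataRequest(r: DataRequest): Unit = dataQueue.put(r)

      // A handler thread asks for the next request: controller requests are
      // always handed out first; otherwise fall back to the data queue with a
      // short timed poll so newly arrived control requests are picked up soon.
      def receiveRequest(): Request = {
        var next: Request = null
        while (next == null) {
          next = controlQueue.poll()
          if (next == null) next = dataQueue.poll(300, TimeUnit.MILLISECONDS)
        }
        next
      }
    }

    object TwoQueueDemo extends App {
      val channel = new TwoQueueRequestChannel(controlCapacity = 20, dataCapacity = 500)
      channel.sendDataRequest(DataRequest("Produce-0"))
      channel.sendControlRequest(ControlRequest("LeaderAndIsr"))
      // The control request is handed out first even though it was enqueued later.
      println(channel.receiveRequest()) // ControlRequest(LeaderAndIsr)
      println(channel.receiveRequest()) // DataRequest(Produce-0)
    }

Note that, as pointed out earlier in the thread, even with such a scheme a data request that a handler thread has already dequeued will still finish ahead of a control request that arrives afterwards; the separation only bounds how long a control request can sit behind queued data requests.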