Hey Lucas, I think for now we can probably base the discussion on Kafka's existing design, where the max inflight requests from the controller to a broker is hard coded to be 1. It looks like Becket has provided a good example in which requests from the same controller can be processed out of order.
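Becket's scenario can be reproduced in a few lines. The sketch below is only an illustration, not Kafka's request channel: it assumes a `java.util.concurrent.LinkedBlockingDeque` stands in for the request queue, and the request names and the capacity of 10 are made up. Both controller requests are pushed to the head, so the later one (R2) is dequeued before R1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingDeque;

public class OutOfOrderDemo {
    // Returns the order in which a handler thread would dequeue the requests.
    static List<String> processingOrder() {
        // Hypothetical stand-in for the broker's request queue.
        LinkedBlockingDeque<String> requestQueue = new LinkedBlockingDeque<>(10);
        requestQueue.offerLast("Produce-1"); // normal request goes to the tail
        requestQueue.offerFirst("R1");       // controller request on the old connection
        // The connection fails and the controller reconnects before R1 is handled.
        requestQueue.offerFirst("R2");       // controller request on the new connection

        List<String> order = new ArrayList<>();
        String next;
        while ((next = requestQueue.pollFirst()) != null) {
            order.add(next);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(processingOrder()); // [R2, R1, Produce-1]
    }
}
```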
Thanks,
Dong

On Wed, Jul 18, 2018 at 8:35 PM, Lucas Wang <lucasatu...@gmail.com> wrote:

> @Becket and Dong,
> I think the ordering guarantee is currently achieved because the max inflight requests from the controller to a broker is hard coded to be 1.
>
> If, hypothetically, the max inflight requests is > 1, then I think Dong is right to say that even the separate queue cannot guarantee ordered processing. For example, Req1 and Req2 are sent to a broker, and after a reconnection both requests are sent again, causing the broker to have 4 requests in the following order:
> Req2 > Req1 > Req2 > Req1.
>
> In summary, it seems using the deque should not cause problems with out-of-order processing.
> Is that right?
>
> Lucas
>
> On Wed, Jul 18, 2018 at 6:24 PM, Dong Lin <lindon...@gmail.com> wrote:
>
> > Hey Becket,
> >
> > It seems that the requests from the old controller will be discarded due to the old controller epoch. It is not clear whether this is a problem.
> >
> > And if this out-of-order processing of controller requests is a problem, it seems like an existing problem which also applies to the multi-queue based design. So it is probably not a concern specific to the use of a deque. Does that sound reasonable?
> >
> > Thanks,
> > Dong
> >
> > On Wed, 18 Jul 2018 at 6:17 PM Becket Qin <becket....@gmail.com> wrote:
> >
> > > Hi Mayuresh/Joel,
> > >
> > > Using the request channel as a deque was brought up some time ago when we were initially thinking of prioritizing the requests. The concern was that the controller requests are supposed to be processed in order. If we can ensure that there is only one controller request in the request channel, the order is not a concern. But in cases where more than one controller request is inserted into the queue, the controller request order may change and cause problems.
> > > For example, think about the following sequence:
> > > 1. The controller successfully sends a request R1 to the broker.
> > > 2. The broker receives R1 and puts the request at the head of the request queue.
> > > 3. The controller-to-broker connection fails and the controller reconnects to the broker.
> > > 4. The controller sends a request R2 to the broker.
> > > 5. The broker receives R2 and adds it to the head of the request queue.
> > > Now on the broker side, R2 will be processed before R1, which may cause problems.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <jjkosh...@gmail.com> wrote:
> > >
> > > > @Mayuresh - I like your idea. It appears to be a simpler, less invasive alternative and it should work. Jun/Becket/others, do you see any pitfalls with this approach?
> > > >
> > > > On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <lucasatu...@gmail.com> wrote:
> > > >
> > > > > @Mayuresh,
> > > > > That's a very interesting idea that I hadn't thought of before.
> > > > > It seems to solve our problem at hand pretty well, and also avoids the need to have a new size metric and capacity config for the controller request queue. In fact, if we were to adopt this design, there is no public interface change, and we probably don't need a KIP.
> > > > > Also, implementation wise, it seems the Java class LinkedBlockingDeque can readily satisfy the requirement by supporting a capacity, and also allowing inserting at both ends.
> > > > >
> > > > > My only concern is that this design is tied to the coincidence that we have two request priorities and there are two ends to a deque.
> > > > > Hence by using the proposed design, it seems the network layer is more tightly coupled with upper layer logic, e.g.
> > > > > if we were to add an extra priority level in the future for some reason, we would probably need to go back to the design of separate queues, one for each priority level.
> > > > >
> > > > > In summary, I'm ok with both designs and lean toward your suggested approach.
> > > > > Let's hear what others think.
> > > > >
> > > > > @Becket,
> > > > > In light of Mayuresh's suggested new design, I'm answering your question only in the context of the current KIP design: I think your suggestion makes sense, and I'm ok with removing the capacity config and just relying on the default value of 20 being sufficient.
> > > > >
> > > > > Thanks,
> > > > > Lucas
> > > > >
> > > > > On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
> > > > >
> > > > > > Hi Lucas,
> > > > > >
> > > > > > Seems like the main intent here is to prioritize the controller requests over any other requests.
> > > > > > In that case, we can change the request queue to a deque, where you always insert the normal requests (produce, consume, etc.) at the end of the deque, but if it's a controller request, you insert it at the head of the queue. This ensures that the controller request will be given higher priority over other requests.
> > > > > >
> > > > > > Also, since we only read one request from the socket and mute it, and only unmute it after handling the request, this would ensure that we don't handle controller requests out of order.
> > > > > >
> > > > > > With this approach we can avoid the second queue and the additional config for the size of the queue.
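As a rough illustration of the deque idea (not Kafka's actual request channel code), the sketch below assumes a single bounded `java.util.concurrent.LinkedBlockingDeque` shared by all requests. The class name, the string request labels, and the capacity of 500 are hypothetical. Controller requests are offered at the head and everything else at the tail, so one capacity setting covers both kinds of request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingDeque;

public class DequeRequestChannel {
    private final LinkedBlockingDeque<String> deque;

    DequeRequestChannel(int capacity) {
        this.deque = new LinkedBlockingDeque<>(capacity);
    }

    // Controller requests go to the head, everything else to the tail.
    // Returns false when the shared capacity is exhausted.
    boolean send(String request, boolean fromController) {
        return fromController ? deque.offerFirst(request) : deque.offerLast(request);
    }

    // Handler threads always take from the head; null means the deque is empty.
    String receive() {
        return deque.pollFirst();
    }

    // Enqueues two data-plane requests and one controller request, then drains.
    static List<String> demo() {
        DequeRequestChannel channel = new DequeRequestChannel(500);
        channel.send("Produce-1", false);
        channel.send("Fetch-1", false);
        channel.send("LeaderAndIsr-1", true);

        List<String> handled = new ArrayList<>();
        for (String r = channel.receive(); r != null; r = channel.receive()) {
            handled.add(r);
        }
        return handled;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [LeaderAndIsr-1, Produce-1, Fetch-1]
    }
}
```

The controller request jumps ahead of the two data-plane requests, which keep their relative order.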
> > > > > >
> > > > > > What do you think?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Mayuresh
> > > > > >
> > > > > > On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <becket....@gmail.com> wrote:
> > > > > >
> > > > > > > Hey Joel,
> > > > > > >
> > > > > > > Thanks for the detailed explanation. I agree the current design makes sense. My confusion is about whether the new config for the controller queue capacity is necessary. I cannot think of a case in which users would change it.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jiangjie (Becket) Qin
> > > > > > >
> > > > > > > On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <becket....@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Lucas,
> > > > > > > >
> > > > > > > > I guess my question can be rephrased to "do we expect users to ever change the controller request queue capacity"? If we agree that 20 is already a very generous default number and we do not expect users to change it, is it still necessary to expose this as a config?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jiangjie (Becket) Qin
> > > > > > > >
> > > > > > > > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <lucasatu...@gmail.com> wrote:
> > > > > > > >
> > > > > > > >> @Becket
> > > > > > > >> 1. Thanks for the comment. You are right that normally there should be just one controller request because of muting, and I had NOT intended to say there would be many enqueued controller requests.
> > > > > > > >> I went through the KIP again, and I'm not sure which part conveys that info.
> > > > > > > >> I'd be happy to revise if you point out the section.
> > > > > > > >>
> > > > > > > >> 2. Though it should not happen in normal conditions, the current design does not preclude multiple controllers running at the same time. Hence, if we don't have the controller queue capacity config and simply make its capacity 1, network threads handling requests from different controllers will be blocked during those troublesome times, which is probably not what we want. On the other hand, adding the extra config with a default value, say 20, guards us from issues in those troublesome times, and IMO there isn't much downside to adding the extra config.
> > > > > > > >>
> > > > > > > >> @Mayuresh
> > > > > > > >> Good catch, this sentence is an obsolete statement based on a previous design. I've revised the wording in the KIP.
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >> Lucas
> > > > > > > >>
> > > > > > > >> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
> > > > > > > >>
> > > > > > > >> > Hi Lucas,
> > > > > > > >> >
> > > > > > > >> > Thanks for the KIP.
> > > > > > > >> > I am trying to understand why you think "The memory consumption can rise given the total number of queued requests can go up to 2x" in the impact section. Normally the requests from the controller to a broker are not high volume, right?
> > > > > > > >> >
> > > > > > > >> > Thanks,
> > > > > > > >> >
> > > > > > > >> > Mayuresh
> > > > > > > >> >
> > > > > > > >> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <becket....@gmail.com> wrote:
> > > > > > > >> >
> > > > > > > >> > > Thanks for the KIP, Lucas. Separating the control plane from the data plane makes a lot of sense.
> > > > > > > >> > >
> > > > > > > >> > > In the KIP you mentioned that the controller request queue may have many requests in it. Will this be a common case? The controller requests still go through the SocketServer. The SocketServer will mute the channel once a request is read and put into the request channel. So assuming there is only one connection between the controller and each broker, on the broker side there should be only one controller request in the controller request queue at any given time. If that is the case, do we need a separate controller request queue capacity config? The default value of 20 means that we expect 20 controller switches to happen in a short period of time. I am not sure whether someone should increase the controller request queue capacity to handle such a case, as it seems to indicate something very wrong has happened.
> > > > > > > >> > >
> > > > > > > >> > > Thanks,
> > > > > > > >> > >
> > > > > > > >> > > Jiangjie (Becket) Qin
> > > > > > > >> > >
> > > > > > > >> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <lindon...@gmail.com> wrote:
> > > > > > > >> > >
> > > > > > > >> > > > Thanks for the update Lucas.
> > > > > > > >> > > >
> > > > > > > >> > > > I think the motivation section is intuitive. It will be good to learn more about the comments from other reviewers.
> > > > > > > >> > > >
> > > > > > > >> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <lucasatu...@gmail.com> wrote:
> > > > > > > >> > > >
> > > > > > > >> > > > > Hi Dong,
> > > > > > > >> > > > >
> > > > > > > >> > > > > I've updated the motivation section of the KIP by explaining the cases that would have user impacts.
> > > > > > > >> > > > > Please take a look and let me know your comments.
> > > > > > > >> > > > >
> > > > > > > >> > > > > Thanks,
> > > > > > > >> > > > > Lucas
> > > > > > > >> > > > >
> > > > > > > >> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <lucasatu...@gmail.com> wrote:
> > > > > > > >> > > > >
> > > > > > > >> > > > > > Hi Dong,
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > The simulation of the disk being slow is merely for me to easily construct a testing scenario with a backlog of produce requests. In production, other than the disk being slow, a backlog of produce requests may also be caused by high produce QPS.
> > > > > > > >> > > > > > In that case, we may not want to kill the broker, and that's when this KIP can be useful, both for JBOD and non-JBOD setups.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > Going back to your previous question about each ProduceRequest covering 20 partitions that are randomly distributed, let's say a LeaderAndIsr request is enqueued that tries to switch the current broker, say broker0, from leader to follower *for one of the partitions*, say *test-0*. For the sake of argument, let's also assume the other brokers, say broker1, have *stopped* fetching from the current broker, i.e. broker0.
> > > > > > > >> > > > > > 1. If the enqueued produce requests have acks = -1 (ALL)
> > > > > > > >> > > > > >   1.1 Without this KIP, the ProduceRequests ahead of the LeaderAndISR will be put into the purgatory, and since they'll never be replicated to other brokers (because of the assumption made above), they will be completed either when the LeaderAndISR request is processed or when the timeout happens.
> > > > > > > >> > > > > >   1.2 With this KIP, broker0 will immediately transition the partition test-0 to become a follower. After the current broker sees the replication of the remaining 19 partitions, it can send a response indicating that it's no longer the leader for test-0.
> > > > > > > >> > > > > >   To see the latency difference between 1.1 and 1.2, let's say there are 24K produce requests ahead of the LeaderAndISR, and there are 8 io threads, so each io thread will process approximately 3000 produce requests. Now let's investigate the io thread that finally processes the LeaderAndISR.
> > > > > > > >> > > > > >   For the 3000 produce requests, let's model the times when their remaining 19 partitions catch up as t0, t1, ..., t2999, and say the LeaderAndISR request is processed at time t3000.
> > > > > > > >> > > > > >   Without this KIP, the 1st produce request would have waited an extra t3000 - t0 in the purgatory, the 2nd an extra t3000 - t1, etc.
> > > > > > > >> > > > > >   Roughly speaking, the latency difference is bigger for the earlier produce requests than for the later ones. For the same reason, the more ProduceRequests are queued before the LeaderAndISR, the bigger the benefit we get (capped by the produce timeout).
> > > > > > > >> > > > > > 2. If the enqueued produce requests have acks=0 or acks=1
> > > > > > > >> > > > > >   There will be no latency differences in this case, but
> > > > > > > >> > > > > >   2.1 Without this KIP, the records of partition test-0 in the ProduceRequests ahead of the LeaderAndISR will be appended to the local log, and eventually be truncated after processing the LeaderAndISR. This is what's referred to as "some unofficial definition of data loss in terms of messages beyond the high watermark".
> > > > > > > >> > > > > >   2.2 With this KIP, we can mitigate the effect, since if the LeaderAndISR is immediately processed, the response to producers will have the NotLeaderForPartition error, causing producers to retry.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > The explanation above covers the benefit of reducing the latency of a broker becoming the follower. Closely related is reducing the latency of a broker becoming the leader.
> > > > > > > >> > > > > > In this case, the benefit is even more obvious: if other brokers have resigned leadership, the current broker should take leadership. Any delay in processing the LeaderAndISR will be perceived by clients as unavailability. In extreme cases, this can cause failed produce requests if the retries are exhausted.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > Two other types of controller requests are UpdateMetadata and StopReplica, which I'll briefly discuss as follows:
> > > > > > > >> > > > > > For UpdateMetadata requests, delayed processing means clients receiving stale metadata, e.g.
> > > > > > > >> > > > > > with the wrong leadership info for certain partitions, and the effect is more retries or even fatal failure if the retries are exhausted.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > For StopReplica requests, a long queuing time may degrade the performance of topic deletion.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > Regarding your last question about the delay for DescribeLogDirsRequest, you are right that this KIP cannot help with the latency in getting the log dirs info, and it's only relevant when controller requests are involved.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > Regards,
> > > > > > > >> > > > > > Lucas
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <lindon...@gmail.com> wrote:
> > > > > > > >> > > > > >
> > > > > > > >> > > > > >> Hey Jun,
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >> Thanks much for the comments. It is a good point. So the feature may be useful for the JBOD use-case. I have one question below.
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >> Hey Lucas,
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >> Do you think this feature is also useful for a non-JBOD setup, or is it only useful for the JBOD setup? It may be useful to understand this.
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >> When the broker is set up using JBOD, in order to move leaders on the failed disk to other disks, the system operator first needs to get the list of partitions on the failed disk. This is currently achieved using AdminClient.describeLogDirs(), which sends DescribeLogDirsRequest to the broker. If we only prioritize the controller requests, then the DescribeLogDirsRequest may still take a long time to be processed by the broker. So the overall time to move leaders away from the failed disk may still be long even with this KIP. What do you think?
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >> Thanks,
> > > > > > > >> > > > > >> Dong
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <lucasatu...@gmail.com> wrote:
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >> > Thanks for the insightful comment, Jun.
> > > > > > > >> > > > > >> >
> > > > > > > >> > > > > >> > @Dong,
> > > > > > > >> > > > > >> > Since both of the comments in your previous email are about the benefits of this KIP and whether it's useful, in light of Jun's last comment, do you agree that this KIP can be beneficial in the case mentioned by Jun?
> > > > > > > >> > > > > >> > Please let me know, thanks!
> > > > > > > >> > > > > >> >
> > > > > > > >> > > > > >> > Regards,
> > > > > > > >> > > > > >> > Lucas
> > > > > > > >> > > > > >> >
> > > > > > > >> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <j...@confluent.io> wrote:
> > > > > > > >> > > > > >> >
> > > > > > > >> > > > > >> > > Hi, Lucas, Dong,
> > > > > > > >> > > > > >> > >
> > > > > > > >> > > > > >> > > If all disks on a broker are slow, one probably should just kill the broker. In that case, this KIP may not help. If only one of the disks on a broker is slow, one may want to fail that disk and move the leaders on that disk to other brokers. In that case, being able to process the LeaderAndIsr requests faster will potentially help the producers recover quicker.
> > > > > > > >> > > > > >> > >
> > > > > > > >> > > > > >> > > Thanks,
> > > > > > > >> > > > > >> > >
> > > > > > > >> > > > > >> > > Jun
> > > > > > > >> > > > > >> > >
> > > > > > > >> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <lindon...@gmail.com> wrote:
> > > > > > > >> > > > > >> > >
> > > > > > > >> > > > > >> > > > Hey Lucas,
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > Thanks for the reply. Some follow up questions below.
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > Regarding 1, if each ProduceRequest covers 20 partitions that are randomly distributed across all partitions, then each ProduceRequest will likely cover some partitions for which the broker is still leader after it quickly processes the LeaderAndIsrRequest. Then the broker will still be slow in processing these ProduceRequests and request latency will still be very high with this KIP. It seems that most ProduceRequests will still time out after 30 seconds. Is this understanding correct?
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > Regarding 2, if most ProduceRequests will still time out after 30 seconds, then it is less clear how this KIP reduces average produce latency. Can you clarify what metrics can be improved by this KIP?
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > Not sure why a system operator directly cares about the number of truncated messages. Do you mean this KIP can improve average throughput or reduce message duplication? It will be good to understand this.
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > Thanks,
> > > > > > > >> > > > > >> > > > Dong
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lucasatu...@gmail.com> wrote:
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > > Hi Dong,
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > > Thanks for your valuable comments. Please see my reply below.
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > > 1. The Google doc showed only 1 partition.
> > > > > > > >> > > > > >> > > > > Now let's consider a more common scenario where broker0 is the leader of many partitions. And let's say for some reason its IO becomes slow.
> > > > > > > >> > > > > >> > > > > The number of leader partitions on broker0 is so large, say 10K, that the cluster is skewed, and the operator would like to shift the leadership for a lot of partitions, say 9K, to other brokers, either manually or through some service like Cruise Control.
> > > > > > > >> > > > > >> > > > > With this KIP, not only will the leadership transitions finish more quickly, helping the cluster itself become more balanced, but all existing producers corresponding to the 9K partitions will get the errors relatively quickly rather than relying on their timeout, thanks to the batched async ZK operations.
> > > > > > > >> > > > > >> > > > > To me it's a useful feature to have during such troublesome times.
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > > 2. The experiments in the Google Doc have shown that with this KIP many producers receive an explicit NotLeaderForPartition error, based on which they retry immediately.
> > > > > > > >> > > > > >> > > > > Therefore the latency (~14 seconds + quick retry) for their single message is much smaller compared with the case of timing out without the KIP (30 seconds for timing out + quick retry).
> > > > > > > >> > > > > >> > > > > One might argue that reducing the timeout on the producer side can achieve the same result, yet reducing the timeout has its own drawbacks[1].

Also *IF* there were a metric to show the number of truncated messages on
brokers, with the experiments done in the Google Doc, it should be easy to
see that a lot fewer messages need to be truncated on broker0, since the
up-to-date metadata avoids appending of messages in subsequent PRODUCE
requests. If we talk to a system operator and ask whether they prefer fewer
wasteful IOs, I bet most likely the answer is yes.

3. To answer your question, I think it might be helpful to construct some
formulas.
To simplify the modeling, I'm going back to the case where there is only
ONE partition involved.
Following the experiments in the Google Doc, let's say broker0 becomes the
follower at time t0, and after t0 there were still N produce requests in
its request queue.
With the up-to-date metadata brought by this KIP, broker0 can reply with a
NotLeaderForPartition exception; let's use M1 to denote the average
processing time of replying with such an error message.
Without this KIP, the broker will need to append messages to segments,
which may trigger a flush to disk; let's use M2 to denote the average
processing time for such logic.
Then the average extra latency incurred without this KIP is
N * (M2 - M1) / 2.
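
As a rough illustration of the formula above (an editorial sketch, not part
of the experiments in the Google Doc): assume the N queued requests are
served one at a time, so the i-th request completes after i times the
per-request processing time. The function names and the sample values for
N, M1 and M2 are hypothetical.

```python
def average_completion_time(n, per_request_time):
    """Mean completion time of n queued requests served one at a time."""
    # Assumption: the i-th request (1-indexed) completes after
    # i * per_request_time, i.e. requests are handled strictly in order.
    return sum(i * per_request_time for i in range(1, n + 1)) / n


def extra_latency_exact(n, m1, m2):
    """Average extra latency when each request costs m2 instead of m1."""
    return average_completion_time(n, m2) - average_completion_time(n, m1)


def extra_latency_approx(n, m1, m2):
    """The estimate N * (M2 - M1) / 2 used in the discussion."""
    return n * (m2 - m1) / 2


if __name__ == "__main__":
    # Hypothetical numbers: 100 queued requests, 1 ms to reply with an
    # error (M1), 50 ms to append to the log (M2).
    n, m1, m2 = 100, 0.001, 0.050
    print(extra_latency_exact(n, m1, m2))
    print(extra_latency_approx(n, m1, m2))
```

Under this simple model the exact difference is (N + 1) * (M2 - M1) / 2,
which approaches the N * (M2 - M1) / 2 estimate as N grows.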

In practice, M2 should always be larger than M1, which means as long as N
is positive, we would see improvements on the average latency.
There does not need to be significant backlog of requests in the request
queue, or severe degradation of disk performance to have the improvement.

Regards,
Lucas

[1] For instance, reducing the timeout on the producer side can trigger
unnecessary duplicate requests when the corresponding leader broker is
overloaded, exacerbating the situation.

On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <lindon...@gmail.com> wrote:

Hey Lucas,

Thanks much for the detailed documentation of the experiment.

Initially I also thought having a separate queue for controller requests is
useful because, as you mentioned in the summary section of the Google doc,
controller requests are generally more important than data requests and we
probably want controller requests to be processed sooner. But then Eno has
two very good questions which I am not sure the Google doc has answered
explicitly. Could you help with the following questions?

1) It is not very clear what the actual benefit of KIP-291 is to users. The
experiment setup in the Google doc simulates the scenario where the broker
is very slow handling ProduceRequest due to e.g. slow disk. It currently
assumes that there is only 1 partition. But in the common scenario, it is
probably reasonable to assume that there are many other partitions that are
also actively produced to, and ProduceRequests to these partitions also take
e.g. 2 seconds to be processed. So even if broker0 can become follower for
partition 0 soon, it probably still needs to process the ProduceRequests in
the queue slowly because these ProduceRequests cover other partitions.
Thus most ProduceRequests will still time out after 30 seconds and most
clients will still likely time out after 30 seconds. Then it is not obvious
what the benefit to the client is, since the client will time out after 30
seconds before possibly re-connecting to broker1, with or without KIP-291.
Did I miss something here?

2) I guess Eno is asking for the specific benefits of this KIP to the user
or system administrator, e.g. whether this KIP decreases average latency,
999th percentile latency, probability of exceptions exposed to the client,
etc. It is probably useful to clarify this.

3) Does this KIP help improve user experience only when there is an issue
with the broker, e.g. significant backlog in the request queue due to slow
disk as described in the Google doc? Or is this KIP also useful when there
is no ongoing issue in the cluster? It might be helpful to clarify this to
understand the benefit of this KIP.

Thanks much,
Dong

On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <lucasatu...@gmail.com> wrote:

Hi Eno,

Sorry for the delay in getting the experiment results.
Here is a link to the positive impact achieved by implementing the proposed
change:
https://docs.google.com/document/d/1ge2jjp5aPTBber6zaIT9AdhWFWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
Please take a look when you have time and let me know your feedback.

Regards,
Lucas

On Tue, Jun 26, 2018 at 9:52 AM, Harsha <ka...@harsha.io> wrote:

Thanks for the pointer. Will take a look; it might suit our requirements
better.

Thanks,
Harsha

On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <lucasatu...@gmail.com> wrote:

Hi Harsha,

If I understand correctly, the replication quota mechanism proposed in
KIP-73 can be helpful in that scenario.
Have you tried it out?

Thanks,
Lucas

On Sun, Jun 24, 2018 at 8:28 AM, Harsha <ka...@harsha.io> wrote:

Hi Lucas,
One more question: any thoughts on making this configurable and also
allowing a subset of data requests to be prioritized? For example, we
notice in our cluster that when we take out a broker and bring in a new
one, it will try to become a follower and send a lot of fetch requests to
other leaders in the cluster. This will negatively affect the
application/client requests.
We are also exploring a similar solution to de-prioritize fetch requests
when a new replica comes in; we are OK with the replica taking time, but
the leaders should prioritize the client requests.

Thanks,
Harsha

On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang wrote:

Hi Eno,

Sorry for the delayed response.
- I haven't implemented the feature yet, so no experimental results so far.
And I plan to test it out in the following days.

- You are absolutely right that the priority queue does not completely
prevent data requests being processed ahead of controller requests.
That being said, I expect it to greatly mitigate the effect of stale
metadata.
In any case, I'll try it out and post the results when I have them.

Regards,
Lucas

On Wed, Jun 20, 2018 at 5:44 AM, Eno Thereska <eno.there...@gmail.com> wrote:

Hi Lucas,

Sorry for the delay, just had a look at this. A couple of questions:
- did you notice any positive change after implementing this KIP? I'm
wondering if you have any experimental results that show the benefit of the
two queues.

- priority is usually not sufficient in addressing the problem the KIP
identifies. Even with priority queues, you will sometimes (often?) have the
case that data plane requests will be ahead of the control plane requests.
This happens because the system might have already started processing the
data plane requests before the control plane ones arrived. So it would be
good to know what % of the problem this KIP addresses.
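
The point above can be sketched with a toy model (an editorial illustration
only, not KIP-291's implementation; the queue names, request labels and the
`next_request` helper are hypothetical). A single handler always checks a
control queue before a data queue, yet a data request it has already picked
up still runs ahead of a controller request that arrives later:

```python
from collections import deque


def next_request(control_queue, data_queue):
    """Pick the next request, preferring the control queue when non-empty."""
    if control_queue:
        return control_queue.popleft()
    if data_queue:
        return data_queue.popleft()
    return None


def simulate():
    """Return the order in which a single handler thread starts requests."""
    control_queue, data_queue = deque(), deque()
    order = []

    # Two data requests arrive first; the handler starts on the first one.
    data_queue.extend(["data-1", "data-2"])
    order.append(next_request(control_queue, data_queue))

    # A controller request arrives while "data-1" is already in progress.
    control_queue.append("control-1")

    # Drain what is left, always checking the control queue first.
    while control_queue or data_queue:
        order.append(next_request(control_queue, data_queue))
    return order
```

In this model `simulate()` yields data-1, control-1, data-2: prioritization
lets control-1 jump ahead of the queued data-2, but not ahead of data-1,
which was already being processed.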

Thanks
Eno

On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu <yuzhih...@gmail.com> wrote:

Change looks good.

Thanks

On Fri, Jun 15, 2018 at 8:42 AM, Lucas Wang <lucasatu...@gmail.com> wrote:

Hi Ted,

Thanks for the suggestion. I've updated the KIP. Please take another look.

Lucas

On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu <yuzhih...@gmail.com> wrote:

Currently in KafkaConfig.scala :

val QueuedMaxRequests = 500

It would be good if you can include the default value for this