Hi Lucas, I agree. If we want to go forward with separate controller and data planes and isolate them completely, having a separate port for the controller, with its own Acceptor and Processor, sounds ideal to me.
Thanks,
Mayuresh

On Mon, Jul 23, 2018 at 11:04 PM Becket Qin <becket....@gmail.com> wrote:

> Hi Lucas,
>
> Yes, I agree that a dedicated end-to-end control flow would be ideal.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Tue, Jul 24, 2018 at 1:05 PM, Lucas Wang <lucasatu...@gmail.com> wrote:
>
> > Thanks for the comment, Becket.
> > So far, we've been trying to avoid making any request handler thread special.
> > But if we were to follow that path in order to make the two planes more isolated,
> > what do you think about also having a dedicated processor thread
> > and a dedicated port for the controller?
> >
> > Today one processor thread can handle multiple connections, let's say 100
> > connections represented by connection0, ... connection99, among which
> > connection0-98 are from clients, while connection99 is from the controller.
> > Further, let's say that after one selector poll there are incoming requests
> > on all connections.
> >
> > When the request queue is full (either the data request queue being full in
> > the two-queue design, or the single queue being full in the deque design),
> > the processor thread will be blocked first when trying to enqueue the data
> > request from connection0, then possibly blocked for the data request from
> > connection1, etc., even though the controller request is ready to be enqueued.
> >
> > To solve this problem, it seems we would need a separate port dedicated to
> > the controller, a dedicated processor thread, a dedicated controller request
> > queue, and pinning of one request handler thread for controller requests.
> >
> > Thanks,
> > Lucas
> >
> > On Mon, Jul 23, 2018 at 6:00 PM, Becket Qin <becket....@gmail.com> wrote:
> >
> > > Personally I am not fond of the deque approach simply because it is
> > > against the basic idea of isolating the controller plane and data plane.
> > > With a single deque, theoretically speaking, the controller requests can
> > > starve the clients' requests. I would prefer the approach with a separate
> > > controller request queue and a dedicated controller request handler thread.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Tue, Jul 24, 2018 at 8:16 AM, Lucas Wang <lucasatu...@gmail.com> wrote:
> > >
> > > > Sure, I can summarize the usage of correlation id. But before I do that,
> > > > it seems the same out-of-order processing can also happen to Produce
> > > > requests sent by producers, following the same example you described earlier.
> > > > If that's the case, I think this probably deserves a separate doc and
> > > > design independent of this KIP.
> > > >
> > > > Lucas
> > > >
> > > > On Mon, Jul 23, 2018 at 12:39 PM, Dong Lin <lindon...@gmail.com> wrote:
> > > >
> > > > > Hey Lucas,
> > > > >
> > > > > Could you update the KIP if you are confident with the approach which
> > > > > uses correlation id? The idea around correlation id is kind of scattered
> > > > > across multiple emails. It will be useful if other reviewers can read the
> > > > > KIP to understand the latest proposal.
> > > > >
> > > > > Thanks,
> > > > > Dong
> > > > >
> > > > > On Mon, Jul 23, 2018 at 12:32 PM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
> > > > >
> > > > > > I like the idea of the deque implementation by Lucas. This will help
> > > > > > us avoid an additional queue for the controller and additional configs
> > > > > > in Kafka.
> > > > > > > > > > > > Thanks, > > > > > > > > > > > > Mayuresh > > > > > > > > > > > > On Sun, Jul 22, 2018 at 2:58 AM Becket Qin <becket....@gmail.com > > > > > > wrote: > > > > > > > > > > > > > Hi Jun, > > > > > > > > > > > > > > The usage of correlation ID might still be useful to address > the > > > > cases > > > > > > > that the controller epoch and leader epoch check are not > > sufficient > > > > to > > > > > > > guarantee correct behavior. For example, if the controller > sends > > a > > > > > > > LeaderAndIsrRequest followed by a StopReplicaRequest, and the > > > broker > > > > > > > processes it in the reverse order, the replica may still be > > wrongly > > > > > > > recreated, right? > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > Jiangjie (Becket) Qin > > > > > > > > > > > > > > > On Jul 22, 2018, at 11:47 AM, Jun Rao <j...@confluent.io> > > wrote: > > > > > > > > > > > > > > > > Hmm, since we already use controller epoch and leader epoch > for > > > > > > properly > > > > > > > > caching the latest partition state, do we really need > > correlation > > > > id > > > > > > for > > > > > > > > ordering the controller requests? > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > On Fri, Jul 20, 2018 at 2:18 PM, Becket Qin < > > > becket....@gmail.com> > > > > > > > wrote: > > > > > > > > > > > > > > > >> Lucas and Mayuresh, > > > > > > > >> > > > > > > > >> Good idea. The correlation id should work. > > > > > > > >> > > > > > > > >> In the ControllerChannelManager, a request will be resent > > until > > > a > > > > > > > response > > > > > > > >> is received. So if the controller to broker connection > > > disconnects > > > > > > after > > > > > > > >> controller sends R1_a, but before the response of R1_a is > > > > received, > > > > > a > > > > > > > >> disconnection may cause the controller to resend R1_b. i.e. 
> > > > > > > >> until R1 is acked, R2 won't be sent by the controller.
> > > > > > > >> This gives two guarantees:
> > > > > > > >> 1. Correlation id wise: R1_a < R1_b < R2.
> > > > > > > >> 2. On the broker side, when R2 is seen, R1 must have been processed at
> > > > > > > >> least once.
> > > > > > > >>
> > > > > > > >> So on the broker side, with a single-threaded controller request handler,
> > > > > > > >> the logic should be:
> > > > > > > >> 1. Process whatever request is seen in the controller request queue.
> > > > > > > >> 2. For the given epoch, drop a request if its correlation id is smaller
> > > > > > > >> than that of the last processed request.
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >>
> > > > > > > >> Jiangjie (Becket) Qin
> > > > > > > >>
> > > > > > > >> On Fri, Jul 20, 2018 at 8:07 AM, Jun Rao <j...@confluent.io> wrote:
> > > > > > > >>
> > > > > > > >>> I agree that there is no strong ordering when there is more than one
> > > > > > > >>> socket connection. Currently, we rely on controllerEpoch and leaderEpoch
> > > > > > > >>> to ensure that the receiving broker picks up the latest state for each
> > > > > > > >>> partition.
> > > > > > > >>>
> > > > > > > >>> One potential issue with the deque approach is that if the queue is full,
> > > > > > > >>> there is no guarantee that the controller requests will be enqueued
> > > > > > > >>> quickly.
> > > > > > > >>>
> > > > > > > >>> Thanks,
> > > > > > > >>>
> > > > > > > >>> Jun
> > > > > > > >>>
> > > > > > > >>> On Fri, Jul 20, 2018 at 5:25 AM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
> > > > > > > >>>
> > > > > > > >>>> Yea, the correlationId is only set to 0 in the NetworkClient constructor.
> > > > > > > >>>> Since we reuse the same NetworkClient between Controller > and > > > the > > > > > > > >> broker, > > > > > > > >>> a > > > > > > > >>>> disconnection should not cause it to reset to 0, in which > > case > > > > it > > > > > > can > > > > > > > >> be > > > > > > > >>>> used to reject obsolete requests. > > > > > > > >>>> > > > > > > > >>>> Thanks, > > > > > > > >>>> > > > > > > > >>>> Mayuresh > > > > > > > >>>> > > > > > > > >>>> On Thu, Jul 19, 2018 at 1:52 PM Lucas Wang < > > > > lucasatu...@gmail.com > > > > > > > > > > > > > >>> wrote: > > > > > > > >>>> > > > > > > > >>>>> @Dong, > > > > > > > >>>>> Great example and explanation, thanks! > > > > > > > >>>>> > > > > > > > >>>>> @All > > > > > > > >>>>> Regarding the example given by Dong, it seems even if we > > use > > > a > > > > > > queue, > > > > > > > >>>> and a > > > > > > > >>>>> dedicated controller request handling thread, > > > > > > > >>>>> the same result can still happen because R1_a will be > sent > > on > > > > one > > > > > > > >>>>> connection, and R1_b & R2 will be sent on a different > > > > connection, > > > > > > > >>>>> and there is no ordering between different connections on > > the > > > > > > broker > > > > > > > >>>> side. > > > > > > > >>>>> I was discussing with Mayuresh offline, and it seems > > > > correlation > > > > > id > > > > > > > >>>> within > > > > > > > >>>>> the same NetworkClient object is monotonically increasing > > and > > > > > never > > > > > > > >>>> reset, > > > > > > > >>>>> hence a broker can leverage that to properly reject > > obsolete > > > > > > > >> requests. > > > > > > > >>>>> Thoughts? 
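As a rough illustration of the correlation-id idea in this part of the thread (for a given controller epoch, drop a request whose correlation id is smaller than that of the last processed request), here is a minimal Java sketch. The class and method names (`ObsoleteRequestFilter`, `shouldProcess`) are hypothetical and not part of Kafka; the sketch assumes a single controller request handler thread and that correlation ids from one controller only grow, as described for the shared NetworkClient.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not a Kafka class: remembers the last processed
// correlation id per controller epoch and flags older requests as obsolete.
final class ObsoleteRequestFilter {
    private final Map<Integer, Integer> lastProcessedByEpoch = new HashMap<>();

    // Returns true if the request should be processed, false if it is obsolete.
    synchronized boolean shouldProcess(int controllerEpoch, int correlationId) {
        Integer last = lastProcessedByEpoch.get(controllerEpoch);
        if (last != null && correlationId < last) {
            // A newer request (e.g. the resend R1_b, or R2) was already processed.
            return false;
        }
        lastProcessedByEpoch.put(controllerEpoch, correlationId);
        return true;
    }
}
```

With this sketch, if R1_b (id 5) is processed first and the original R1_a (id 4) shows up later from the old connection, R1_a is dropped, while R2 (id 6) is still accepted.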
> > > > > > > >>>>> > > > > > > > >>>>> Thanks, > > > > > > > >>>>> Lucas > > > > > > > >>>>> > > > > > > > >>>>> On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat < > > > > > > > >>>>> gharatmayures...@gmail.com> wrote: > > > > > > > >>>>> > > > > > > > >>>>>> Actually nvm, correlationId is reset in case of > connection > > > > > loss, I > > > > > > > >>>> think. > > > > > > > >>>>>> > > > > > > > >>>>>> Thanks, > > > > > > > >>>>>> > > > > > > > >>>>>> Mayuresh > > > > > > > >>>>>> > > > > > > > >>>>>> On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat < > > > > > > > >>>>>> gharatmayures...@gmail.com> > > > > > > > >>>>>> wrote: > > > > > > > >>>>>> > > > > > > > >>>>>>> I agree with Dong that out-of-order processing can > happen > > > > with > > > > > > > >>>> having 2 > > > > > > > >>>>>>> separate queues as well and it can even happen today. > > > > > > > >>>>>>> Can we use the correlationId in the request from the > > > > controller > > > > > > > >> to > > > > > > > >>>> the > > > > > > > >>>>>>> broker to handle ordering ? > > > > > > > >>>>>>> > > > > > > > >>>>>>> Thanks, > > > > > > > >>>>>>> > > > > > > > >>>>>>> Mayuresh > > > > > > > >>>>>>> > > > > > > > >>>>>>> > > > > > > > >>>>>>> On Thu, Jul 19, 2018 at 6:41 AM Becket Qin < > > > > > becket....@gmail.com > > > > > > > >>> > > > > > > > >>>>> wrote: > > > > > > > >>>>>>> > > > > > > > >>>>>>>> Good point, Joel. I agree that a dedicated controller > > > > request > > > > > > > >>>> handling > > > > > > > >>>>>>>> thread would be a better isolation. It also solves the > > > > > > > >> reordering > > > > > > > >>>>> issue. > > > > > > > >>>>>>>> > > > > > > > >>>>>>>> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy < > > > > > > > >> jjkosh...@gmail.com> > > > > > > > >>>>>> wrote: > > > > > > > >>>>>>>> > > > > > > > >>>>>>>>> Good example. 
I think this scenario can occur in the > > > > current > > > > > > > >>> code > > > > > > > >>>> as > > > > > > > >>>>>>>> well > > > > > > > >>>>>>>>> but with even lower probability given that there are > > > other > > > > > > > >>>>>>>> non-controller > > > > > > > >>>>>>>>> requests interleaved. It is still sketchy though and > I > > > > think > > > > > a > > > > > > > >>>> safer > > > > > > > >>>>>>>>> approach would be separate queues and pinning > > controller > > > > > > > >> request > > > > > > > >>>>>>>> handling > > > > > > > >>>>>>>>> to one handler thread. > > > > > > > >>>>>>>>> > > > > > > > >>>>>>>>> On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin < > > > > > > > >> lindon...@gmail.com > > > > > > > >>>> > > > > > > > >>>>>> wrote: > > > > > > > >>>>>>>>> > > > > > > > >>>>>>>>>> Hey Becket, > > > > > > > >>>>>>>>>> > > > > > > > >>>>>>>>>> I think you are right that there may be out-of-order > > > > > > > >>> processing. > > > > > > > >>>>>>>> However, > > > > > > > >>>>>>>>>> it seems that out-of-order processing may also > happen > > > even > > > > > > > >> if > > > > > > > >>> we > > > > > > > >>>>>> use a > > > > > > > >>>>>>>>>> separate queue. > > > > > > > >>>>>>>>>> > > > > > > > >>>>>>>>>> Here is the example: > > > > > > > >>>>>>>>>> > > > > > > > >>>>>>>>>> - Controller sends R1 and got disconnected before > > > > receiving > > > > > > > >>>>>> response. > > > > > > > >>>>>>>>> Then > > > > > > > >>>>>>>>>> it reconnects and sends R2. Both requests now stay > in > > > the > > > > > > > >>>>> controller > > > > > > > >>>>>>>>>> request queue in the order they are sent. > > > > > > > >>>>>>>>>> - thread1 takes R1_a from the request queue and then > > > > thread2 > > > > > > > >>>> takes > > > > > > > >>>>>> R2 > > > > > > > >>>>>>>>> from > > > > > > > >>>>>>>>>> the request queue almost at the same time. > > > > > > > >>>>>>>>>> - So R1_a and R2 are processed in parallel. 
> > > > > > > >>>>>>>>>> There is a chance that R2's processing is completed before R1.
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> If out-of-order processing can happen for both approaches with very
> > > > > > > >>>>>>>>>> low probability, it may not be worthwhile to add the extra queue.
> > > > > > > >>>>>>>>>> What do you think?
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> Thanks,
> > > > > > > >>>>>>>>>> Dong
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <becket....@gmail.com> wrote:
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>>> Hi Mayuresh/Joel,
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>> Using the request channel as a deque was brought up some time ago
> > > > > > > >>>>>>>>>>> when we were initially thinking of prioritizing the request. The
> > > > > > > >>>>>>>>>>> concern was that the controller requests are supposed to be processed
> > > > > > > >>>>>>>>>>> in order. If we can ensure that there is only one controller request
> > > > > > > >>>>>>>>>>> in the request channel, the order is not a concern.
> > > > > > > >>>>>>>>>>> But in cases where more than one controller request is inserted
> > > > > > > >>>>>>>>>>> into the queue, the controller request order may change and cause
> > > > > > > >>>>>>>>>>> problems. For example, think about the following sequence:
> > > > > > > >>>>>>>>>>> 1. Controller successfully sends a request R1 to the broker.
> > > > > > > >>>>>>>>>>> 2. Broker receives R1 and puts the request at the head of the
> > > > > > > >>>>>>>>>>> request queue.
> > > > > > > >>>>>>>>>>> 3. Controller-to-broker connection fails and the controller
> > > > > > > >>>>>>>>>>> reconnects to the broker.
> > > > > > > >>>>>>>>>>> 4. Controller sends a request R2 to the broker.
> > > > > > > >>>>>>>>>>> 5. Broker receives R2 and adds it to the head of the request queue.
> > > > > > > >>>>>>>>>>> Now on the broker side, R2 will be processed before R1 is processed,
> > > > > > > >>>>>>>>>>> which may cause problems.
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>> Thanks,
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>> Jiangjie (Becket) Qin
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>> On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <jjkosh...@gmail.com> wrote:
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>>> @Mayuresh - I like your idea. It appears to be a simpler, less
> > > > > > > >>>>>>>>>>>> invasive alternative and it should work.
> > > > > > > >>>>>>>>>>>> Jun/Becket/others, do you see any pitfalls with this approach?
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>> On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <lucasatu...@gmail.com> wrote:
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> @Mayuresh,
> > > > > > > >>>>>>>>>>>>> That's a very interesting idea that I hadn't thought of before.
> > > > > > > >>>>>>>>>>>>> It seems to solve our problem at hand pretty well, and also
> > > > > > > >>>>>>>>>>>>> avoids the need to have a new size metric and capacity config
> > > > > > > >>>>>>>>>>>>> for the controller request queue. In fact, if we were to adopt
> > > > > > > >>>>>>>>>>>>> this design, there is no public interface change, and we
> > > > > > > >>>>>>>>>>>>> probably don't need a KIP.
> > > > > > > >>>>>>>>>>>>> Also, implementation wise, it seems the Java class
> > > > > > > >>>>>>>>>>>>> LinkedBlockingDeque can readily satisfy the requirement by
> > > > > > > >>>>>>>>>>>>> supporting a capacity and also allowing insertion at both ends.
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> My only concern is that this design is tied to the coincidence
> > > > > > > >>>>>>>>>>>>> that we have two request priorities and there are two ends to a
> > > > > > > >>>>>>>>>>>>> deque. Hence by using the proposed design, it seems the network
> > > > > > > >>>>>>>>>>>>> layer is more tightly coupled with upper layer logic, e.g.
> > > > > > > >>>>>>>>>>>>> if we were to add an extra priority level in the future for some
> > > > > > > >>>>>>>>>>>>> reason, we would probably need to go back to the design of
> > > > > > > >>>>>>>>>>>>> separate queues, one for each priority level.
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> In summary, I'm ok with both designs and lean toward your
> > > > > > > >>>>>>>>>>>>> suggested approach.
> > > > > > > >>>>>>>>>>>>> Let's hear what others think.
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> @Becket,
> > > > > > > >>>>>>>>>>>>> In light of Mayuresh's suggested new design, I'm answering your
> > > > > > > >>>>>>>>>>>>> question only in the context of the current KIP design: I think
> > > > > > > >>>>>>>>>>>>> your suggestion makes sense, and I'm ok with removing the
> > > > > > > >>>>>>>>>>>>> capacity config and just relying on the default value of 20
> > > > > > > >>>>>>>>>>>>> being sufficient.
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> Thanks,
> > > > > > > >>>>>>>>>>>>> Lucas
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> Hi Lucas,
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> Seems like the main intent here is to prioritize the controller
> > > > > > > >>>>>>>>>>>>>> request over any other requests.
> > > > > > > >>>>>>>>>>>>>> In that case, we can change the request queue to a deque, where
> > > > > > > >>>>>>>>>>>>>> you always insert the normal requests (produce, consume, etc.)
> > > > > > > >>>>>>>>>>>>>> at the end of the deque, but if it's a controller request, you
> > > > > > > >>>>>>>>>>>>>> insert it at the head of the deque. This ensures that the
> > > > > > > >>>>>>>>>>>>>> controller request will be given higher priority over other
> > > > > > > >>>>>>>>>>>>>> requests.
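The deque proposal in the paragraph above could look roughly like the following Java sketch. It is only an illustration: `PrioritizedRequestQueue` and its method names are hypothetical, not Kafka's actual RequestChannel API, and it relies on `java.util.concurrent.LinkedBlockingDeque` for the bounded, double-ended blocking behavior.

```java
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical wrapper, not a Kafka class: one bounded deque shared by
// data requests (tail) and controller requests (head).
final class PrioritizedRequestQueue<T> {
    private final LinkedBlockingDeque<T> deque;

    PrioritizedRequestQueue(int capacity) {
        this.deque = new LinkedBlockingDeque<>(capacity);
    }

    // Normal requests (produce, fetch, ...) go to the tail; blocks while full.
    void putDataRequest(T request) throws InterruptedException {
        deque.putLast(request);
    }

    // Controller requests go to the head; this also blocks while the deque is full.
    void putControllerRequest(T request) throws InterruptedException {
        deque.putFirst(request);
    }

    // Request handler threads always take from the head; blocks while empty.
    T take() throws InterruptedException {
        return deque.takeFirst();
    }
}
```

Two caveats raised elsewhere in this thread show up directly in the sketch: `putFirst` still blocks when the deque is full, so a controller request is not guaranteed to be enqueued quickly, and two controller requests inserted at the head would be taken in reverse order.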
> > > > > > > >>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>> Also since we only read one request from the > > socket > > > > > > > >>> and > > > > > > > >>>>> mute > > > > > > > >>>>>>>> it > > > > > > > >>>>>>>>> and > > > > > > > >>>>>>>>>>>> only > > > > > > > >>>>>>>>>>>>>> unmute it after handling the request, this would > > > > > > > >>> ensure > > > > > > > >>>>> that > > > > > > > >>>>>>>> we > > > > > > > >>>>>>>>>> don't > > > > > > > >>>>>>>>>>>>>> handle controller requests out of order. > > > > > > > >>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>> With this approach we can avoid the second queue > > and > > > > > > > >>> the > > > > > > > >>>>>>>>> additional > > > > > > > >>>>>>>>>>>>> config > > > > > > > >>>>>>>>>>>>>> for the size of the queue. > > > > > > > >>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>> What do you think ? > > > > > > > >>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>> Thanks, > > > > > > > >>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>> Mayuresh > > > > > > > >>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 3:05 AM Becket Qin < > > > > > > > >>>>>>>> becket....@gmail.com > > > > > > > >>>>>>>>>> > > > > > > > >>>>>>>>>>>> wrote: > > > > > > > >>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>>> Hey Joel, > > > > > > > >>>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>>> Thank for the detail explanation. I agree the > > > > > > > >>> current > > > > > > > >>>>>> design > > > > > > > >>>>>>>>>> makes > > > > > > > >>>>>>>>>>>>> sense. > > > > > > > >>>>>>>>>>>>>>> My confusion is about whether the new config > for > > > > > > > >> the > > > > > > > >>>>>>>> controller > > > > > > > >>>>>>>>>>> queue > > > > > > > >>>>>>>>>>>>>>> capacity is necessary. I cannot think of a case > > in > > > > > > > >>>> which > > > > > > > >>>>>>>> users > > > > > > > >>>>>>>>>>> would > > > > > > > >>>>>>>>>>>>>> change > > > > > > > >>>>>>>>>>>>>>> it. 
> > > > > > > >>>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>>> Thanks, > > > > > > > >>>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>>> Jiangjie (Becket) Qin > > > > > > > >>>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin < > > > > > > > >>>>>>>>>> becket....@gmail.com> > > > > > > > >>>>>>>>>>>>>> wrote: > > > > > > > >>>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>>>> Hi Lucas, > > > > > > > >>>>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>>>> I guess my question can be rephrased to "do we > > > > > > > >>>> expect > > > > > > > >>>>>>>> user to > > > > > > > >>>>>>>>>>> ever > > > > > > > >>>>>>>>>>>>>> change > > > > > > > >>>>>>>>>>>>>>>> the controller request queue capacity"? If we > > > > > > > >>> agree > > > > > > > >>>>> that > > > > > > > >>>>>>>> 20 > > > > > > > >>>>>>>>> is > > > > > > > >>>>>>>>>>>>> already > > > > > > > >>>>>>>>>>>>>> a > > > > > > > >>>>>>>>>>>>>>>> very generous default number and we do not > > > > > > > >> expect > > > > > > > >>>> user > > > > > > > >>>>>> to > > > > > > > >>>>>>>>>> change > > > > > > > >>>>>>>>>>>> it, > > > > > > > >>>>>>>>>>>>> is > > > > > > > >>>>>>>>>>>>>>> it > > > > > > > >>>>>>>>>>>>>>>> still necessary to expose this as a config? > > > > > > > >>>>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>>>> Thanks, > > > > > > > >>>>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>>>> Jiangjie (Becket) Qin > > > > > > > >>>>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang < > > > > > > > >>>>>>>>>>> lucasatu...@gmail.com > > > > > > > >>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>>> wrote: > > > > > > > >>>>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>>>>> @Becket > > > > > > > >>>>>>>>>>>>>>>>> 1. Thanks for the comment. 
You are right that > > > > > > > >>>>> normally > > > > > > > >>>>>>>> there > > > > > > > >>>>>>>>>>>> should > > > > > > > >>>>>>>>>>>>> be > > > > > > > >>>>>>>>>>>>>>>>> just > > > > > > > >>>>>>>>>>>>>>>>> one controller request because of muting, > > > > > > > >>>>>>>>>>>>>>>>> and I had NOT intended to say there would be > > > > > > > >> many > > > > > > > >>>>>>>> enqueued > > > > > > > >>>>>>>>>>>>> controller > > > > > > > >>>>>>>>>>>>>>>>> requests. > > > > > > > >>>>>>>>>>>>>>>>> I went through the KIP again, and I'm not > sure > > > > > > > >>>> which > > > > > > > >>>>>> part > > > > > > > >>>>>>>>>>> conveys > > > > > > > >>>>>>>>>>>>> that > > > > > > > >>>>>>>>>>>>>>>>> info. > > > > > > > >>>>>>>>>>>>>>>>> I'd be happy to revise if you point it out > the > > > > > > > >>>>> section. > > > > > > > >>>>>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>>>>> 2. Though it should not happen in normal > > > > > > > >>>> conditions, > > > > > > > >>>>>> the > > > > > > > >>>>>>>>>> current > > > > > > > >>>>>>>>>>>>>> design > > > > > > > >>>>>>>>>>>>>>>>> does not preclude multiple controllers > running > > > > > > > >>>>>>>>>>>>>>>>> at the same time, hence if we don't have the > > > > > > > >>>>> controller > > > > > > > >>>>>>>>> queue > > > > > > > >>>>>>>>>>>>> capacity > > > > > > > >>>>>>>>>>>>>>>>> config and simply make its capacity to be 1, > > > > > > > >>>>>>>>>>>>>>>>> network threads handling requests from > > > > > > > >> different > > > > > > > >>>>>>>> controllers > > > > > > > >>>>>>>>>>> will > > > > > > > >>>>>>>>>>>> be > > > > > > > >>>>>>>>>>>>>>>>> blocked during those troublesome times, > > > > > > > >>>>>>>>>>>>>>>>> which is probably not what we want. 
On the > > > > > > > >> other > > > > > > > >>>>> hand, > > > > > > > >>>>>>>>> adding > > > > > > > >>>>>>>>>>> the > > > > > > > >>>>>>>>>>>>>> extra > > > > > > > >>>>>>>>>>>>>>>>> config with a default value, say 20, guards > us > > > > > > > >>> from > > > > > > > >>>>>>>> issues > > > > > > > >>>>>>>>> in > > > > > > > >>>>>>>>>>>> those > > > > > > > >>>>>>>>>>>>>>>>> troublesome times, and IMO there isn't much > > > > > > > >>>> downside > > > > > > > >>>>> of > > > > > > > >>>>>>>>> adding > > > > > > > >>>>>>>>>>> the > > > > > > > >>>>>>>>>>>>>> extra > > > > > > > >>>>>>>>>>>>>>>>> config. > > > > > > > >>>>>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>>>>> @Mayuresh > > > > > > > >>>>>>>>>>>>>>>>> Good catch, this sentence is an obsolete > > > > > > > >>> statement > > > > > > > >>>>>> based > > > > > > > >>>>>>>> on > > > > > > > >>>>>>>>> a > > > > > > > >>>>>>>>>>>>> previous > > > > > > > >>>>>>>>>>>>>>>>> design. I've revised the wording in the KIP. > > > > > > > >>>>>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>>>>> Thanks, > > > > > > > >>>>>>>>>>>>>>>>> Lucas > > > > > > > >>>>>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh > > > > > > > >>> Gharat < > > > > > > > >>>>>>>>>>>>>>>>> gharatmayures...@gmail.com> wrote: > > > > > > > >>>>>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>>>>>> Hi Lucas, > > > > > > > >>>>>>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>>>>>> Thanks for the KIP. > > > > > > > >>>>>>>>>>>>>>>>>> I am trying to understand why you think "The > > > > > > > >>>> memory > > > > > > > >>>>>>>>>>> consumption > > > > > > > >>>>>>>>>>>>> can > > > > > > > >>>>>>>>>>>>>>> rise > > > > > > > >>>>>>>>>>>>>>>>>> given the total number of queued requests > can > > > > > > > >>> go > > > > > > > >>>> up > > > > > > > >>>>>> to > > > > > > > >>>>>>>> 2x" > > > > > > > >>>>>>>>>> in > > > > > > > >>>>>>>>>>>> the > > > > > > > >>>>>>>>>>>>>>> impact > > > > > > > >>>>>>>>>>>>>>>>>> section. 
Normally the requests from > > > > > > > >> controller > > > > > > > >>>> to a > > > > > > > >>>>>>>> Broker > > > > > > > >>>>>>>>>> are > > > > > > > >>>>>>>>>>>> not > > > > > > > >>>>>>>>>>>>>>> high > > > > > > > >>>>>>>>>>>>>>>>>> volume, right ? > > > > > > > >>>>>>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>>>>>> Thanks, > > > > > > > >>>>>>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>>>>>> Mayuresh > > > > > > > >>>>>>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 5:06 AM Becket Qin < > > > > > > > >>>>>>>>>>>> becket....@gmail.com> > > > > > > > >>>>>>>>>>>>>>>>> wrote: > > > > > > > >>>>>>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>>>>>>> Thanks for the KIP, Lucas. Separating the > > > > > > > >>>> control > > > > > > > >>>>>>>> plane > > > > > > > >>>>>>>>>> from > > > > > > > >>>>>>>>>>>> the > > > > > > > >>>>>>>>>>>>>>> data > > > > > > > >>>>>>>>>>>>>>>>>> plane > > > > > > > >>>>>>>>>>>>>>>>>>> makes a lot of sense. > > > > > > > >>>>>>>>>>>>>>>>>>> > > > > > > > >>>>>>>>>>>>>>>>>>> In the KIP you mentioned that the > > > > > > > >> controller > > > > > > > >>>>>> request > > > > > > > >>>>>>>>> queue > > > > > > > >>>>>>>>>>> may > > > > > > > >>>>>>>>>>>>>> have > > > > > > > >>>>>>>>>>>>>>>>> many > > > > > > > >>>>>>>>>>>>>>>>>>> requests in it. Will this be a common case? > > > > > > > >>> The > > > > > > > >>>>>>>>> controller > > > > > > > >>>>>>>>>>>>>> requests > > > > > > > >>>>>>>>>>>>>>>>> still > > > > > > > >>>>>>>>>>>>>>>>>>> goes through the SocketServer. The > > > > > > > >>> SocketServer > > > > > > > >>>>>> will > > > > > > > >>>>>>>>> mute > > > > > > > >>>>>>>>>>> the > > > > > > > >>>>>>>>>>>>>>> channel > > > > > > > >>>>>>>>>>>>>>>>>> once > > > > > > > >>>>>>>>>>>>>>>>>>> a request is read and put into the request > > > > > > > >>>>> channel. 
> > > > > > > >>>>>>>> So > > > > > > > >>>>>>>>>>>> assuming > > > > > > > >>>>>>>>>>>>>>> there > > > > > > > >>>>>>>>>>>>>>>>> is > > > > > > > >>>>>>>>>>>>>>>>>>> only one connection between controller and > > > > > > > >>> each > > > > > > > >>>>>>>> broker, > > > > > > > >>>>>>>>> on > > > > > > > >>>>>>>>>>> the > > > > > > > >>>>>>>>>>>>>>> broker > > > > > > > >>>>>>>>>>>>>>>>>> side, > > > > > > > >>>>>>>>>>>>>>>>>>> there should be only one controller request > > > > > > > >>> in > > > > > > > >>>>> the > > > > > > > >>>>>>>>>>> controller > > > > > > > >>>>>>>>>>>>>>> request > > > > > > > >>>>>>>>>>>>>>>>>> queue > > > > > > > >>>>>>>>>>>>>>>>>>> at any given time. If that is the case, do > > > > > > > >> we > > > > > > > >>>>> need > > > > > > > >>>>>> a > > > > > > > >>>>>>>>>>> separate > > > > > > > >>>>>>>>>>>>>>>>> controller > > > > > > > >>>>>>>>>>>>>>>>>>> request queue capacity config? The default > > > > > > > >>>> value > > > > > > > >>>>> 20 > > > > > > > >>>>>>>>> means > > > > > > > >>>>>>>>>>> that > > > > > > > >>>>>>>>>>>>> we > > > > > > > >>>>>>>>>>>>>>>>> expect > > > > > > > >>>>>>>>>>>>>>>>>>> there are 20 controller switches to happen > > > > > > > >>> in a > > > > > > > >>>>>> short > > > > > > > >>>>>>>>>> period > > > > > > > >>>>>>>>>>>> of > > > > > > > >>>>>>>>>>>>>>> time. > > > > > > > >>>>>>>>>>>>>>>>> I > > > > > > > >>>>>>>>>>>>>>>>>> am > > > > > > > >>>>>>>>>>>>>>>>>>> not sure whether someone should increase > > > > > > > >> the > > > > > > > >>>>>>>> controller > > > > > > > >>>>>>>>>>>> request > > > > > > > >>>>>>>>>>>>>>> queue > > > > > > > >>>>>>>>>>>>>>>>>>> capacity to handle such case, as it seems > > > > > > > >>>>>> indicating > > > > > > > >>>>>>>>>>> something > > > > > > > >>>>>>>>>>>>>> very > > > > > > > >>>>>>>>>>>>>>>>> wrong > > > > > > > >>>>>>>>>>>>>>>>>>> has happened. 
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <lindon...@gmail.com> wrote:
>>
>>> Thanks for the update Lucas.
>>>
>>> I think the motivation section is intuitive. It will be good to learn more about the comments from other reviewers.
>>>
>>> On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <lucasatu...@gmail.com> wrote:
>>>
>>>> Hi Dong,
>>>>
>>>> I've updated the motivation section of the KIP by explaining the cases that would have user impacts.
>>>> Please take a look and let me know your comments.
>>>>
>>>> Thanks,
>>>> Lucas
>>>>
>>>> On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <lucasatu...@gmail.com> wrote:
>>>>
>>>>> Hi Dong,
>>>>>
>>>>> The simulation of disk being slow is merely for me to easily construct a testing scenario with a backlog of produce requests. In production, other than the disk being slow, a backlog of produce requests may also be caused by high produce QPS. In that case, we may not want to kill the broker and that's when this KIP can be useful, both for JBOD and non-JBOD setup.
>>>>>
>>>>> Going back to your previous question about each ProduceRequest covering 20 partitions that are randomly distributed, let's say a LeaderAndIsr request is enqueued that tries to switch the current broker, say broker0, from leader to follower *for one of the partitions*, say *test-0*. For the sake of argument, let's also assume the other brokers, say broker1, have *stopped* fetching from the current broker, i.e. broker0.
>>>>>
>>>>> 1. If the enqueued produce requests have acks = -1 (ALL)
>>>>>   1.1 without this KIP, the ProduceRequests ahead of LeaderAndISR will be put into the purgatory, and since they'll never be replicated to other brokers (because of the assumption made above), they will be completed either when the LeaderAndISR request is processed or when the timeout happens.
>>>>>   1.2 With this KIP, broker0 will immediately transition the partition test-0 to become a follower, after the current broker sees the replication of the remaining 19 partitions, it can send a response indicating that it's no longer the leader for the "test-0".
>>>>>   To see the latency difference between 1.1 and 1.2, let's say there are 24K produce requests ahead of the LeaderAndISR, and there are 8 io threads, so each io thread will process approximately 3000 produce requests. Now let's investigate the io thread that finally processed the LeaderAndISR. For the 3000 produce requests, if we model the time when their remaining 19 partitions catch up as t0, t1, ...t2999, and the LeaderAndISR request is processed at time t3000.
>>>>>   Without this KIP, the 1st produce request would have waited an extra t3000 - t0 time in the purgatory, the 2nd an extra time of t3000 - t1, etc. Roughly speaking, the latency difference is bigger for the earlier produce requests than for the later ones. For the same reason, the more ProduceRequests queued before the LeaderAndISR, the bigger benefit we get (capped by the produce timeout).
>>>>> 2. If the enqueued produce requests have acks=0 or acks=1
>>>>>   There will be no latency differences in this case, but
>>>>>   2.1 without this KIP, the records of partition test-0 in the ProduceRequests ahead of the LeaderAndISR will be appended to the local log, and eventually be truncated after processing the LeaderAndISR. This is what's referred to as "some unofficial definition of data loss in terms of messages beyond the high watermark".
>>>>>   2.2 with this KIP, we can mitigate the effect since if the LeaderAndISR is immediately processed, the response to producers will have the NotLeaderForPartition error, causing producers to retry
>>>>>
>>>>> This explanation above is the benefit for reducing

--
-Regards,
Mayuresh R. Gharat
(862) 250-7125
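
A rough sketch of the latency comparison between 1.1 and 1.2 in Lucas's quoted message, in Python. The function name, the millisecond units, and the evenly spaced catch-up times are assumptions made purely for illustration; only the 24K requests, the 8 io threads, and the "t3000 - ti" model come from the thread. The cap is a simplification of "capped by the produce timeout".

```python
def extra_purgatory_wait(catch_up_times, leader_and_isr_time, produce_timeout=None):
    """Extra time each acks=-1 produce request waits without the KIP.

    catch_up_times[i] is t_i, when the remaining 19 partitions of request i
    have caught up. Without the KIP the request completes only once the
    LeaderAndISR is processed (or the produce timeout fires).
    """
    waits = []
    for t in catch_up_times:
        extra = max(0.0, leader_and_isr_time - t)
        if produce_timeout is not None:
            extra = min(extra, produce_timeout)  # simplified timeout cap
        waits.append(extra)
    return waits


# Numbers from the thread: 24K queued produce requests over 8 io threads.
requests_per_thread = 24_000 // 8

# Hypothetical timeline: t_i = i ms, LeaderAndISR processed at t3000 = 3000 ms.
catch_up = [float(i) for i in range(requests_per_thread)]
waits = extra_purgatory_wait(catch_up, leader_and_isr_time=3000.0)
capped = extra_purgatory_wait(catch_up, leader_and_isr_time=3000.0,
                              produce_timeout=500.0)
```

On this made-up timeline the earliest request gains the most and the latest the least, which matches the observation that the benefit grows with the number of ProduceRequests queued ahead of the LeaderAndISR.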