Hello, Per Ismael's suggestion, I'd like to get comments from the original voters for KIP-219 (Becket, Jun, Rajini) and others about the new interface change proposed in the discussion thread ( https://markmail.org/search/?q=kafka+KIP-219#query:kafka%20KIP-219+page:1+mid:5rwju2gwpicojr3f+state:results). Would you let me know?
Thanks, Jon On Wed, Apr 25, 2018 at 2:43 PM, Dong Lin <lindon...@gmail.com> wrote: > +Jon > > On Mon, 22 Jan 2018 at 10:38 AM Becket Qin <becket....@gmail.com> wrote: > >> Thanks for the discussion and voting. >> >> KIP-219 has passed with +3 binding votes (Becket, Jun, Rajini). >> >> On Thu, Jan 18, 2018 at 1:32 AM, Rajini Sivaram <rajinisiva...@gmail.com> >> wrote: >> >> > Hi Becket, >> > >> > Thanks for the update. Yes, it does address my concern. >> > >> > +1 (binding) >> > >> > Regards, >> > >> > Rajini >> > >> > On Wed, Jan 17, 2018 at 5:24 PM, Becket Qin <becket....@gmail.com> >> wrote: >> > >> > > Actually returning an empty fetch request may still be useful to >> reduce >> > the >> > > throttle time due to request quota violation because the FetchResponse >> > send >> > > time will be less. I just updated the KIP. >> > > >> > > Rajini, does that address your concern? >> > > >> > > On Tue, Jan 16, 2018 at 7:01 PM, Becket Qin <becket....@gmail.com> >> > wrote: >> > > >> > > > Thanks for the reply, Jun. >> > > > >> > > > Currently the byte rate quota does not apply to HeartbeatRequest, >> > > > JoinGroupRequest/SyncGroupRequest. So the only case those requests >> are >> > > > throttled is because the request quota is violated. In that case, >> the >> > > > throttle time does not really matter whether we return a full >> > > FetchResponse >> > > > or an empty one. Would it be more consistent if we throttle based on >> > the >> > > > actual throttle time / channel mute time? >> > > > >> > > > Thanks, >> > > > >> > > > Jiangjie (Becket) Qin >> > > > >> > > > On Tue, Jan 16, 2018 at 3:45 PM, Jun Rao <j...@confluent.io> wrote: >> > > > >> > > >> Hi, Jiangjie, >> > > >> >> > > >> You are right that the heartbeat is done in a channel different >> from >> > the >> > > >> fetch request. I think it's still useful to return an empty fetch >> > > response >> > > >> when the quota is violated. This way, the throttle time for the >> > > heartbeat >> > > >> request won't be large. I agree that we can just mute the channel >> for >> > > the >> > > >> fetch request for the throttle time computed based on a full fetch >> > > >> response. This probably also partially addresses Rajini's #1 >> concern. >> > > >> >> > > >> Thanks, >> > > >> >> > > >> Jun >> > > >> >> > > >> On Mon, Jan 15, 2018 at 9:27 PM, Becket Qin <becket....@gmail.com> >> > > wrote: >> > > >> >> > > >> > Hi Rajini, >> > > >> > >> > > >> > Thanks for the comments. Pleas see the reply inline. >> > > >> > >> > > >> > Hi Jun, >> > > >> > >> > > >> > Thinking about the consumer rebalance case a bit more, I am not >> sure >> > > if >> > > >> we >> > > >> > need to worry about the delayed rebalance due to quota violation >> or >> > > not. >> > > >> > The rebalance actually uses a separate channel to coordinator. >> > > Therefore >> > > >> > unless the request quota is hit, the rebalance won't be >> throttled. >> > > Even >> > > >> if >> > > >> > request quota is hit, it seems unlikely to be delayed long. So it >> > > looks >> > > >> > that we don't need to unmute the channel earlier than needed. >> Does >> > > this >> > > >> > address your concern? >> > > >> > >> > > >> > Thanks, >> > > >> > >> > > >> > Jiangjie (Becket) Qin >> > > >> > >> > > >> > On Mon, Jan 15, 2018 at 4:22 AM, Rajini Sivaram < >> > > >> rajinisiva...@gmail.com> >> > > >> > wrote: >> > > >> > >> > > >> > > Hi Becket, >> > > >> > > >> > > >> > > A few questions: >> > > >> > > >> > > >> > > 1. KIP says: *Although older client implementations (prior to >> > > >> knowledge >> > > >> > of >> > > >> > > this KIP) will immediately send the next request after the >> broker >> > > >> > responds >> > > >> > > without paying attention to the throttle time field, the >> broker is >> > > >> > > protected by virtue of muting the channel for time X. i.e., the >> > next >> > > >> > > request will not be processed until the channel is unmuted. * >> > > >> > > For fetch requests, the response is sent immediately and the >> mute >> > > >> time on >> > > >> > > the broker based on empty fetch response will often be zero >> > (unlike >> > > >> the >> > > >> > > throttle time returned to the client based on non-empty >> response). >> > > >> Won't >> > > >> > > that lead to a tight loop of fetch requests from old clients >> > > >> > (particularly >> > > >> > > expensive with SSL)? Wouldn't it be better to retain current >> > > behaviour >> > > >> > for >> > > >> > > old clients? Also, if we change the behaviour for old clients, >> > > client >> > > >> > > metrics that track throttle time will report a lot of throttle >> > > >> unrelated >> > > >> > to >> > > >> > > actual throttle time. >> > > >> > > >> > > >> > For consumers, if quota is violated, the throttle time on the >> broker >> > > >> will >> > > >> > not be 0. It is just that the throttle time will not be >> increasing >> > > >> because >> > > >> > the consumer will return an empty response in this case. So there >> > > should >> > > >> > not be a tight loop. >> > > >> > >> > > >> > >> > > >> > > 2. KIP says: *The usual idle timeout i.e., >> > connections.max.idle.ms >> > > >> > > <http://connections.max.idle.ms> will still be honored during >> the >> > > >> > throttle >> > > >> > > time X. This makes sure that the brokers will detect client >> > > connection >> > > >> > > closure in a bounded time.* >> > > >> > > Wouldn't it be better to bound maximum throttle time to >> > > >> > > *connections.max.idle.ms >> > > >> > > <http://connections.max.idle.ms>*? If we mute for a time >> greater >> > > than >> > > >> > > *connections.max.idle.ms >> > > >> > > <http://connections.max.idle.ms>* and then close a client >> > > connection >> > > >> > > simply >> > > >> > > because we had muted it on the broker for a longer throttle >> time, >> > we >> > > >> > force >> > > >> > > a reconnection and read the next request from the new >> connection >> > > >> straight >> > > >> > > away. This feels the same as a bound on the quota of * >> > > >> > > connections.max.idle.ms >> > > >> > > <http://connections.max.idle.ms>*, but with added load on the >> > > broker >> > > >> for >> > > >> > > authenticating another connection (expensive with SSL). >> > > >> > > >> > > >> > I think we need to think about the consumer prior to and after >> this >> > > KIP >> > > >> > separately. >> > > >> > >> > > >> > For consumer prior to this KIP, even if we unmute the channel >> after >> > > >> > connection.max.idle.ms, it is likely that the consumers have >> > already >> > > >> > reached request.timeout.ms before that and has reconnected to >> the >> > > >> broker. >> > > >> > So there is no real difference whether we close the throttled >> > channel >> > > or >> > > >> > not. >> > > >> > >> > > >> > For consumers after the KIP, because they will honor the throttle >> > > time, >> > > >> > they will back off until throttle time is reached. If that >> throttle >> > > >> time is >> > > >> > longer than connection.max.idle.ms, it seems not a big overhead >> > > because >> > > >> > there will only be one connection re-establishment in quite a few >> > > >> minutes. >> > > >> > Compared with such overhead, it seems more important to honor the >> > > quota >> > > >> so >> > > >> > the broker can survive. >> > > >> > >> > > >> > >> > > >> > > 3. Are we changing the behaviour of network bandwidth quota for >> > > >> > > Produce/Fetch and retaining current behaviour for request >> quotas? >> > > >> > > >> > > >> > This is going to be applied to all the quota. Including request >> > > quotas. >> > > >> > >> > > >> > >> > > >> > > >> > > >> > > Thanks, >> > > >> > > >> > > >> > > Rajini >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > On Wed, Jan 10, 2018 at 10:29 PM, Jun Rao <j...@confluent.io> >> > wrote: >> > > >> > > >> > > >> > > > Hi, Jiangjie, >> > > >> > > > >> > > >> > > > Thanks for the updated KIP. +1 >> > > >> > > > >> > > >> > > > Jun >> > > >> > > > >> > > >> > > > On Mon, Jan 8, 2018 at 7:45 PM, Becket Qin < >> > becket....@gmail.com> >> > > >> > wrote: >> > > >> > > > >> > > >> > > > > Thanks for the comments, Jun. >> > > >> > > > > >> > > >> > > > > 1. Good point. >> > > >> > > > > 2. Also makes sense. Usually the connection.max.idle.ms is >> > high >> > > >> > enough >> > > >> > > > so >> > > >> > > > > the throttling is impacted. >> > > >> > > > > >> > > >> > > > > I have updated the KIP to reflect the changes. >> > > >> > > > > >> > > >> > > > > Thanks, >> > > >> > > > > >> > > >> > > > > Jiangjie (Becket) Qin >> > > >> > > > > >> > > >> > > > > >> > > >> > > > > On Mon, Jan 8, 2018 at 6:30 PM, Jun Rao <j...@confluent.io> >> > > wrote: >> > > >> > > > > >> > > >> > > > > > Hi, Jiangjie, >> > > >> > > > > > >> > > >> > > > > > Sorry for the late response. The proposal sounds good >> > > overall. A >> > > >> > > couple >> > > >> > > > > of >> > > >> > > > > > minor comments below. >> > > >> > > > > > >> > > >> > > > > > 1. For throttling a fetch request, we could potentially >> just >> > > >> send >> > > >> > an >> > > >> > > > > empty >> > > >> > > > > > response. We can return a throttle time calculated from a >> > full >> > > >> > > > response, >> > > >> > > > > > but only mute the channel on the server based on a >> throttle >> > > time >> > > >> > > > > calculated >> > > >> > > > > > based on the empty response. This has the benefit that >> the >> > > >> server >> > > >> > > will >> > > >> > > > > mute >> > > >> > > > > > the channel much shorter, which will prevent the consumer >> > from >> > > >> > > > > rebalancing >> > > >> > > > > > when throttled. >> > > >> > > > > > >> > > >> > > > > > 2. The wiki says "connections.max.idle.ms should be >> ignored >> > > >> during >> > > >> > > the >> > > >> > > > > > throttle time X." This has the potential issue that a >> server >> > > may >> > > >> > not >> > > >> > > > > detect >> > > >> > > > > > that a client connection is already gone until after an >> > > >> arbitrary >> > > >> > > > amount >> > > >> > > > > of >> > > >> > > > > > time. Perhaps we could still just close a connection if >> the >> > > >> server >> > > >> > > has >> > > >> > > > > > muted it for longer than connections.max.idle.ms. This >> will >> > > at >> > > >> > least >> > > >> > > > > bound >> > > >> > > > > > the time for a server to detect closed client >> connections. >> > > >> > > > > > >> > > >> > > > > > Thanks, >> > > >> > > > > > >> > > >> > > > > > Jun >> > > >> > > > > > >> > > >> > > > > > >> > > >> > > > > > On Mon, Nov 20, 2017 at 5:30 PM, Becket Qin < >> > > >> becket....@gmail.com> >> > > >> > > > > wrote: >> > > >> > > > > > >> > > >> > > > > > > Hi, >> > > >> > > > > > > >> > > >> > > > > > > We would like to start the voting thread for KIP-219. >> The >> > > KIP >> > > >> > > > proposes >> > > >> > > > > to >> > > >> > > > > > > improve the quota communication between the brokers and >> > > >> clients, >> > > >> > > > > > especially >> > > >> > > > > > > for cases of long throttling time. >> > > >> > > > > > > >> > > >> > > > > > > The KIP wiki is following: >> > > >> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- >> > > >> > > > > > 219+-+Improve+quota+ >> > > >> > > > > > > communication >> > > >> > > > > > > >> > > >> > > > > > > The discussion thread is here: >> > > >> > > > > > > http://markmail.org/search/?q= >> kafka+KIP-219#query:kafka% >> > > >> > > > > > > 20KIP-219+page:1+mid:ooxabguy7nz7l7zy+state:results >> > > >> > > > > > > >> > > >> > > > > > > Thanks, >> > > >> > > > > > > >> > > >> > > > > > > Jiangjie (Becket) Qin >> > > >> > > > > > > >> > > >> > > > > > >> > > >> > > > > >> > > >> > > > >> > > >> > > >> > > >> > >> > > >> >> > > > >> > > > >> > > >> > >> >