Re: [VOTE] KIP-219 - Improve Quota Communication

Jonghyun Lee Mon, 07 May 2018 20:01:25 -0700

Thanks for the detailed explanation, Rajini. I think it makes sense not to
make the protocol change for them. I'll revert those changes.


Jon

On Mon, May 7, 2018 at 6:42 AM, Rajini Sivaram <rajinisiva...@gmail.com>
wrote:

> Hi Jon,
>
> Thanks for the explanation. We had deliberately avoided adding
> throttle_time_ms to LeaderAndIsrResponse,UpdateMetadataResponse,
> SaslAuthenticateResponse and SaslHandshakeResponse since clients
> (producer/consumer/admin client) would never get throttled on these.
> Clients don't use LeaderAndIsr and UpdateMetadata requests, clients never
> send SaslAuthenticate and SaslHandshake requests after the initial
> authentication. We do throttle these, but that is to prevent DoS attacks or
> requests from clients that were not implemented correctly. In these cases,
> we would rely on muting on the broker-side to enforce quotas rather than
> have clients mute themselves. So perhaps we don't need the protocol change?
>
>
> On Sat, May 5, 2018 at 12:49 AM, Jonghyun Lee <jonghy...@gmail.com> wrote:
>
> > Per Rajini's request, I'd like to add one more discussion item.
> >
> > I added the THROTTLE_TIME_MS field to LeaderAndIsrResponse,
> > UpdateMetadataResponse, SaslAuthenticateResponse and
> SaslHandshakeResponse.
> > These are cluster actions and normally not throttled, but they can
> actually
> > be throttled on errors (cluster authorization failure for the first two
> and
> > requests coming in after authorization is done for the last two) for
> > request quota violation. And when they are throttled, I want throttle
> time
> > to be passed back to the client so that the client can refrain from
> sending
> > more requests to the broker until the throttle time is over.
> > For non-error cases, they won't be throttled as before and there are no
> > behavior changes (except that the response will have a zero throttle
> time).
> > If we do not include THROTTLE_TIME_MS in those responses like before,
> they
> > will still be throttled on the broker side, but the client won't know
> that
> > and can keep sending requests to the broker while throttling is in
> > progress. I assume this is not a big concern for these API types, though.
> >
> > Please let me know what you think, and if we decide to keep this change,
> > I'll update the KIP.
> >
> > Thanks,
> > Jon
> >
> >
> > On Wed, May 2, 2018 at 12:08 PM, Jonghyun Lee <jonghy...@gmail.com>
> wrote:
> >
> > > Hi Magnus and Jun, do you have any feedback on this?
> > >
> > > Since two of the original voters (Becket and Rajini) showed a
> preference
> > > for bumping up all request versions, I'll wait till tomorrow and start
> > > implementing it unless someone expresses different opinions.
> > >
> > > As for the implementation, the current plan is to add a method to
> > > AbstractResponse which tells whether or not client should throttle and
> > have
> > > an implementation in each response type based on its version.
> > >
> > > Thanks,
> > > Jon
> > >
> > >
> > > On Mon, Apr 30, 2018 at 9:15 AM, Jonghyun Lee <jonghy...@gmail.com>
> > wrote:
> > >
> > >> Hi Rajini,
> > >>
> > >> Thanks for letting me know the SASL use case. I think this is a
> similar
> > >> point that Becket raised (creating dependency between different
> > requests)
> > >> and it makes sense to me.
> > >>
> > >> Will wait for some more feedback and finalize.
> > >>
> > >> Thanks,
> > >> Jon
> > >>
> > >>
> > >>
> > >> On Mon, Apr 30, 2018 at 3:46 AM, Rajini Sivaram <
> > rajinisiva...@gmail.com>
> > >> wrote:
> > >>
> > >>> Hi Jon,
> > >>>
> > >>> We currently send two ApiVersions requests for SASL, the first with
> > >>> version
> > >>> 0 to get the request versions for SASL and another after
> > authentication.
> > >>> You are currently using the second and basing all throttling
> decisions
> > on
> > >>> that. This may get in the way if we wanted to change this code in the
> > >>> future (e.g. throttle SASL requests or combine the two ApiVersions
> > >>> requests
> > >>> into one). So if it is easy enough to check request versions against
> an
> > >>> immutable map rather than track broker ApiVersions in a mutable set,
> > >>> perhaps that would be better?
> > >>>
> > >>> Hi Magnus, It will be good to know what works better for non-Java
> > >>> clients.
> > >>>
> > >>>
> > >>> On Fri, Apr 27, 2018 at 8:34 PM, Jonghyun Lee <jonghy...@gmail.com>
> > >>> wrote:
> > >>>
> > >>> > Thanks, Becket.
> > >>> >
> > >>> > Assuming that requiring new client implementations to issue
> > >>> > ApiVersionsRequest before issuing any other requests is reasonable
> as
> > >>> you
> > >>> > mentioned, I am wondering if keeping track of brokers that support
> > >>> KIP-219
> > >>> > is actually simpler than keeping a map from each request type to
> its
> > >>> min
> > >>> > version number supporting KIP-219 and checking the version of every
> > >>> single
> > >>> > request to see whether or not to throttle from the client side. To
> > me,
> > >>> it
> > >>> > looks like a broker property that can be cached, and checking every
> > >>> single
> > >>> > request seems excessive.
> > >>> >
> > >>> > Thanks,
> > >>> > Jon
> > >>> >
> > >>> >
> > >>> >
> > >>> >
> > >>> > On Fri, Apr 27, 2018 at 11:36 AM, Becket Qin <becket....@gmail.com
> >
> > >>> wrote:
> > >>> >
> > >>> > > The reason we wanted to bump up all the request versions was let
> > the
> > >>> > > clients know whether the broker has already throttled the request
> > or
> > >>> not.
> > >>> > > This avoids double throttling in both brokers and clients, if the
> > >>> clients
> > >>> > > implementation supports KIP-219.
> > >>> > >
> > >>> > > The new change uses only ApiVersionRequest, instead of all the
> > >>> requests,
> > >>> > to
> > >>> > > indicate whether the brokers have applied throttling or not. The
> > >>> > difference
> > >>> > > is that all the clients must issue ApiVersionRequest first before
> > >>> issuing
> > >>> > > any other requests. This has no impact on the existing clients.
> But
> > >>> for
> > >>> > > clients implementation that wants to support KIP-219, they need
> to
> > >>> follow
> > >>> > > this practice, which seems to be reasonable.
> > >>> > > Initially I thought the change is fine. However, thinking about
> > this
> > >>> > again,
> > >>> > > that means the clients implementation needs to remember the
> > >>> ApiVersions
> > >>> > of
> > >>> > > each broker, which may have some complexity. Also this seems
> > >>> introduces
> > >>> > the
> > >>> > > behavior dependency between different requests, which seems
> > >>> unnecessary.
> > >>> > >
> > >>> > > Due to the above reasons, I think it might be better to bump up
> all
> > >>> the
> > >>> > > request versions.
> > >>> > >
> > >>> > > Thanks,
> > >>> > >
> > >>> > > Jiangjie (Becket) Qin
> > >>> > >
> > >>> > >
> > >>> > > On Wed, Apr 25, 2018 at 3:19 PM, Jonghyun Lee <
> jonghy...@gmail.com
> > >
> > >>> > wrote:
> > >>> > >
> > >>> > > > Hello,
> > >>> > > >
> > >>> > > > Per Ismael's suggestion, I'd like to get comments from the
> > original
> > >>> > > voters
> > >>> > > > for KIP-219 (Becket, Jun, Rajini) and others about the new
> > >>> interface
> > >>> > > change
> > >>> > > > proposed in the discussion thread (
> > >>> > > > https://markmail.org/search/?q=kafka+KIP-219#query:kafka%
> > >>> > > > 20KIP-219+page:1+mid:5rwju2gwpicojr3f+state:results).
> > >>> > > > Would you let me know?
> > >>> > > >
> > >>> > > > Thanks,
> > >>> > > > Jon
> > >>> > > >
> > >>> > > >
> > >>> > > > On Wed, Apr 25, 2018 at 2:43 PM, Dong Lin <lindon...@gmail.com
> >
> > >>> wrote:
> > >>> > > >
> > >>> > > > > +Jon
> > >>> > > > >
> > >>> > > > > On Mon, 22 Jan 2018 at 10:38 AM Becket Qin <
> > becket....@gmail.com
> > >>> >
> > >>> > > wrote:
> > >>> > > > >
> > >>> > > > >> Thanks for the discussion and voting.
> > >>> > > > >>
> > >>> > > > >> KIP-219 has passed with +3 binding votes (Becket, Jun,
> > Rajini).
> > >>> > > > >>
> > >>> > > > >> On Thu, Jan 18, 2018 at 1:32 AM, Rajini Sivaram <
> > >>> > > > rajinisiva...@gmail.com>
> > >>> > > > >> wrote:
> > >>> > > > >>
> > >>> > > > >> > Hi Becket,
> > >>> > > > >> >
> > >>> > > > >> > Thanks for the update. Yes, it does address my concern.
> > >>> > > > >> >
> > >>> > > > >> > +1 (binding)
> > >>> > > > >> >
> > >>> > > > >> > Regards,
> > >>> > > > >> >
> > >>> > > > >> > Rajini
> > >>> > > > >> >
> > >>> > > > >> > On Wed, Jan 17, 2018 at 5:24 PM, Becket Qin <
> > >>> becket....@gmail.com
> > >>> > >
> > >>> > > > >> wrote:
> > >>> > > > >> >
> > >>> > > > >> > > Actually returning an empty fetch request may still be
> > >>> useful to
> > >>> > > > >> reduce
> > >>> > > > >> > the
> > >>> > > > >> > > throttle time due to request quota violation because the
> > >>> > > > FetchResponse
> > >>> > > > >> > send
> > >>> > > > >> > > time will be less. I just updated the KIP.
> > >>> > > > >> > >
> > >>> > > > >> > > Rajini, does that address your concern?
> > >>> > > > >> > >
> > >>> > > > >> > > On Tue, Jan 16, 2018 at 7:01 PM, Becket Qin <
> > >>> > becket....@gmail.com
> > >>> > > >
> > >>> > > > >> > wrote:
> > >>> > > > >> > >
> > >>> > > > >> > > > Thanks for the reply, Jun.
> > >>> > > > >> > > >
> > >>> > > > >> > > > Currently the byte rate quota does not apply to
> > >>> > > HeartbeatRequest,
> > >>> > > > >> > > > JoinGroupRequest/SyncGroupRequest. So the only case
> > those
> > >>> > > > requests
> > >>> > > > >> are
> > >>> > > > >> > > > throttled is because the request quota is violated. In
> > >>> that
> > >>> > > case,
> > >>> > > > >> the
> > >>> > > > >> > > > throttle time does not really matter whether we
> return a
> > >>> full
> > >>> > > > >> > > FetchResponse
> > >>> > > > >> > > > or an empty one. Would it be more consistent if we
> > >>> throttle
> > >>> > > based
> > >>> > > > on
> > >>> > > > >> > the
> > >>> > > > >> > > > actual throttle time / channel mute time?
> > >>> > > > >> > > >
> > >>> > > > >> > > > Thanks,
> > >>> > > > >> > > >
> > >>> > > > >> > > > Jiangjie (Becket) Qin
> > >>> > > > >> > > >
> > >>> > > > >> > > > On Tue, Jan 16, 2018 at 3:45 PM, Jun Rao <
> > >>> j...@confluent.io>
> > >>> > > > wrote:
> > >>> > > > >> > > >
> > >>> > > > >> > > >> Hi, Jiangjie,
> > >>> > > > >> > > >>
> > >>> > > > >> > > >> You are right that the heartbeat is done in a channel
> > >>> > different
> > >>> > > > >> from
> > >>> > > > >> > the
> > >>> > > > >> > > >> fetch request. I think it's still useful to return an
> > >>> empty
> > >>> > > fetch
> > >>> > > > >> > > response
> > >>> > > > >> > > >> when the quota is violated. This way, the throttle
> time
> > >>> for
> > >>> > the
> > >>> > > > >> > > heartbeat
> > >>> > > > >> > > >> request won't be large. I agree that we can just mute
> > the
> > >>> > > channel
> > >>> > > > >> for
> > >>> > > > >> > > the
> > >>> > > > >> > > >> fetch request for the throttle time computed based
> on a
> > >>> full
> > >>> > > > fetch
> > >>> > > > >> > > >> response. This probably also partially addresses
> > >>> Rajini's #1
> > >>> > > > >> concern.
> > >>> > > > >> > > >>
> > >>> > > > >> > > >> Thanks,
> > >>> > > > >> > > >>
> > >>> > > > >> > > >> Jun
> > >>> > > > >> > > >>
> > >>> > > > >> > > >> On Mon, Jan 15, 2018 at 9:27 PM, Becket Qin <
> > >>> > > > becket....@gmail.com>
> > >>> > > > >> > > wrote:
> > >>> > > > >> > > >>
> > >>> > > > >> > > >> > Hi Rajini,
> > >>> > > > >> > > >> >
> > >>> > > > >> > > >> > Thanks for the comments. Pleas see the reply
> inline.
> > >>> > > > >> > > >> >
> > >>> > > > >> > > >> > Hi Jun,
> > >>> > > > >> > > >> >
> > >>> > > > >> > > >> > Thinking about the consumer rebalance case a bit
> > more,
> > >>> I am
> > >>> > > not
> > >>> > > > >> sure
> > >>> > > > >> > > if
> > >>> > > > >> > > >> we
> > >>> > > > >> > > >> > need to worry about the delayed rebalance due to
> > quota
> > >>> > > > violation
> > >>> > > > >> or
> > >>> > > > >> > > not.
> > >>> > > > >> > > >> > The rebalance actually uses a separate channel to
> > >>> > > coordinator.
> > >>> > > > >> > > Therefore
> > >>> > > > >> > > >> > unless the request quota is hit, the rebalance
> won't
> > be
> > >>> > > > >> throttled.
> > >>> > > > >> > > Even
> > >>> > > > >> > > >> if
> > >>> > > > >> > > >> > request quota is hit, it seems unlikely to be
> delayed
> > >>> long.
> > >>> > > So
> > >>> > > > it
> > >>> > > > >> > > looks
> > >>> > > > >> > > >> > that we don't need to unmute the channel earlier
> than
> > >>> > needed.
> > >>> > > > >> Does
> > >>> > > > >> > > this
> > >>> > > > >> > > >> > address your concern?
> > >>> > > > >> > > >> >
> > >>> > > > >> > > >> > Thanks,
> > >>> > > > >> > > >> >
> > >>> > > > >> > > >> > Jiangjie (Becket) Qin
> > >>> > > > >> > > >> >
> > >>> > > > >> > > >> > On Mon, Jan 15, 2018 at 4:22 AM, Rajini Sivaram <
> > >>> > > > >> > > >> rajinisiva...@gmail.com>
> > >>> > > > >> > > >> > wrote:
> > >>> > > > >> > > >> >
> > >>> > > > >> > > >> > > Hi Becket,
> > >>> > > > >> > > >> > >
> > >>> > > > >> > > >> > > A few questions:
> > >>> > > > >> > > >> > >
> > >>> > > > >> > > >> > > 1. KIP says: *Although older client
> implementations
> > >>> > (prior
> > >>> > > to
> > >>> > > > >> > > >> knowledge
> > >>> > > > >> > > >> > of
> > >>> > > > >> > > >> > > this KIP) will immediately send the next request
> > >>> after
> > >>> > the
> > >>> > > > >> broker
> > >>> > > > >> > > >> > responds
> > >>> > > > >> > > >> > > without paying attention to the throttle time
> > field,
> > >>> the
> > >>> > > > >> broker is
> > >>> > > > >> > > >> > > protected by virtue of muting the channel for
> time
> > X.
> > >>> > i.e.,
> > >>> > > > the
> > >>> > > > >> > next
> > >>> > > > >> > > >> > > request will not be processed until the channel
> is
> > >>> > > unmuted. *
> > >>> > > > >> > > >> > > For fetch requests, the response is sent
> > immediately
> > >>> and
> > >>> > > the
> > >>> > > > >> mute
> > >>> > > > >> > > >> time on
> > >>> > > > >> > > >> > > the broker based on empty fetch response will
> often
> > >>> be
> > >>> > zero
> > >>> > > > >> > (unlike
> > >>> > > > >> > > >> the
> > >>> > > > >> > > >> > > throttle time returned to the client based on
> > >>> non-empty
> > >>> > > > >> response).
> > >>> > > > >> > > >> Won't
> > >>> > > > >> > > >> > > that lead to a tight loop of fetch requests from
> > old
> > >>> > > clients
> > >>> > > > >> > > >> > (particularly
> > >>> > > > >> > > >> > > expensive with SSL)? Wouldn't it be better to
> > retain
> > >>> > > current
> > >>> > > > >> > > behaviour
> > >>> > > > >> > > >> > for
> > >>> > > > >> > > >> > > old clients? Also, if we change the behaviour for
> > old
> > >>> > > > clients,
> > >>> > > > >> > > client
> > >>> > > > >> > > >> > > metrics that track throttle time will report a
> lot
> > of
> > >>> > > > throttle
> > >>> > > > >> > > >> unrelated
> > >>> > > > >> > > >> > to
> > >>> > > > >> > > >> > >  actual throttle time.
> > >>> > > > >> > > >> > >
> > >>> > > > >> > > >> > For consumers, if quota is violated, the throttle
> > time
> > >>> on
> > >>> > the
> > >>> > > > >> broker
> > >>> > > > >> > > >> will
> > >>> > > > >> > > >> > not be 0. It is just that the throttle time will
> not
> > be
> > >>> > > > >> increasing
> > >>> > > > >> > > >> because
> > >>> > > > >> > > >> > the consumer will return an empty response in this
> > >>> case. So
> > >>> > > > there
> > >>> > > > >> > > should
> > >>> > > > >> > > >> > not be a tight loop.
> > >>> > > > >> > > >> >
> > >>> > > > >> > > >> >
> > >>> > > > >> > > >> > > 2. KIP says: *The usual idle timeout i.e.,
> > >>> > > > >> > connections.max.idle.ms
> > >>> > > > >> > > >> > > <http://connections.max.idle.ms> will still be
> > >>> honored
> > >>> > > > during
> > >>> > > > >> the
> > >>> > > > >> > > >> > throttle
> > >>> > > > >> > > >> > > time X. This makes sure that the brokers will
> > detect
> > >>> > client
> > >>> > > > >> > > connection
> > >>> > > > >> > > >> > > closure in a bounded time.*
> > >>> > > > >> > > >> > > Wouldn't it be better to bound maximum throttle
> > time
> > >>> to
> > >>> > > > >> > > >> > > *connections.max.idle.ms
> > >>> > > > >> > > >> > > <http://connections.max.idle.ms>*? If we mute
> for
> > a
> > >>> time
> > >>> > > > >> greater
> > >>> > > > >> > > than
> > >>> > > > >> > > >> > > *connections.max.idle.ms
> > >>> > > > >> > > >> > > <http://connections.max.idle.ms>* and then
> close a
> > >>> > client
> > >>> > > > >> > > connection
> > >>> > > > >> > > >> > > simply
> > >>> > > > >> > > >> > > because we had muted it on the broker for a
> longer
> > >>> > throttle
> > >>> > > > >> time,
> > >>> > > > >> > we
> > >>> > > > >> > > >> > force
> > >>> > > > >> > > >> > > a reconnection and read the next request from the
> > new
> > >>> > > > >> connection
> > >>> > > > >> > > >> straight
> > >>> > > > >> > > >> > > away. This feels the same as a bound on the quota
> > of
> > >>> *
> > >>> > > > >> > > >> > > connections.max.idle.ms
> > >>> > > > >> > > >> > > <http://connections.max.idle.ms>*, but with
> added
> > >>> load
> > >>> > on
> > >>> > > > the
> > >>> > > > >> > > broker
> > >>> > > > >> > > >> for
> > >>> > > > >> > > >> > > authenticating another connection (expensive with
> > >>> SSL).
> > >>> > > > >> > > >> > >
> > >>> > > > >> > > >> > I think we need to think about the consumer prior
> to
> > >>> and
> > >>> > > after
> > >>> > > > >> this
> > >>> > > > >> > > KIP
> > >>> > > > >> > > >> > separately.
> > >>> > > > >> > > >> >
> > >>> > > > >> > > >> > For consumer prior to this KIP, even if we unmute
> the
> > >>> > channel
> > >>> > > > >> after
> > >>> > > > >> > > >> > connection.max.idle.ms, it is likely that the
> > >>> consumers
> > >>> > have
> > >>> > > > >> > already
> > >>> > > > >> > > >> > reached request.timeout.ms before that and has
> > >>> reconnected
> > >>> > > to
> > >>> > > > >> the
> > >>> > > > >> > > >> broker.
> > >>> > > > >> > > >> > So there is no real difference whether we close the
> > >>> > throttled
> > >>> > > > >> > channel
> > >>> > > > >> > > or
> > >>> > > > >> > > >> > not.
> > >>> > > > >> > > >> >
> > >>> > > > >> > > >> > For consumers after the KIP, because they will
> honor
> > >>> the
> > >>> > > > throttle
> > >>> > > > >> > > time,
> > >>> > > > >> > > >> > they will back off until throttle time is reached.
> If
> > >>> that
> > >>> > > > >> throttle
> > >>> > > > >> > > >> time is
> > >>> > > > >> > > >> > longer than connection.max.idle.ms, it seems not a
> > big
> > >>> > > > overhead
> > >>> > > > >> > > because
> > >>> > > > >> > > >> > there will only be one connection re-establishment
> in
> > >>> > quite a
> > >>> > > > few
> > >>> > > > >> > > >> minutes.
> > >>> > > > >> > > >> > Compared with such overhead, it seems more
> important
> > to
> > >>> > honor
> > >>> > > > the
> > >>> > > > >> > > quota
> > >>> > > > >> > > >> so
> > >>> > > > >> > > >> > the broker can survive.
> > >>> > > > >> > > >> >
> > >>> > > > >> > > >> >
> > >>> > > > >> > > >> > > 3. Are we changing the behaviour of network
> > bandwidth
> > >>> > quota
> > >>> > > > for
> > >>> > > > >> > > >> > > Produce/Fetch and retaining current behaviour for
> > >>> request
> > >>> > > > >> quotas?
> > >>> > > > >> > > >> > >
> > >>> > > > >> > > >> > This is going to be applied to all the quota.
> > Including
> > >>> > > request
> > >>> > > > >> > > quotas.
> > >>> > > > >> > > >> >
> > >>> > > > >> > > >> >
> > >>> > > > >> > > >> > >
> > >>> > > > >> > > >> > > Thanks,
> > >>> > > > >> > > >> > >
> > >>> > > > >> > > >> > > Rajini
> > >>> > > > >> > > >> > >
> > >>> > > > >> > > >> > >
> > >>> > > > >> > > >> > >
> > >>> > > > >> > > >> > > On Wed, Jan 10, 2018 at 10:29 PM, Jun Rao <
> > >>> > > j...@confluent.io>
> > >>> > > > >> > wrote:
> > >>> > > > >> > > >> > >
> > >>> > > > >> > > >> > > > Hi, Jiangjie,
> > >>> > > > >> > > >> > > >
> > >>> > > > >> > > >> > > > Thanks for the updated KIP. +1
> > >>> > > > >> > > >> > > >
> > >>> > > > >> > > >> > > > Jun
> > >>> > > > >> > > >> > > >
> > >>> > > > >> > > >> > > > On Mon, Jan 8, 2018 at 7:45 PM, Becket Qin <
> > >>> > > > >> > becket....@gmail.com>
> > >>> > > > >> > > >> > wrote:
> > >>> > > > >> > > >> > > >
> > >>> > > > >> > > >> > > > > Thanks for the comments, Jun.
> > >>> > > > >> > > >> > > > >
> > >>> > > > >> > > >> > > > > 1. Good point.
> > >>> > > > >> > > >> > > > > 2. Also makes sense. Usually the
> > >>> > > connection.max.idle.ms
> > >>> > > > is
> > >>> > > > >> > high
> > >>> > > > >> > > >> > enough
> > >>> > > > >> > > >> > > > so
> > >>> > > > >> > > >> > > > > the throttling is impacted.
> > >>> > > > >> > > >> > > > >
> > >>> > > > >> > > >> > > > > I have updated the KIP to reflect the
> changes.
> > >>> > > > >> > > >> > > > >
> > >>> > > > >> > > >> > > > > Thanks,
> > >>> > > > >> > > >> > > > >
> > >>> > > > >> > > >> > > > > Jiangjie (Becket) Qin
> > >>> > > > >> > > >> > > > >
> > >>> > > > >> > > >> > > > >
> > >>> > > > >> > > >> > > > > On Mon, Jan 8, 2018 at 6:30 PM, Jun Rao <
> > >>> > > > j...@confluent.io>
> > >>> > > > >> > > wrote:
> > >>> > > > >> > > >> > > > >
> > >>> > > > >> > > >> > > > > > Hi, Jiangjie,
> > >>> > > > >> > > >> > > > > >
> > >>> > > > >> > > >> > > > > > Sorry for the late response. The proposal
> > >>> sounds
> > >>> > good
> > >>> > > > >> > > overall. A
> > >>> > > > >> > > >> > > couple
> > >>> > > > >> > > >> > > > > of
> > >>> > > > >> > > >> > > > > > minor comments below.
> > >>> > > > >> > > >> > > > > >
> > >>> > > > >> > > >> > > > > > 1. For throttling a fetch request, we could
> > >>> > > potentially
> > >>> > > > >> just
> > >>> > > > >> > > >> send
> > >>> > > > >> > > >> > an
> > >>> > > > >> > > >> > > > > empty
> > >>> > > > >> > > >> > > > > > response. We can return a throttle time
> > >>> calculated
> > >>> > > > from a
> > >>> > > > >> > full
> > >>> > > > >> > > >> > > > response,
> > >>> > > > >> > > >> > > > > > but only mute the channel on the server
> based
> > >>> on a
> > >>> > > > >> throttle
> > >>> > > > >> > > time
> > >>> > > > >> > > >> > > > > calculated
> > >>> > > > >> > > >> > > > > > based on the empty response. This has the
> > >>> benefit
> > >>> > > that
> > >>> > > > >> the
> > >>> > > > >> > > >> server
> > >>> > > > >> > > >> > > will
> > >>> > > > >> > > >> > > > > mute
> > >>> > > > >> > > >> > > > > > the channel much shorter, which will
> prevent
> > >>> the
> > >>> > > > consumer
> > >>> > > > >> > from
> > >>> > > > >> > > >> > > > > rebalancing
> > >>> > > > >> > > >> > > > > > when throttled.
> > >>> > > > >> > > >> > > > > >
> > >>> > > > >> > > >> > > > > > 2. The wiki says "connections.max.idle.ms
> > >>> should
> > >>> > be
> > >>> > > > >> ignored
> > >>> > > > >> > > >> during
> > >>> > > > >> > > >> > > the
> > >>> > > > >> > > >> > > > > > throttle time X." This has the potential
> > issue
> > >>> > that a
> > >>> > > > >> server
> > >>> > > > >> > > may
> > >>> > > > >> > > >> > not
> > >>> > > > >> > > >> > > > > detect
> > >>> > > > >> > > >> > > > > > that a client connection is already gone
> > until
> > >>> > after
> > >>> > > an
> > >>> > > > >> > > >> arbitrary
> > >>> > > > >> > > >> > > > amount
> > >>> > > > >> > > >> > > > > of
> > >>> > > > >> > > >> > > > > > time. Perhaps we could still just close a
> > >>> > connection
> > >>> > > if
> > >>> > > > >> the
> > >>> > > > >> > > >> server
> > >>> > > > >> > > >> > > has
> > >>> > > > >> > > >> > > > > > muted it for longer than
> > >>> connections.max.idle.ms.
> > >>> > > This
> > >>> > > > >> will
> > >>> > > > >> > > at
> > >>> > > > >> > > >> > least
> > >>> > > > >> > > >> > > > > bound
> > >>> > > > >> > > >> > > > > > the time for a server to detect closed
> client
> > >>> > > > >> connections.
> > >>> > > > >> > > >> > > > > >
> > >>> > > > >> > > >> > > > > > Thanks,
> > >>> > > > >> > > >> > > > > >
> > >>> > > > >> > > >> > > > > > Jun
> > >>> > > > >> > > >> > > > > >
> > >>> > > > >> > > >> > > > > >
> > >>> > > > >> > > >> > > > > > On Mon, Nov 20, 2017 at 5:30 PM, Becket
> Qin <
> > >>> > > > >> > > >> becket....@gmail.com>
> > >>> > > > >> > > >> > > > > wrote:
> > >>> > > > >> > > >> > > > > >
> > >>> > > > >> > > >> > > > > > > Hi,
> > >>> > > > >> > > >> > > > > > >
> > >>> > > > >> > > >> > > > > > > We would like to start the voting thread
> > for
> > >>> > > KIP-219.
> > >>> > > > >> The
> > >>> > > > >> > > KIP
> > >>> > > > >> > > >> > > > proposes
> > >>> > > > >> > > >> > > > > to
> > >>> > > > >> > > >> > > > > > > improve the quota communication between
> the
> > >>> > brokers
> > >>> > > > and
> > >>> > > > >> > > >> clients,
> > >>> > > > >> > > >> > > > > > especially
> > >>> > > > >> > > >> > > > > > > for cases of long throttling time.
> > >>> > > > >> > > >> > > > > > >
> > >>> > > > >> > > >> > > > > > > The KIP wiki is following:
> > >>> > > > >> > > >> > > > > > > https://cwiki.apache.org/
> > >>> > > > confluence/display/KAFKA/KIP-
> > >>> > > > >> > > >> > > > > > 219+-+Improve+quota+
> > >>> > > > >> > > >> > > > > > > communication
> > >>> > > > >> > > >> > > > > > >
> > >>> > > > >> > > >> > > > > > > The discussion thread is here:
> > >>> > > > >> > > >> > > > > > > http://markmail.org/search/?q=
> > >>> > > > >> kafka+KIP-219#query:kafka%
> > >>> > > > >> > > >> > > > > > > 20KIP-219+page:1+mid:
> > >>> > > ooxabguy7nz7l7zy+state:results
> > >>> > > > >> > > >> > > > > > >
> > >>> > > > >> > > >> > > > > > > Thanks,
> > >>> > > > >> > > >> > > > > > >
> > >>> > > > >> > > >> > > > > > > Jiangjie (Becket) Qin
> > >>> > > > >> > > >> > > > > > >
> > >>> > > > >> > > >> > > > > >
> > >>> > > > >> > > >> > > > >
> > >>> > > > >> > > >> > > >
> > >>> > > > >> > > >> > >
> > >>> > > > >> > > >> >
> > >>> > > > >> > > >>
> > >>> > > > >> > > >
> > >>> > > > >> > > >
> > >>> > > > >> > >
> > >>> > > > >> >
> > >>> > > > >>
> > >>> > > > >
> > >>> > > >
> > >>> > >
> > >>> >
> > >>>
> > >>
> > >>
> > >
> >
>

Re: [VOTE] KIP-219 - Improve Quota Communication

Reply via email to