Re: [VOTE] KIP-219 - Improve Quota Communication

Rajini Sivaram Mon, 30 Apr 2018 03:47:00 -0700

Hi Jon,

We currently send two ApiVersions requests for SASL, the first with version
0 to get the request versions for SASL and another after authentication.
You are currently using the second and basing all throttling decisions on
that. This may get in the way if we wanted to change this code in the
future (e.g. throttle SASL requests or combine the two ApiVersions requests
into one). So if it is easy enough to check request versions against an
immutable map rather than track broker ApiVersions in a mutable set,
perhaps that would be better?


Hi Magnus, It will be good to know what works better for non-Java clients.


On Fri, Apr 27, 2018 at 8:34 PM, Jonghyun Lee <[email protected]> wrote:

> Thanks, Becket.
>
> Assuming that requiring new client implementations to issue
> ApiVersionsRequest before issuing any other requests is reasonable as you
> mentioned, I am wondering if keeping track of brokers that support KIP-219
> is actually simpler than keeping a map from each request type to its min
> version number supporting KIP-219 and checking the version of every single
> request to see whether or not to throttle from the client side. To me, it
> looks like a broker property that can be cached, and checking every single
> request seems excessive.
>
> Thanks,
> Jon
>
>
>
>
> On Fri, Apr 27, 2018 at 11:36 AM, Becket Qin <[email protected]> wrote:
>
> > The reason we wanted to bump up all the request versions was let the
> > clients know whether the broker has already throttled the request or not.
> > This avoids double throttling in both brokers and clients, if the clients
> > implementation supports KIP-219.
> >
> > The new change uses only ApiVersionRequest, instead of all the requests,
> to
> > indicate whether the brokers have applied throttling or not. The
> difference
> > is that all the clients must issue ApiVersionRequest first before issuing
> > any other requests. This has no impact on the existing clients. But for
> > clients implementation that wants to support KIP-219, they need to follow
> > this practice, which seems to be reasonable.
> > Initially I thought the change is fine. However, thinking about this
> again,
> > that means the clients implementation needs to remember the ApiVersions
> of
> > each broker, which may have some complexity. Also this seems introduces
> the
> > behavior dependency between different requests, which seems unnecessary.
> >
> > Due to the above reasons, I think it might be better to bump up all the
> > request versions.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> > On Wed, Apr 25, 2018 at 3:19 PM, Jonghyun Lee <[email protected]>
> wrote:
> >
> > > Hello,
> > >
> > > Per Ismael's suggestion, I'd like to get comments from the original
> > voters
> > > for KIP-219 (Becket, Jun, Rajini) and others about the new interface
> > change
> > > proposed in the discussion thread (
> > > https://markmail.org/search/?q=kafka+KIP-219#query:kafka%
> > > 20KIP-219+page:1+mid:5rwju2gwpicojr3f+state:results).
> > > Would you let me know?
> > >
> > > Thanks,
> > > Jon
> > >
> > >
> > > On Wed, Apr 25, 2018 at 2:43 PM, Dong Lin <[email protected]> wrote:
> > >
> > > > +Jon
> > > >
> > > > On Mon, 22 Jan 2018 at 10:38 AM Becket Qin <[email protected]>
> > wrote:
> > > >
> > > >> Thanks for the discussion and voting.
> > > >>
> > > >> KIP-219 has passed with +3 binding votes (Becket, Jun, Rajini).
> > > >>
> > > >> On Thu, Jan 18, 2018 at 1:32 AM, Rajini Sivaram <
> > > [email protected]>
> > > >> wrote:
> > > >>
> > > >> > Hi Becket,
> > > >> >
> > > >> > Thanks for the update. Yes, it does address my concern.
> > > >> >
> > > >> > +1 (binding)
> > > >> >
> > > >> > Regards,
> > > >> >
> > > >> > Rajini
> > > >> >
> > > >> > On Wed, Jan 17, 2018 at 5:24 PM, Becket Qin <[email protected]
> >
> > > >> wrote:
> > > >> >
> > > >> > > Actually returning an empty fetch request may still be useful to
> > > >> reduce
> > > >> > the
> > > >> > > throttle time due to request quota violation because the
> > > FetchResponse
> > > >> > send
> > > >> > > time will be less. I just updated the KIP.
> > > >> > >
> > > >> > > Rajini, does that address your concern?
> > > >> > >
> > > >> > > On Tue, Jan 16, 2018 at 7:01 PM, Becket Qin <
> [email protected]
> > >
> > > >> > wrote:
> > > >> > >
> > > >> > > > Thanks for the reply, Jun.
> > > >> > > >
> > > >> > > > Currently the byte rate quota does not apply to
> > HeartbeatRequest,
> > > >> > > > JoinGroupRequest/SyncGroupRequest. So the only case those
> > > requests
> > > >> are
> > > >> > > > throttled is because the request quota is violated. In that
> > case,
> > > >> the
> > > >> > > > throttle time does not really matter whether we return a full
> > > >> > > FetchResponse
> > > >> > > > or an empty one. Would it be more consistent if we throttle
> > based
> > > on
> > > >> > the
> > > >> > > > actual throttle time / channel mute time?
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > >
> > > >> > > > Jiangjie (Becket) Qin
> > > >> > > >
> > > >> > > > On Tue, Jan 16, 2018 at 3:45 PM, Jun Rao <[email protected]>
> > > wrote:
> > > >> > > >
> > > >> > > >> Hi, Jiangjie,
> > > >> > > >>
> > > >> > > >> You are right that the heartbeat is done in a channel
> different
> > > >> from
> > > >> > the
> > > >> > > >> fetch request. I think it's still useful to return an empty
> > fetch
> > > >> > > response
> > > >> > > >> when the quota is violated. This way, the throttle time for
> the
> > > >> > > heartbeat
> > > >> > > >> request won't be large. I agree that we can just mute the
> > channel
> > > >> for
> > > >> > > the
> > > >> > > >> fetch request for the throttle time computed based on a full
> > > fetch
> > > >> > > >> response. This probably also partially addresses Rajini's #1
> > > >> concern.
> > > >> > > >>
> > > >> > > >> Thanks,
> > > >> > > >>
> > > >> > > >> Jun
> > > >> > > >>
> > > >> > > >> On Mon, Jan 15, 2018 at 9:27 PM, Becket Qin <
> > > [email protected]>
> > > >> > > wrote:
> > > >> > > >>
> > > >> > > >> > Hi Rajini,
> > > >> > > >> >
> > > >> > > >> > Thanks for the comments. Pleas see the reply inline.
> > > >> > > >> >
> > > >> > > >> > Hi Jun,
> > > >> > > >> >
> > > >> > > >> > Thinking about the consumer rebalance case a bit more, I am
> > not
> > > >> sure
> > > >> > > if
> > > >> > > >> we
> > > >> > > >> > need to worry about the delayed rebalance due to quota
> > > violation
> > > >> or
> > > >> > > not.
> > > >> > > >> > The rebalance actually uses a separate channel to
> > coordinator.
> > > >> > > Therefore
> > > >> > > >> > unless the request quota is hit, the rebalance won't be
> > > >> throttled.
> > > >> > > Even
> > > >> > > >> if
> > > >> > > >> > request quota is hit, it seems unlikely to be delayed long.
> > So
> > > it
> > > >> > > looks
> > > >> > > >> > that we don't need to unmute the channel earlier than
> needed.
> > > >> Does
> > > >> > > this
> > > >> > > >> > address your concern?
> > > >> > > >> >
> > > >> > > >> > Thanks,
> > > >> > > >> >
> > > >> > > >> > Jiangjie (Becket) Qin
> > > >> > > >> >
> > > >> > > >> > On Mon, Jan 15, 2018 at 4:22 AM, Rajini Sivaram <
> > > >> > > >> [email protected]>
> > > >> > > >> > wrote:
> > > >> > > >> >
> > > >> > > >> > > Hi Becket,
> > > >> > > >> > >
> > > >> > > >> > > A few questions:
> > > >> > > >> > >
> > > >> > > >> > > 1. KIP says: *Although older client implementations
> (prior
> > to
> > > >> > > >> knowledge
> > > >> > > >> > of
> > > >> > > >> > > this KIP) will immediately send the next request after
> the
> > > >> broker
> > > >> > > >> > responds
> > > >> > > >> > > without paying attention to the throttle time field, the
> > > >> broker is
> > > >> > > >> > > protected by virtue of muting the channel for time X.
> i.e.,
> > > the
> > > >> > next
> > > >> > > >> > > request will not be processed until the channel is
> > unmuted. *
> > > >> > > >> > > For fetch requests, the response is sent immediately and
> > the
> > > >> mute
> > > >> > > >> time on
> > > >> > > >> > > the broker based on empty fetch response will often be
> zero
> > > >> > (unlike
> > > >> > > >> the
> > > >> > > >> > > throttle time returned to the client based on non-empty
> > > >> response).
> > > >> > > >> Won't
> > > >> > > >> > > that lead to a tight loop of fetch requests from old
> > clients
> > > >> > > >> > (particularly
> > > >> > > >> > > expensive with SSL)? Wouldn't it be better to retain
> > current
> > > >> > > behaviour
> > > >> > > >> > for
> > > >> > > >> > > old clients? Also, if we change the behaviour for old
> > > clients,
> > > >> > > client
> > > >> > > >> > > metrics that track throttle time will report a lot of
> > > throttle
> > > >> > > >> unrelated
> > > >> > > >> > to
> > > >> > > >> > >  actual throttle time.
> > > >> > > >> > >
> > > >> > > >> > For consumers, if quota is violated, the throttle time on
> the
> > > >> broker
> > > >> > > >> will
> > > >> > > >> > not be 0. It is just that the throttle time will not be
> > > >> increasing
> > > >> > > >> because
> > > >> > > >> > the consumer will return an empty response in this case. So
> > > there
> > > >> > > should
> > > >> > > >> > not be a tight loop.
> > > >> > > >> >
> > > >> > > >> >
> > > >> > > >> > > 2. KIP says: *The usual idle timeout i.e.,
> > > >> > connections.max.idle.ms
> > > >> > > >> > > <http://connections.max.idle.ms> will still be honored
> > > during
> > > >> the
> > > >> > > >> > throttle
> > > >> > > >> > > time X. This makes sure that the brokers will detect
> client
> > > >> > > connection
> > > >> > > >> > > closure in a bounded time.*
> > > >> > > >> > > Wouldn't it be better to bound maximum throttle time to
> > > >> > > >> > > *connections.max.idle.ms
> > > >> > > >> > > <http://connections.max.idle.ms>*? If we mute for a time
> > > >> greater
> > > >> > > than
> > > >> > > >> > > *connections.max.idle.ms
> > > >> > > >> > > <http://connections.max.idle.ms>* and then close a
> client
> > > >> > > connection
> > > >> > > >> > > simply
> > > >> > > >> > > because we had muted it on the broker for a longer
> throttle
> > > >> time,
> > > >> > we
> > > >> > > >> > force
> > > >> > > >> > > a reconnection and read the next request from the new
> > > >> connection
> > > >> > > >> straight
> > > >> > > >> > > away. This feels the same as a bound on the quota of *
> > > >> > > >> > > connections.max.idle.ms
> > > >> > > >> > > <http://connections.max.idle.ms>*, but with added load
> on
> > > the
> > > >> > > broker
> > > >> > > >> for
> > > >> > > >> > > authenticating another connection (expensive with SSL).
> > > >> > > >> > >
> > > >> > > >> > I think we need to think about the consumer prior to and
> > after
> > > >> this
> > > >> > > KIP
> > > >> > > >> > separately.
> > > >> > > >> >
> > > >> > > >> > For consumer prior to this KIP, even if we unmute the
> channel
> > > >> after
> > > >> > > >> > connection.max.idle.ms, it is likely that the consumers
> have
> > > >> > already
> > > >> > > >> > reached request.timeout.ms before that and has reconnected
> > to
> > > >> the
> > > >> > > >> broker.
> > > >> > > >> > So there is no real difference whether we close the
> throttled
> > > >> > channel
> > > >> > > or
> > > >> > > >> > not.
> > > >> > > >> >
> > > >> > > >> > For consumers after the KIP, because they will honor the
> > > throttle
> > > >> > > time,
> > > >> > > >> > they will back off until throttle time is reached. If that
> > > >> throttle
> > > >> > > >> time is
> > > >> > > >> > longer than connection.max.idle.ms, it seems not a big
> > > overhead
> > > >> > > because
> > > >> > > >> > there will only be one connection re-establishment in
> quite a
> > > few
> > > >> > > >> minutes.
> > > >> > > >> > Compared with such overhead, it seems more important to
> honor
> > > the
> > > >> > > quota
> > > >> > > >> so
> > > >> > > >> > the broker can survive.
> > > >> > > >> >
> > > >> > > >> >
> > > >> > > >> > > 3. Are we changing the behaviour of network bandwidth
> quota
> > > for
> > > >> > > >> > > Produce/Fetch and retaining current behaviour for request
> > > >> quotas?
> > > >> > > >> > >
> > > >> > > >> > This is going to be applied to all the quota. Including
> > request
> > > >> > > quotas.
> > > >> > > >> >
> > > >> > > >> >
> > > >> > > >> > >
> > > >> > > >> > > Thanks,
> > > >> > > >> > >
> > > >> > > >> > > Rajini
> > > >> > > >> > >
> > > >> > > >> > >
> > > >> > > >> > >
> > > >> > > >> > > On Wed, Jan 10, 2018 at 10:29 PM, Jun Rao <
> > [email protected]>
> > > >> > wrote:
> > > >> > > >> > >
> > > >> > > >> > > > Hi, Jiangjie,
> > > >> > > >> > > >
> > > >> > > >> > > > Thanks for the updated KIP. +1
> > > >> > > >> > > >
> > > >> > > >> > > > Jun
> > > >> > > >> > > >
> > > >> > > >> > > > On Mon, Jan 8, 2018 at 7:45 PM, Becket Qin <
> > > >> > [email protected]>
> > > >> > > >> > wrote:
> > > >> > > >> > > >
> > > >> > > >> > > > > Thanks for the comments, Jun.
> > > >> > > >> > > > >
> > > >> > > >> > > > > 1. Good point.
> > > >> > > >> > > > > 2. Also makes sense. Usually the
> > connection.max.idle.ms
> > > is
> > > >> > high
> > > >> > > >> > enough
> > > >> > > >> > > > so
> > > >> > > >> > > > > the throttling is impacted.
> > > >> > > >> > > > >
> > > >> > > >> > > > > I have updated the KIP to reflect the changes.
> > > >> > > >> > > > >
> > > >> > > >> > > > > Thanks,
> > > >> > > >> > > > >
> > > >> > > >> > > > > Jiangjie (Becket) Qin
> > > >> > > >> > > > >
> > > >> > > >> > > > >
> > > >> > > >> > > > > On Mon, Jan 8, 2018 at 6:30 PM, Jun Rao <
> > > [email protected]>
> > > >> > > wrote:
> > > >> > > >> > > > >
> > > >> > > >> > > > > > Hi, Jiangjie,
> > > >> > > >> > > > > >
> > > >> > > >> > > > > > Sorry for the late response. The proposal sounds
> good
> > > >> > > overall. A
> > > >> > > >> > > couple
> > > >> > > >> > > > > of
> > > >> > > >> > > > > > minor comments below.
> > > >> > > >> > > > > >
> > > >> > > >> > > > > > 1. For throttling a fetch request, we could
> > potentially
> > > >> just
> > > >> > > >> send
> > > >> > > >> > an
> > > >> > > >> > > > > empty
> > > >> > > >> > > > > > response. We can return a throttle time calculated
> > > from a
> > > >> > full
> > > >> > > >> > > > response,
> > > >> > > >> > > > > > but only mute the channel on the server based on a
> > > >> throttle
> > > >> > > time
> > > >> > > >> > > > > calculated
> > > >> > > >> > > > > > based on the empty response. This has the benefit
> > that
> > > >> the
> > > >> > > >> server
> > > >> > > >> > > will
> > > >> > > >> > > > > mute
> > > >> > > >> > > > > > the channel much shorter, which will prevent the
> > > consumer
> > > >> > from
> > > >> > > >> > > > > rebalancing
> > > >> > > >> > > > > > when throttled.
> > > >> > > >> > > > > >
> > > >> > > >> > > > > > 2. The wiki says "connections.max.idle.ms should
> be
> > > >> ignored
> > > >> > > >> during
> > > >> > > >> > > the
> > > >> > > >> > > > > > throttle time X." This has the potential issue
> that a
> > > >> server
> > > >> > > may
> > > >> > > >> > not
> > > >> > > >> > > > > detect
> > > >> > > >> > > > > > that a client connection is already gone until
> after
> > an
> > > >> > > >> arbitrary
> > > >> > > >> > > > amount
> > > >> > > >> > > > > of
> > > >> > > >> > > > > > time. Perhaps we could still just close a
> connection
> > if
> > > >> the
> > > >> > > >> server
> > > >> > > >> > > has
> > > >> > > >> > > > > > muted it for longer than connections.max.idle.ms.
> > This
> > > >> will
> > > >> > > at
> > > >> > > >> > least
> > > >> > > >> > > > > bound
> > > >> > > >> > > > > > the time for a server to detect closed client
> > > >> connections.
> > > >> > > >> > > > > >
> > > >> > > >> > > > > > Thanks,
> > > >> > > >> > > > > >
> > > >> > > >> > > > > > Jun
> > > >> > > >> > > > > >
> > > >> > > >> > > > > >
> > > >> > > >> > > > > > On Mon, Nov 20, 2017 at 5:30 PM, Becket Qin <
> > > >> > > >> [email protected]>
> > > >> > > >> > > > > wrote:
> > > >> > > >> > > > > >
> > > >> > > >> > > > > > > Hi,
> > > >> > > >> > > > > > >
> > > >> > > >> > > > > > > We would like to start the voting thread for
> > KIP-219.
> > > >> The
> > > >> > > KIP
> > > >> > > >> > > > proposes
> > > >> > > >> > > > > to
> > > >> > > >> > > > > > > improve the quota communication between the
> brokers
> > > and
> > > >> > > >> clients,
> > > >> > > >> > > > > > especially
> > > >> > > >> > > > > > > for cases of long throttling time.
> > > >> > > >> > > > > > >
> > > >> > > >> > > > > > > The KIP wiki is following:
> > > >> > > >> > > > > > > https://cwiki.apache.org/
> > > confluence/display/KAFKA/KIP-
> > > >> > > >> > > > > > 219+-+Improve+quota+
> > > >> > > >> > > > > > > communication
> > > >> > > >> > > > > > >
> > > >> > > >> > > > > > > The discussion thread is here:
> > > >> > > >> > > > > > > http://markmail.org/search/?q=
> > > >> kafka+KIP-219#query:kafka%
> > > >> > > >> > > > > > > 20KIP-219+page:1+mid:
> > ooxabguy7nz7l7zy+state:results
> > > >> > > >> > > > > > >
> > > >> > > >> > > > > > > Thanks,
> > > >> > > >> > > > > > >
> > > >> > > >> > > > > > > Jiangjie (Becket) Qin
> > > >> > > >> > > > > > >
> > > >> > > >> > > > > >
> > > >> > > >> > > > >
> > > >> > > >> > > >
> > > >> > > >> > >
> > > >> > > >> >
> > > >> > > >>
> > > >> > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > >
> >
>

Re: [VOTE] KIP-219 - Improve Quota Communication

Reply via email to