Re: [VOTE] KIP-219 - Improve Quota Communication

Jonghyun Lee Wed, 25 Apr 2018 15:20:15 -0700

Hello,

Per Ismael's suggestion, I'd like to get comments from the original voters
for KIP-219 (Becket, Jun, Rajini) and others about the new interface change
proposed in the discussion thread (
https://markmail.org/search/?q=kafka+KIP-219#query:kafka%20KIP-219+page:1+mid:5rwju2gwpicojr3f+state:results).
Would you let me know?


Thanks,
Jon


On Wed, Apr 25, 2018 at 2:43 PM, Dong Lin <[email protected]> wrote:

> +Jon
>
> On Mon, 22 Jan 2018 at 10:38 AM Becket Qin <[email protected]> wrote:
>
>> Thanks for the discussion and voting.
>>
>> KIP-219 has passed with +3 binding votes (Becket, Jun, Rajini).
>>
>> On Thu, Jan 18, 2018 at 1:32 AM, Rajini Sivaram <[email protected]>
>> wrote:
>>
>> > Hi Becket,
>> >
>> > Thanks for the update. Yes, it does address my concern.
>> >
>> > +1 (binding)
>> >
>> > Regards,
>> >
>> > Rajini
>> >
>> > On Wed, Jan 17, 2018 at 5:24 PM, Becket Qin <[email protected]>
>> wrote:
>> >
>> > > Actually returning an empty fetch request may still be useful to
>> reduce
>> > the
>> > > throttle time due to request quota violation because the FetchResponse
>> > send
>> > > time will be less. I just updated the KIP.
>> > >
>> > > Rajini, does that address your concern?
>> > >
>> > > On Tue, Jan 16, 2018 at 7:01 PM, Becket Qin <[email protected]>
>> > wrote:
>> > >
>> > > > Thanks for the reply, Jun.
>> > > >
>> > > > Currently the byte rate quota does not apply to HeartbeatRequest,
>> > > > JoinGroupRequest/SyncGroupRequest. So the only case those requests
>> are
>> > > > throttled is because the request quota is violated. In that case,
>> the
>> > > > throttle time does not really matter whether we return a full
>> > > FetchResponse
>> > > > or an empty one. Would it be more consistent if we throttle based on
>> > the
>> > > > actual throttle time / channel mute time?
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Jiangjie (Becket) Qin
>> > > >
>> > > > On Tue, Jan 16, 2018 at 3:45 PM, Jun Rao <[email protected]> wrote:
>> > > >
>> > > >> Hi, Jiangjie,
>> > > >>
>> > > >> You are right that the heartbeat is done in a channel different
>> from
>> > the
>> > > >> fetch request. I think it's still useful to return an empty fetch
>> > > response
>> > > >> when the quota is violated. This way, the throttle time for the
>> > > heartbeat
>> > > >> request won't be large. I agree that we can just mute the channel
>> for
>> > > the
>> > > >> fetch request for the throttle time computed based on a full fetch
>> > > >> response. This probably also partially addresses Rajini's #1
>> concern.
>> > > >>
>> > > >> Thanks,
>> > > >>
>> > > >> Jun
>> > > >>
>> > > >> On Mon, Jan 15, 2018 at 9:27 PM, Becket Qin <[email protected]>
>> > > wrote:
>> > > >>
>> > > >> > Hi Rajini,
>> > > >> >
>> > > >> > Thanks for the comments. Pleas see the reply inline.
>> > > >> >
>> > > >> > Hi Jun,
>> > > >> >
>> > > >> > Thinking about the consumer rebalance case a bit more, I am not
>> sure
>> > > if
>> > > >> we
>> > > >> > need to worry about the delayed rebalance due to quota violation
>> or
>> > > not.
>> > > >> > The rebalance actually uses a separate channel to coordinator.
>> > > Therefore
>> > > >> > unless the request quota is hit, the rebalance won't be
>> throttled.
>> > > Even
>> > > >> if
>> > > >> > request quota is hit, it seems unlikely to be delayed long. So it
>> > > looks
>> > > >> > that we don't need to unmute the channel earlier than needed.
>> Does
>> > > this
>> > > >> > address your concern?
>> > > >> >
>> > > >> > Thanks,
>> > > >> >
>> > > >> > Jiangjie (Becket) Qin
>> > > >> >
>> > > >> > On Mon, Jan 15, 2018 at 4:22 AM, Rajini Sivaram <
>> > > >> [email protected]>
>> > > >> > wrote:
>> > > >> >
>> > > >> > > Hi Becket,
>> > > >> > >
>> > > >> > > A few questions:
>> > > >> > >
>> > > >> > > 1. KIP says: *Although older client implementations (prior to
>> > > >> knowledge
>> > > >> > of
>> > > >> > > this KIP) will immediately send the next request after the
>> broker
>> > > >> > responds
>> > > >> > > without paying attention to the throttle time field, the
>> broker is
>> > > >> > > protected by virtue of muting the channel for time X. i.e., the
>> > next
>> > > >> > > request will not be processed until the channel is unmuted. *
>> > > >> > > For fetch requests, the response is sent immediately and the
>> mute
>> > > >> time on
>> > > >> > > the broker based on empty fetch response will often be zero
>> > (unlike
>> > > >> the
>> > > >> > > throttle time returned to the client based on non-empty
>> response).
>> > > >> Won't
>> > > >> > > that lead to a tight loop of fetch requests from old clients
>> > > >> > (particularly
>> > > >> > > expensive with SSL)? Wouldn't it be better to retain current
>> > > behaviour
>> > > >> > for
>> > > >> > > old clients? Also, if we change the behaviour for old clients,
>> > > client
>> > > >> > > metrics that track throttle time will report a lot of throttle
>> > > >> unrelated
>> > > >> > to
>> > > >> > >  actual throttle time.
>> > > >> > >
>> > > >> > For consumers, if quota is violated, the throttle time on the
>> broker
>> > > >> will
>> > > >> > not be 0. It is just that the throttle time will not be
>> increasing
>> > > >> because
>> > > >> > the consumer will return an empty response in this case. So there
>> > > should
>> > > >> > not be a tight loop.
>> > > >> >
>> > > >> >
>> > > >> > > 2. KIP says: *The usual idle timeout i.e.,
>> > connections.max.idle.ms
>> > > >> > > <http://connections.max.idle.ms> will still be honored during
>> the
>> > > >> > throttle
>> > > >> > > time X. This makes sure that the brokers will detect client
>> > > connection
>> > > >> > > closure in a bounded time.*
>> > > >> > > Wouldn't it be better to bound maximum throttle time to
>> > > >> > > *connections.max.idle.ms
>> > > >> > > <http://connections.max.idle.ms>*? If we mute for a time
>> greater
>> > > than
>> > > >> > > *connections.max.idle.ms
>> > > >> > > <http://connections.max.idle.ms>* and then close a client
>> > > connection
>> > > >> > > simply
>> > > >> > > because we had muted it on the broker for a longer throttle
>> time,
>> > we
>> > > >> > force
>> > > >> > > a reconnection and read the next request from the new
>> connection
>> > > >> straight
>> > > >> > > away. This feels the same as a bound on the quota of *
>> > > >> > > connections.max.idle.ms
>> > > >> > > <http://connections.max.idle.ms>*, but with added load on the
>> > > broker
>> > > >> for
>> > > >> > > authenticating another connection (expensive with SSL).
>> > > >> > >
>> > > >> > I think we need to think about the consumer prior to and after
>> this
>> > > KIP
>> > > >> > separately.
>> > > >> >
>> > > >> > For consumer prior to this KIP, even if we unmute the channel
>> after
>> > > >> > connection.max.idle.ms, it is likely that the consumers have
>> > already
>> > > >> > reached request.timeout.ms before that and has reconnected to
>> the
>> > > >> broker.
>> > > >> > So there is no real difference whether we close the throttled
>> > channel
>> > > or
>> > > >> > not.
>> > > >> >
>> > > >> > For consumers after the KIP, because they will honor the throttle
>> > > time,
>> > > >> > they will back off until throttle time is reached. If that
>> throttle
>> > > >> time is
>> > > >> > longer than connection.max.idle.ms, it seems not a big overhead
>> > > because
>> > > >> > there will only be one connection re-establishment in quite a few
>> > > >> minutes.
>> > > >> > Compared with such overhead, it seems more important to honor the
>> > > quota
>> > > >> so
>> > > >> > the broker can survive.
>> > > >> >
>> > > >> >
>> > > >> > > 3. Are we changing the behaviour of network bandwidth quota for
>> > > >> > > Produce/Fetch and retaining current behaviour for request
>> quotas?
>> > > >> > >
>> > > >> > This is going to be applied to all the quota. Including request
>> > > quotas.
>> > > >> >
>> > > >> >
>> > > >> > >
>> > > >> > > Thanks,
>> > > >> > >
>> > > >> > > Rajini
>> > > >> > >
>> > > >> > >
>> > > >> > >
>> > > >> > > On Wed, Jan 10, 2018 at 10:29 PM, Jun Rao <[email protected]>
>> > wrote:
>> > > >> > >
>> > > >> > > > Hi, Jiangjie,
>> > > >> > > >
>> > > >> > > > Thanks for the updated KIP. +1
>> > > >> > > >
>> > > >> > > > Jun
>> > > >> > > >
>> > > >> > > > On Mon, Jan 8, 2018 at 7:45 PM, Becket Qin <
>> > [email protected]>
>> > > >> > wrote:
>> > > >> > > >
>> > > >> > > > > Thanks for the comments, Jun.
>> > > >> > > > >
>> > > >> > > > > 1. Good point.
>> > > >> > > > > 2. Also makes sense. Usually the connection.max.idle.ms is
>> > high
>> > > >> > enough
>> > > >> > > > so
>> > > >> > > > > the throttling is impacted.
>> > > >> > > > >
>> > > >> > > > > I have updated the KIP to reflect the changes.
>> > > >> > > > >
>> > > >> > > > > Thanks,
>> > > >> > > > >
>> > > >> > > > > Jiangjie (Becket) Qin
>> > > >> > > > >
>> > > >> > > > >
>> > > >> > > > > On Mon, Jan 8, 2018 at 6:30 PM, Jun Rao <[email protected]>
>> > > wrote:
>> > > >> > > > >
>> > > >> > > > > > Hi, Jiangjie,
>> > > >> > > > > >
>> > > >> > > > > > Sorry for the late response. The proposal sounds good
>> > > overall. A
>> > > >> > > couple
>> > > >> > > > > of
>> > > >> > > > > > minor comments below.
>> > > >> > > > > >
>> > > >> > > > > > 1. For throttling a fetch request, we could potentially
>> just
>> > > >> send
>> > > >> > an
>> > > >> > > > > empty
>> > > >> > > > > > response. We can return a throttle time calculated from a
>> > full
>> > > >> > > > response,
>> > > >> > > > > > but only mute the channel on the server based on a
>> throttle
>> > > time
>> > > >> > > > > calculated
>> > > >> > > > > > based on the empty response. This has the benefit that
>> the
>> > > >> server
>> > > >> > > will
>> > > >> > > > > mute
>> > > >> > > > > > the channel much shorter, which will prevent the consumer
>> > from
>> > > >> > > > > rebalancing
>> > > >> > > > > > when throttled.
>> > > >> > > > > >
>> > > >> > > > > > 2. The wiki says "connections.max.idle.ms should be
>> ignored
>> > > >> during
>> > > >> > > the
>> > > >> > > > > > throttle time X." This has the potential issue that a
>> server
>> > > may
>> > > >> > not
>> > > >> > > > > detect
>> > > >> > > > > > that a client connection is already gone until after an
>> > > >> arbitrary
>> > > >> > > > amount
>> > > >> > > > > of
>> > > >> > > > > > time. Perhaps we could still just close a connection if
>> the
>> > > >> server
>> > > >> > > has
>> > > >> > > > > > muted it for longer than connections.max.idle.ms. This
>> will
>> > > at
>> > > >> > least
>> > > >> > > > > bound
>> > > >> > > > > > the time for a server to detect closed client
>> connections.
>> > > >> > > > > >
>> > > >> > > > > > Thanks,
>> > > >> > > > > >
>> > > >> > > > > > Jun
>> > > >> > > > > >
>> > > >> > > > > >
>> > > >> > > > > > On Mon, Nov 20, 2017 at 5:30 PM, Becket Qin <
>> > > >> [email protected]>
>> > > >> > > > > wrote:
>> > > >> > > > > >
>> > > >> > > > > > > Hi,
>> > > >> > > > > > >
>> > > >> > > > > > > We would like to start the voting thread for KIP-219.
>> The
>> > > KIP
>> > > >> > > > proposes
>> > > >> > > > > to
>> > > >> > > > > > > improve the quota communication between the brokers and
>> > > >> clients,
>> > > >> > > > > > especially
>> > > >> > > > > > > for cases of long throttling time.
>> > > >> > > > > > >
>> > > >> > > > > > > The KIP wiki is following:
>> > > >> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> > > >> > > > > > 219+-+Improve+quota+
>> > > >> > > > > > > communication
>> > > >> > > > > > >
>> > > >> > > > > > > The discussion thread is here:
>> > > >> > > > > > > http://markmail.org/search/?q=
>> kafka+KIP-219#query:kafka%
>> > > >> > > > > > > 20KIP-219+page:1+mid:ooxabguy7nz7l7zy+state:results
>> > > >> > > > > > >
>> > > >> > > > > > > Thanks,
>> > > >> > > > > > >
>> > > >> > > > > > > Jiangjie (Becket) Qin
>> > > >> > > > > > >
>> > > >> > > > > >
>> > > >> > > > >
>> > > >> > > >
>> > > >> > >
>> > > >> >
>> > > >>
>> > > >
>> > > >
>> > >
>> >
>>
>

Re: [VOTE] KIP-219 - Improve Quota Communication

Reply via email to