Hi, Jiangjie,

You are right that the heartbeat is done on a channel different from the one
used for fetch requests. I think it's still useful to return an empty fetch
response when the quota is violated. This way, the throttle time for the
heartbeat request won't be large. I agree that we can just mute the channel
for the fetch request for the throttle time computed from a full fetch
response. This probably also partially addresses Rajini's #1 concern.

Thanks,

Jun
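
For illustration, a minimal sketch of the fetch handling described above:
when the quota is violated, send an empty response so the recorded bytes
(and hence the throttle time later applied to heartbeat requests for the
same client) stay small, but mute the fetch channel for the throttle time a
full response would have incurred. The single-window rate math is a
simplification of the broker's sampled quota accounting, and every name
below is invented rather than taken from the Kafka code base.

    // Hypothetical sketch only; the real broker logic (ClientQuotaManager/KafkaApis)
    // differs in detail, and all names plus the single-window math are invented.
    final class FetchThrottleSketch {

        // Simplified byte-quota throttle time: how long the client must stay idle so
        // that the bytes recorded in the current window fall back to the quota rate.
        static long throttleTimeMs(long bytesInWindow, long windowMs, double quotaBytesPerSec) {
            double observedRate = bytesInWindow * 1000.0 / windowMs;
            if (observedRate <= quotaBytesPerSec) {
                return 0L;
            }
            // Solve bytesInWindow * 1000 / (windowMs + t) == quotaBytesPerSec for t.
            return (long) Math.ceil(bytesInWindow * 1000.0 / quotaBytesPerSec) - windowMs;
        }

        // Quota violated on a fetch: send an *empty* response (keeps the per-client byte
        // count low), but mute this channel for the time a *full* response would cost.
        static long muteTimeForViolatedFetch(long bytesInWindow, long windowMs,
                                             double quotaBytesPerSec, long fullResponseBytes) {
            // sendEmptyFetchResponse();  // hypothetical call: no partition data, throttle time set
            return throttleTimeMs(bytesInWindow + fullResponseBytes, windowMs, quotaBytesPerSec);
        }
    }

Because the empty response adds almost nothing to the recorded bytes, the
quota sensor recovers while the channel stays muted for the full-response
time.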

On Mon, Jan 15, 2018 at 9:27 PM, Becket Qin <becket....@gmail.com> wrote:

> Hi Rajini,
>
> Thanks for the comments. Please see the replies inline.
>
> Hi Jun,
>
> Thinking about the consumer rebalance case a bit more, I am not sure if we
> need to worry about the delayed rebalance due to quota violation or not.
> The rebalance actually uses a separate channel to the coordinator.
> Therefore, unless the request quota is hit, the rebalance won't be
> throttled. Even if the request quota is hit, it seems unlikely to be
> delayed for long. So it looks like we don't need to unmute the channel
> earlier than needed. Does this address your concern?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Mon, Jan 15, 2018 at 4:22 AM, Rajini Sivaram <rajinisiva...@gmail.com>
> wrote:
>
> > Hi Becket,
> >
> > A few questions:
> >
> > 1. KIP says: *Although older client implementations (prior to knowledge
> > of this KIP) will immediately send the next request after the broker
> > responds without paying attention to the throttle time field, the broker
> > is protected by virtue of muting the channel for time X. i.e., the next
> > request will not be processed until the channel is unmuted.*
> > For fetch requests, the response is sent immediately and the mute time
> > on the broker based on an empty fetch response will often be zero
> > (unlike the throttle time returned to the client based on a non-empty
> > response). Won't that lead to a tight loop of fetch requests from old
> > clients (particularly expensive with SSL)? Wouldn't it be better to
> > retain the current behaviour for old clients? Also, if we change the
> > behaviour for old clients, client metrics that track throttle time will
> > report a lot of throttling unrelated to the actual throttle time.
> >
> For consumers, if the quota is violated, the throttle time on the broker
> will not be 0. It is just that the throttle time will not keep increasing,
> because the broker will return an empty response in this case. So there
> should not be a tight loop.
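
To make the "no tight loop" point concrete, here is a toy simulation under
invented numbers (a single fixed window; Kafka's real sampled windows also
age bytes out, which only shortens the mute times further):

    // Toy, hypothetical simulation: an old client that ignores the throttle-time field
    // keeps fetching, but the broker answers with (almost) empty responses and mutes the
    // channel each time, so the mute time stays bounded instead of growing.
    public class EmptyResponseLoopSketch {
        public static void main(String[] args) {
            double quotaBytesPerSec = 1_000_000;  // 1 MB/s fetch quota (made up)
            long windowMs = 10_000;               // 10 s accounting window (simplified)
            long bytesInWindow = 12_000_000;      // the client is already 20% over quota
            long emptyResponseBytes = 50;         // an empty fetch response is tiny

            for (int i = 0; i < 5; i++) {
                bytesInWindow += emptyResponseBytes;
                double rate = bytesInWindow * 1000.0 / windowMs;
                long muteMs = rate <= quotaBytesPerSec
                        ? 0
                        : (long) Math.ceil(bytesInWindow * 1000.0 / quotaBytesPerSec) - windowMs;
                System.out.printf("fetch %d: mute channel for %d ms%n", i, muteMs);
                // Prints ~2001 ms every round: non-zero, so no tight loop, and not growing,
                // because each empty response adds almost nothing to the window.
            }
        }
    }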
>
>
> > 2. KIP says: *The usual idle timeout i.e., connections.max.idle.ms will
> > still be honored during the throttle time X. This makes sure that the
> > brokers will detect client connection closure in a bounded time.*
> > Wouldn't it be better to bound the maximum throttle time to
> > *connections.max.idle.ms*? If we mute for a time greater than
> > *connections.max.idle.ms* and then close a client connection simply
> > because we had muted it on the broker for a longer throttle time, we
> > force a reconnection and read the next request from the new connection
> > straight away. This feels the same as bounding the throttle time to
> > *connections.max.idle.ms*, but with the added load on the broker of
> > authenticating another connection (expensive with SSL).
> >
> I think we need to think about consumers prior to and after this KIP
> separately.
>
> For consumers prior to this KIP, even if we unmute the channel after
> connections.max.idle.ms, it is likely that the consumers have already
> reached request.timeout.ms before that and have reconnected to the broker.
> So there is no real difference whether we close the throttled channel or
> not.
>
> For consumers after the KIP, because they will honor the throttle time,
> they will back off until the throttle time is reached. If that throttle
> time is longer than connections.max.idle.ms, closing the connection does
> not seem like a big overhead, because there will be at most one connection
> re-establishment every few minutes. Compared with such overhead, it seems
> more important to honor the quota so the broker can survive.
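
As a rough sketch of that trade-off (the constants are illustrative
placeholders rather than necessarily Kafka's defaults, and all names are
invented):

    // Illustrative only: honoring connections.max.idle.ms while a channel is muted bounds
    // how long a dead client connection can linger, for both old and new clients.
    final class IdleVsThrottleSketch {
        static final long REQUEST_TIMEOUT_MS      = 30_000;   // placeholder client request.timeout.ms
        static final long CONNECTIONS_MAX_IDLE_MS = 600_000;  // placeholder broker connections.max.idle.ms

        // Pre-KIP clients give up after request.timeout.ms and reconnect anyway, so whether
        // the broker closes the muted channel at the idle timeout makes little difference.
        static boolean oldClientLikelyReconnected(long muteMs) {
            return muteMs > REQUEST_TIMEOUT_MS;
        }

        // Post-KIP clients back off for the full throttle time; closing the channel once the
        // mute exceeds connections.max.idle.ms costs at most one reconnect (one TLS handshake)
        // per idle timeout, i.e. once every several minutes at worst.
        static boolean brokerShouldCloseMutedChannel(long muteMs) {
            return muteMs > CONNECTIONS_MAX_IDLE_MS;
        }
    }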
>
>
> > 3. Are we changing the behaviour of the network bandwidth quota for
> > Produce/Fetch and retaining the current behaviour for request quotas?
> >
> This is going to be applied to all the quotas, including request quotas.
>
>
> >
> > Thanks,
> >
> > Rajini
> >
> >
> >
> > On Wed, Jan 10, 2018 at 10:29 PM, Jun Rao <j...@confluent.io> wrote:
> >
> > > Hi, Jiangjie,
> > >
> > > Thanks for the updated KIP. +1
> > >
> > > Jun
> > >
> > > On Mon, Jan 8, 2018 at 7:45 PM, Becket Qin <becket....@gmail.com> wrote:
> > >
> > > > Thanks for the comments, Jun.
> > > >
> > > > 1. Good point.
> > > > 2. Also makes sense. Usually connections.max.idle.ms is high enough
> > > > that the throttling won't be impacted.
> > > >
> > > > I have updated the KIP to reflect the changes.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > >
> > > > On Mon, Jan 8, 2018 at 6:30 PM, Jun Rao <j...@confluent.io> wrote:
> > > >
> > > > > Hi, Jiangjie,
> > > > >
> > > > > Sorry for the late response. The proposal sounds good overall. A
> > > > > couple of minor comments below.
> > > > >
> > > > > 1. For throttling a fetch request, we could potentially just send
> > > > > an empty response. We can return a throttle time calculated from a
> > > > > full response, but only mute the channel on the server based on a
> > > > > throttle time calculated from the empty response. This has the
> > > > > benefit that the server will mute the channel for a much shorter
> > > > > time, which will prevent the consumer from rebalancing when
> > > > > throttled.
> > > > >
> > > > > 2. The wiki says "connections.max.idle.ms should be ignored during
> > > > > the throttle time X." This has the potential issue that a server
> > > > > may not detect that a client connection is already gone until after
> > > > > an arbitrary amount of time. Perhaps we could still just close a
> > > > > connection if the server has muted it for longer than
> > > > > connections.max.idle.ms. This will at least bound the time for a
> > > > > server to detect closed client connections.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Mon, Nov 20, 2017 at 5:30 PM, Becket Qin <becket....@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > We would like to start the voting thread for KIP-219. The KIP
> > > > > > proposes to improve the quota communication between the brokers
> > > > > > and clients, especially for cases of long throttling time.
> > > > > >
> > > > > > The KIP wiki is the following:
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-219+-+Improve+quota+communication
> > > > > >
> > > > > > The discussion thread is here:
> > > > > > http://markmail.org/search/?q=kafka+KIP-219#query:kafka%20KIP-219+page:1+mid:ooxabguy7nz7l7zy+state:results
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jiangjie (Becket) Qin
> > > > > >
> > > > >
> > > >
> > >
> >
>
