In either approach, I'm not sure we considered being able to turn it
off completely. IOW, no, it is not a "plugin" if that's what you are
asking. We can ship very high default quotas, so in the absence of
any overrides quotas would effectively be off. The quota enforcement
is actually already part of the metrics package. The new code (that
exercises it) will be added wherever the metrics are being
measured.
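
To make the "very high defaults" idea concrete, here is a minimal sketch
in Java. It is only an illustration: the class, the method names and the
default value are hypothetical and are not the metrics package API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: quotas that are effectively off unless overridden. */
class QuotaDefaultsSketch {
    // Assumed default: high enough that no real client can reach it.
    static final long DEFAULT_BYTES_PER_SEC = Long.MAX_VALUE;

    // Per-client overrides; an empty map means every client gets the default.
    private final Map<String, Long> overrides = new HashMap<>();

    void setOverride(String clientId, long bytesPerSec) {
        overrides.put(clientId, bytesPerSec);
    }

    long quotaFor(String clientId) {
        return overrides.getOrDefault(clientId, DEFAULT_BYTES_PER_SEC);
    }

    /** True only when the observed rate exceeds the client's quota. */
    boolean isViolated(String clientId, long observedBytesPerSec) {
        return observedBytesPerSec > quotaFor(clientId);
    }

    public static void main(String[] args) {
        QuotaDefaultsSketch quotas = new QuotaDefaultsSketch();
        System.out.println(quotas.isViolated("client-a", 5_000_000L)); // no override set
        quotas.setOverride("client-a", 1_000_000L);
        System.out.println(quotas.isViolated("client-a", 5_000_000L)); // override applies
    }
}
```

With no overrides the check still runs on every request, but it can
never report a violation, which is what "effectively off" means here.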

Thanks,

Joel

On Tue, Apr 21, 2015 at 03:04:07PM -0400, Tong Li wrote:
> 
> Joel,
>       Nice write-up. A couple of questions; I'm not sure whether they have
> been answered. Since we will have a call later today, I would like to ask
> them here as well so that we can discuss them on the call if they are not
> answered by email.
> 
>       1. Where will the new code be plugged in, that is, where is the
> plugin point, KafkaApi?
>       2. Can this quota control be disabled/enabled without affecting
> anything else? From the design wiki page, it seems to me that each request
> will at least pay the penalty of checking whether quotas are enabled.
> 
> Thanks.
> 
> Tong Li
> OpenStack & Kafka Community Development
> Building 501/B205
> liton...@us.ibm.com
> 
> 
> 
> From: Joel Koshy <jjkosh...@gmail.com>
> To:   dev@kafka.apache.org
> Date: 04/21/2015 01:22 PM
> Subject:      Re: [KIP-DISCUSSION] KIP-13 Quotas
> 
> 
> 
> Given the caveats, it may be worth doing further investigation on the
> alternate approach which is to use a dedicated DelayQueue for requests
> that violate quota and compare pros/cons.
> 
> So the approach is the following: all request handling occurs normally
> (i.e., unchanged from what we do today); in particular, purgatories will be
> unchanged. After handling a request and before sending the response,
> check if the request has violated a quota. If so, then enqueue the
> response into a DelayQueue. All responses can share the same
> DelayQueue. Send those responses out after the delay has been met.
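
A rough sketch of the shared-queue idea in Java, using the JDK's
java.util.concurrent.DelayQueue. ThrottledResponse, SharedDelayQueueSketch
and the String payload are made-up stand-ins for illustration, not broker
classes.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

/** Hypothetical holder for a response that must wait out its throttle time. */
class ThrottledResponse implements Delayed {
    final String payload;            // stand-in for the real response object
    private final long sendAtNanos;  // earliest time the response may be sent

    ThrottledResponse(String payload, long delayMs) {
        this.payload = payload;
        this.sendAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMs);
    }

    @Override
    public long getDelay(TimeUnit unit) {
        return unit.convert(sendAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.NANOSECONDS),
                            other.getDelay(TimeUnit.NANOSECONDS));
    }
}

class SharedDelayQueueSketch {
    // One queue shared by all request types.
    private final DelayQueue<ThrottledResponse> delayed = new DelayQueue<>();

    /** Returns true if the response can go out now, false if it was enqueued. */
    boolean sendOrDelay(String response, long throttleMs) {
        if (throttleMs <= 0) {
            return true;
        }
        delayed.add(new ThrottledResponse(response, throttleMs));
        return false;
    }

    /** Non-blocking: returns null if no held response has served its delay yet. */
    String pollExpired() {
        ThrottledResponse r = delayed.poll();
        return r == null ? null : r.payload;
    }

    /** Blocks until the next held response has served its delay. */
    String takeNextExpired() {
        try {
            return delayed.take().payload;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

A single sender thread looping on takeNextExpired() would be enough to
drain the queue, since the request has already been handled by the time
its response is enqueued.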
> 
> There are some benefits to doing this:
> 
> - We will eventually want to quota other requests as well. The above
>   seems to be a clean staged approach that should work uniformly for
>   all requests. i.e., parse request -> handle request normally ->
>   check quota -> hold in delay queue if quota violated -> respond.
>   All requests can share the same DelayQueue. (In contrast with the
>   current proposal we could end up with a bunch of purgatories, or a
>   combination of purgatories and delay queues.)
> - Since this approach does not need any fundamental modifications to
>   the current request handling, it addresses the caveats that Adi
>   noted (which is holding producer requests/fetch requests longer than
>   strictly necessary if quota is violated since the proposal was to
>   not watch on keys in that case). Likewise it addresses the caveat
>   that Guozhang noted (we may return no error if the request is held
>   long enough due to quota violation and satisfy a producer request
>   that may have in fact exceeded the ack timeout) although it is
>   probably reasonable to hide this case from the user.
> - By avoiding the caveats it also avoids the suggested work-around to
>   the caveats which is effectively to add a min-hold-time to the
>   purgatory. Although this is not a lot of code, I think it adds a
>   quota-driven feature to the purgatory which is already non-trivial
>   and should ideally remain unassociated with quota enforcement.
> 
> For this to work well we need to be sure that we don't hold a lot of
> data in the DelayQueue - and therein lies a quirk to this approach.
> Producer responses (and most other responses) are very small so there
> is no issue. Fetch responses are fine as well - since we read off a
> FileMessageSet in response (zero-copy). This will remain true even
> when we support SSL since encryption occurs at the session layer (not
> the application layer).
> 
> Topic metadata response can be a problem though. For this we ideally
> want to build the topic metadata response only when we are ready to
> respond. So for metadata-style responses which could contain large
> response objects we may want to put the quota check and delay queue
> _before_ handling the request. So the design in this approach would
> need an amendment: provide a choice of where to put a request in the
> delay queue: either before handling or after handling (before
> response). So for:
> 
> small request, large response: delay queue before handling
> large request, small response: delay queue after handling, before response
> small request, small response: either is fine
> large request, large response: we really cannot do anything here, but we
> don't really have this scenario yet
> 
> So the design would look like this:
> 
> - parse request
> - before handling the request, check if quota is violated; if so, compute
>   two delay numbers:
>   - before handling delay
>   - before response delay
> - if before-handling delay > 0 insert into before-handling delay queue
> - handle the request
> - if before-response delay > 0 insert into before-response delay queue
> - respond
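
One possible way to compute the two delay numbers, sketched in Java. The
message above does not say how the throttle time is divided, so putting
the whole delay on one side based on response size is an assumption made
for illustration, and all names are hypothetical.

```java
/** Hypothetical sketch of choosing where to hold a throttled request. */
class StagedDelaySketch {
    final long beforeHandlingMs;  // time to hold before the request is handled
    final long beforeResponseMs;  // time to hold after handling, before responding

    private StagedDelaySketch(long beforeHandlingMs, long beforeResponseMs) {
        this.beforeHandlingMs = beforeHandlingMs;
        this.beforeResponseMs = beforeResponseMs;
    }

    /**
     * Assumption: the full throttle time goes to exactly one stage.
     * Large responses (e.g. topic metadata) are held before handling so the
     * response object is not built early; everything else is held after
     * handling, when only a small response sits in the queue.
     */
    static StagedDelaySketch split(long throttleMs, boolean largeResponse) {
        if (throttleMs <= 0) {
            return new StagedDelaySketch(0, 0);
        }
        return largeResponse
                ? new StagedDelaySketch(throttleMs, 0)
                : new StagedDelaySketch(0, throttleMs);
    }
}
```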
> 
> Just throwing this out there for discussion.
> 
> Thanks,
> 
> Joel
> 
> On Thu, Apr 16, 2015 at 02:56:23PM -0700, Jun Rao wrote:
> > The quota check for the fetch request is a bit different from the produce
> > request. I assume that for the fetch request, we will first get an
> > estimated fetch response size to do the quota check. There are two things
> > to think about. First, when we actually send the response, we probably
> > don't want to record the metric again since it will double count. Second,
> > the bytes that the fetch response actually sends could be more than the
> > estimate. This means that the metric may not be 100% accurate. We may be
> > able to limit the fetch size of each partition to what's in the original
> > estimate.
> >
> > For the produce request, I was thinking that another way to do this is to
> > first figure out the quota_timeout. Then wait in Purgatory for
> > quota_timeout with no key. If the request is not satisfied in
> > quota_timeout and (request_timeout > quota_timeout), wait in Purgatory for
> > (request_timeout - quota_timeout) with the original keys.
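
The arithmetic of the two waits can be sketched as follows (Java; the
class and method names are hypothetical, only the formulas come from the
paragraph above).

```java
/** Hypothetical sketch of the two-phase wait for a throttled produce request. */
class TwoPhaseWaitSketch {
    /** Phase 1: wait with no watch keys for the quota timeout. */
    static long unkeyedWaitMs(long quotaTimeoutMs) {
        return Math.max(0L, quotaTimeoutMs);
    }

    /**
     * Phase 2: if request_timeout > quota_timeout, wait for the remainder
     * with the original keys; otherwise there is no second phase.
     */
    static long keyedWaitMs(long requestTimeoutMs, long quotaTimeoutMs) {
        return Math.max(0L, requestTimeoutMs - quotaTimeoutMs);
    }
}
```

For example, with request_timeout = 1000 ms and quota_timeout = 300 ms the
request waits 300 ms without keys and then up to 700 ms with its keys, so
the total never exceeds the request timeout.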
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Apr 14, 2015 at 5:01 PM, Aditya Auradkar <
> > aaurad...@linkedin.com.invalid> wrote:
> >
> > > > This is an implementation proposal for delaying requests in quotas using
> > > > the current purgatory. I'll discuss the usage for produce and fetch
> > > > requests separately.
> > > >
> > > > 1. Delayed Produce Requests - Here, the proposal is basically to reuse
> > > > DelayedProduce objects and insert them into the purgatory with no watcher
> > > > keys if the request is being throttled. The timeout used in the request
> > > > should be Max(quota_delay_time, replication_timeout).
> > > > In most cases, the quota timeout should be greater than the existing
> > > > timeout, but to be safe we can use the maximum of these values.
> > > > Having no watch keys will allow the operation to be enqueued directly into
> > > > the timer and will not add any overhead in terms of watching keys (which
> > > > was a concern). In this case, having watch keys is not beneficial since the
> > > > operation must be delayed for a fixed amount of time and there is no
> > > > possibility for the operation to complete before the timeout, i.e.
> > > > tryComplete() can never return true before the timeout. On timeout, since
> > > > the operation is a TimerTask, the timer will call run(), which calls
> > > > onComplete().
> > > > In onComplete(), the DelayedProduce can repeat the check in tryComplete()
> > > > (only if acks=-1: whether all replicas have fetched up to a certain offset)
> > > > and return the response immediately.
> > >
> > > Code will be structured as follows in ReplicaManager:appendMessages()
> > >
> > > > if (isThrottled) {
> > > >   // Throttled: no watch keys, so the operation goes straight to the timer
> > > >   val delayedProduce = new DelayedProduce(timeout)
> > > >   purgatory.tryCompleteElseWatch(delayedProduce, Seq())
> > > > }
> > > > else if (delayedRequestRequired()) {
> > > >   // Insert into purgatory with watched keys for unthrottled requests
> > > > }
> > >
> > > > In this proposal, we avoid adding unnecessary watches because there is no
> > > > possibility of early completion, and this avoids any potential performance
> > > > penalties we were concerned about earlier.
> > >
> > > > 2. Delayed Fetch Requests - Similarly, the proposal here is to reuse the
> > > > DelayedFetch objects and insert them into the purgatory with no watcher
> > > > keys if the request is throttled. The timeout used is
> > > > Max(quota_delay_time, max_wait_timeout). Having no watch keys provides
> > > > the same benefits as described above. Upon timeout, onComplete() is
> > > > called and the operation proceeds normally, i.e. it performs a
> > > > readFromLocalLog and returns a response.
> > > > The caveat here is that if the request is throttled but the throttle time
> > > > is less than the max_wait timeout on the fetch request, the request will
> > > > be delayed for Max(quota_delay_time, max_wait_timeout). This may be more
> > > > than strictly necessary (since we are not watching for
> > > > satisfaction on any keys).
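
A small worked example of that caveat, as a Java sketch. The names are
hypothetical, and "extra hold" assumes the fetch could otherwise have
been answered as soon as the quota delay had elapsed.

```java
/** Hypothetical sketch of the timeout used for a throttled fetch. */
class FetchThrottleTimeoutSketch {
    /** Timeout actually used: Max(quota_delay_time, max_wait_timeout). */
    static long effectiveTimeoutMs(long quotaDelayMs, long maxWaitMs) {
        return Math.max(quotaDelayMs, maxWaitMs);
    }

    /** How much longer than the quota delay the request is held. */
    static long extraHoldMs(long quotaDelayMs, long maxWaitMs) {
        return effectiveTimeoutMs(quotaDelayMs, maxWaitMs) - quotaDelayMs;
    }
}
```

With a quota delay of 100 ms and a max_wait of 500 ms, the request is held
for 500 ms, i.e. 400 ms longer than the throttle itself requires. When the
quota delay is the larger of the two, the extra hold is zero.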
> > >
> > > > I added some test cases to DelayedOperationTest to verify that it is
> > > > possible to schedule operations with no watcher keys. By inserting elements
> > > > with no watch keys, the purgatory simply becomes a delay queue. It may also
> > > > make sense to add a new API to the purgatory called
> > > > delayFor() that basically accepts an operation without any watch keys
> > > > (Thanks for the suggestion, Joel).
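
To show the shape such a delayFor() call could take, here is a sketch in
Java built on the JDK's ScheduledExecutorService. It is not the purgatory
implementation: the class and its signature are assumptions, and a plain
scheduler stands in for the purgatory's timer.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical delay-only API: no watch keys, completion happens on timeout. */
class DelayForSketch implements AutoCloseable {
    // Stand-in for the purgatory's timer.
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    /** Runs onComplete once delayMs has elapsed; it cannot complete earlier. */
    ScheduledFuture<?> delayFor(Runnable onComplete, long delayMs) {
        return timer.schedule(onComplete, delayMs, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```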
> > >
> > > Thoughts?
> > >
> > > Thanks,
> > > Aditya
> > >
> > > ________________________________________
> > > From: Guozhang Wang [wangg...@gmail.com]
> > > Sent: Monday, April 13, 2015 7:27 PM
> > > To: dev@kafka.apache.org
> > > Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas
> > >
> > > > I think KAFKA-2063 (bounding fetch response) is still under discussion, and
> > > > may not make it in time for KAFKA-1927.
> > >
> > > On Thu, Apr 9, 2015 at 4:49 PM, Aditya Auradkar <
> > > aaurad...@linkedin.com.invalid> wrote:
> > >
> > > > > I think it's reasonable to batch the protocol changes together. In
> > > > > addition to the protocol changes, is someone actively driving the
> > > > > server-side changes/KIP process for KAFKA-2063?
> > > >
> > > > Thanks,
> > > > Aditya
> > > >
> > > > ________________________________________
> > > > From: Jun Rao [j...@confluent.io]
> > > > Sent: Thursday, April 09, 2015 8:59 AM
> > > > To: dev@kafka.apache.org
> > > > Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas
> > > >
> > > > > Since we are also thinking about evolving the fetch request protocol in
> > > > > KAFKA-2063 (bound fetch response size), perhaps it's worth thinking
> > > > > through if we can just evolve the protocol once.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Wed, Apr 8, 2015 at 10:43 AM, Aditya Auradkar <
> > > > aaurad...@linkedin.com.invalid> wrote:
> > > >
> > > > > Thanks for the detailed review. I've addressed your comments.
> > > > >
> > > > > > For rejected alternatives, we've rejected per-partition distribution
> > > > > > because we chose client-based quotas, where there is no notion of
> > > > > > partitions. I've explained this in a bit more detail in that section.
> > > > >
> > > > > Aditya
> > > > >
> > > > > ________________________________________
> > > > > From: Joel Koshy [jjkosh...@gmail.com]
> > > > > Sent: Wednesday, April 08, 2015 6:30 AM
> > > > > To: dev@kafka.apache.org
> > > > > Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas
> > > > >
> > > > > Thanks for updating the wiki. Looks great overall. Just a couple
> > > > > more comments:
> > > > >
> > > > > Client status code:
> > > > > - v0 requests -> current version (0) of those requests.
> > > > > - Fetch response has a throttled flag instead of throttle time -  I
> > > > >   think you intended the latter.
> > > > > - Can you make it clear that the quota status is a new field
> > > > >   called throttleTimeMs (or equivalent). It would help if some of
> > > > >   that is moved (or repeated) in compatibility/migration plan.
> > > > > > - So you would need to upgrade brokers first, then the clients.
> > > > > >   While upgrading the brokers (via a rolling bounce) the brokers
> > > > > >   cannot start using the latest fetch-request version immediately
> > > > > >   (for replica fetches). Since there will be older brokers in the mix,
> > > > >   those brokers would not be able to read v1 fetch requests. So all
> > > > >   the brokers should be upgraded before switching to the latest
> > > > >   fetch request version. This is similar to what Gwen proposed in
> > > > >   KIP-2/KAFKA-1809 and I think we will need to use the
> > > > >   inter-broker protocol version config.
> > > > >
> > > > > Rejected alternatives-quota-distribution.B: notes that this is the
> > > > > most elegant model, but does not explain why it was rejected. I
> > > > > think this was because we would then need some sort of gossip
> > > > > between brokers since partitions are across the cluster. Can you
> > > > > confirm?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Joel
> > > > >
> > > > > On Wed, Apr 08, 2015 at 05:45:34AM +0000, Aditya Auradkar wrote:
> > > > > > Hey everyone,
> > > > > >
> > > > > > > Following up after today's hangout. After discussing the client-side
> > > > > > > metrics piece internally, we've incorporated that section into the KIP.
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-13+-+Quotas
> > > > > > >
> > > > > > > Since there appears to be sufficient consensus, I'm going to start a
> > > > > > > voting thread.
> > > > > >
> > > > > > Thanks,
> > > > > > Aditya
> > > > > > ________________________________________
> > > > > > From: Gwen Shapira [gshap...@cloudera.com]
> > > > > > Sent: Tuesday, April 07, 2015 11:31 AM
> > > > > > To: Sriharsha Chintalapani
> > > > > > Cc: dev@kafka.apache.org
> > > > > > Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas
> > > > > >
> > > > > > > Yeah, I was not suggesting adding auth to metrics - I think this
> > > > > > > needlessly complicates everything.
> > > > > > > But we need to assume that client developers will not have access to
> > > > > > > the broker metrics (because in a secure environment they probably won't).
> > > > > >
> > > > > > Gwen
> > > > > >
> > > > > > > On Tue, Apr 7, 2015 at 11:20 AM, Sriharsha Chintalapani
> > > > > > > <ka...@harsha.io> wrote:
> > > > > >
> > > > > > > > Having auth on top of metrics is going to be a lot more difficult.
> > > > > > > > How are we going to restrict metrics reporters, which run as part of
> > > > > > > > the Kafka server? They will have access to all the metrics and they
> > > > > > > > can publish to Ganglia etc. I look at the metrics as read-only info.
> > > > > > > > As you said, metrics for all the topics can be visible, but what
> > > > > > > > actions are we looking at that can be non-secure based on metrics
> > > > > > > > alone? This probably can be part of the KIP-11 discussion.
> > > > > > > > Having said that, it would be great if the throttling details could
> > > > > > > > be exposed as part of the response to the client. Instead of looking
> > > > > > > > at metrics, the client can depend on the response to slow down if it's
> > > > > > > > being throttled. This allows the clients to be self-reliant based on
> > > > > > > > the response.
> > > > > > >
> > > > > > > --
> > > > > > > Harsha
> > > > > > >
> > > > > > >
> > > > > > > > On April 7, 2015 at 9:55:41 AM, Gwen Shapira (gshap...@cloudera.com)
> > > > > > > > wrote:
> > > > > > >
> > > > > > > Re (1):
> > > > > > > > We have no authorization story on the metrics collected by brokers, so
> > > > > > > > I assume that access to broker metrics means knowing exactly which
> > > > > > > > topics exist and their throughputs. (Prath and Don, correct me if I got
> > > > > > > > it wrong...)
> > > > > > > > Secure environments will strictly control access to this information,
> > > > > > > > so I am pretty sure the client developers will not have access to
> > > > > > > > server metrics at all.
> > > > > > >
> > > > > > > Gwen
> > > > > > >
> > > > > > > > On Tue, Apr 7, 2015 at 7:41 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
> > > > > > >
> > > > > > > > > Totally. But is that the only use? What I wanted to flesh out was
> > > > > > > > > whether the goal was:
> > > > > > > > > 1. Expose throttling in the client metrics
> > > > > > > > > 2. Enable programmatic response (i.e. stop sending stuff or
> > > > > > > > >    something like that)
> > > > > > > >
> > > > > > > > > I think I kind of understand (1) but let's get specific on the
> > > > > > > > > metric we would be adding and what exactly you would expose in a
> > > > > > > > > dashboard. For example if the goal is just monitoring do I really
> > > > > > > > > want a boolean flag for is_throttled or do I want to know how much I
> > > > > > > > > am being throttled (i.e. throttle_pct might indicate the percent of
> > > > > > > > > your request time that was due to throttling or something like
> > > > > > > > > that)? If I am 1% throttled that may be irrelevant but 99% throttled
> > > > > > > > > would be quite relevant? Not sure I agree, just throwing that out
> > > > > > > > > there...
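
For concreteness, one way such a throttle_pct could be computed, as a
Java sketch. The metric is only floated in the paragraph above, so the
exact definition and the names here are assumptions.

```java
/** Hypothetical sketch of a "percent of request time spent throttled" metric. */
class ThrottlePctSketch {
    /**
     * @param throttleTimeMs     time the request was held because of quotas
     * @param totalRequestTimeMs total time the request took, including the hold
     * @return throttle time as a percentage of total request time
     */
    static double throttlePct(long throttleTimeMs, long totalRequestTimeMs) {
        if (totalRequestTimeMs <= 0) {
            return 0.0;  // nothing measured yet
        }
        return 100.0 * throttleTimeMs / totalRequestTimeMs;
    }
}
```

A request held for 10 ms out of 1000 ms would report 1%, while one held
for 990 ms out of 1000 ms would report 99%, which is the distinction a
boolean is_throttled flag cannot express.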
> > > > > > > >
> > > > > > > > > For (2) the prior discussion seemed to kind of allude to this but I
> > > > > > > > > can't really come up with a use case. Is there one?
> > > > > > > >
> > > > > > > > > If it is just (1) I think the question is whether it really helps
> > > > > > > > > much to have the metric on the client vs the server. I suppose this
> > > > > > > > > is a bit environment specific. If you have a central metrics system
> > > > > > > > > it shouldn't make any difference, but if you don't I suppose it does.
> > > > > > > >
> > > > > > > > -Jay
> > > > > > > >
> > > > > > > > > On Mon, Apr 6, 2015 at 7:57 PM, Gwen Shapira <gshap...@cloudera.com>
> > > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Here's a wild guess:
> > > > > > > > >
> > > > > > > > > > An app developer included a Kafka Producer in his app, and is not
> > > > > > > > > > happy with the throughput. He doesn't have visibility into the
> > > > > > > > > > brokers since they are owned by a different team. Obviously the
> > > > > > > > > > first instinct of a developer who knows that throttling exists is
> > > > > > > > > > to blame throttling for any slowdown in the app.
> > > > > > > > > > If he doesn't have a way to know from the responses whether or not
> > > > > > > > > > his app is throttled, he may end up calling Aditya at 4am asking
> > > > > > > > > > "Hey, is my app throttled?".
> > > > > > > > >
> > > > > > > > > I assume Aditya is trying to avoid this scenario.
> > > > > > > > >
> > > > > > > > > > On Mon, Apr 6, 2015 at 7:47 PM, Jay Kreps <jay.kr...@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hey Aditya,
> > > > > > > > > >
> > > > > > > > > > > 2. I kind of buy it, but I'd really like to understand the details
> > > > > > > > > > > of the use case before we make protocol changes. What changes are
> > > > > > > > > > > you proposing in the clients for monitoring and how would that be
> > > > > > > > > > > used?
> > > > > > > > > >
> > > > > > > > > > -Jay
> > > > > > > > > >
> > > > > > > > > > On Mon, Apr 6, 2015 at 10:36 AM, Aditya Auradkar <
> > > > > > > > > > aaurad...@linkedin.com.invalid> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Jay,
> > > > > > > > > > >
> > > > > > > > > > > > 2. At this time, the proposed response format changes are only for
> > > > > > > > > > > > monitoring/informing clients. As Jun mentioned, we get instance-level
> > > > > > > > > > > > monitoring in this case since each instance that got throttled will
> > > > > > > > > > > > have a metric confirming the same. Without client-level monitoring
> > > > > > > > > > > > for this, it's hard for application developers to find out if they
> > > > > > > > > > > > are being throttled since they will also have to be aware of all the
> > > > > > > > > > > > brokers in the cluster. This is quite problematic for large clusters.
> > > > > > > > > > > >
> > > > > > > > > > > > It seems nice for app developers to not have to think about Kafka
> > > > > > > > > > > > internal metrics and only focus on the metrics exposed on their
> > > > > > > > > > > > instances, analogous to having client-side request latency metrics.
> > > > > > > > > > > > Basically, we want an easy way for clients to be aware if they are
> > > > > > > > > > > > being throttled.
> > > > > > > > > > > >
> > > > > > > > > > > > 4. For purgatory vs delay queue, I think we are on the same page. I
> > > > > > > > > > > > feel it is nicer to use the purgatory but I'm happy to use a
> > > > > > > > > > > > DelayQueue if there are performance implications. I don't know
> > > > > > > > > > > > enough about the current and Yasuhiro's new implementation to be
> > > > > > > > > > > > sure one way or the other.
> > > > > > > > > > > >
> > > > > > > > > > > > Stepping back, I think these two things are the only remaining
> > > > > > > > > > > > points of discussion within the current proposal. Any concerns if I
> > > > > > > > > > > > started a voting thread on the proposal after the KIP discussion
> > > > > > > > > > > > tomorrow? (assuming we reach consensus on these items)
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Aditya
> > > > > > > > > > > ________________________________________
> > > > > > > > > > > From: Jay Kreps [jay.kr...@gmail.com]
> > > > > > > > > > > Sent: Saturday, April 04, 2015 1:36 PM
> > > > > > > > > > > To: dev@kafka.apache.org
> > > > > > > > > > > Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas
> > > > > > > > > > >
> > > > > > > > > > > Hey Aditya,
> > > > > > > > > > >
> > > > > > > > > > > > 2. For the return flag I'm not terribly particular. If we want to
> > > > > > > > > > > > add it let's fully think through how it will be used. The only
> > > > > > > > > > > > concern I have is adding to the protocol without really thinking
> > > > > > > > > > > > through the use cases. So let's work out the APIs we want to add to
> > > > > > > > > > > > the Java consumer and producer and the use cases for how clients
> > > > > > > > > > > > will make use of these. For my part I actually don't see much use
> > > > > > > > > > > > other than monitoring since it isn't an error condition to be at
> > > > > > > > > > > > your quota. And if it is just monitoring I don't see a big enough
> > > > > > > > > > > > difference between having the monitoring on the server-side versus
> > > > > > > > > > > > in the clients to justify putting it in the protocol. But I think
> > > > > > > > > > > > you guys may have other use cases in mind of how a client would
> > > > > > > > > > > > make some use of this? Let's work that out. I also don't feel
> > > > > > > > > > > > strongly about it--it wouldn't be *bad* to have the monitoring
> > > > > > > > > > > > available on the client, just doesn't seem that much better.
> > > > > > > > > > > >
> > > > > > > > > > > > 4. For the purgatory vs delay queue I think it is arguably nicer to
> > > > > > > > > > > > reuse the purgatory; we just have to be ultra-conscious of
> > > > > > > > > > > > efficiency. I think our goal is to turn quotas on across the board,
> > > > > > > > > > > > so at LinkedIn that would mean potentially every request will need
> > > > > > > > > > > > a small delay. I haven't worked out the efficiency implications of
> > > > > > > > > > > > this choice, so as long as we do that I'm happy.
> > > > > > > > > > >
> > > > > > > > > > > -Jay
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Apr 3, 2015 at 1:10 PM, Aditya Auradkar <
> > > > > > > > > > > aaurad...@linkedin.com.invalid> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Some responses to Jay's points.
> > > > > > > > > > > >
> > > > > > > > > > > > 1. Using commas - Cool.
> > > > > > > > > > > >
> > > > > > > > > > > > > 2. Adding return flag - I'm inclined to agree with Joel that this
> > > > > > > > > > > > > is good to have in the initial implementation.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 3. Config - +1. I'll remove it from the KIP. We can discuss this
> > > > > > > > > > > > > in parallel.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 4. Purgatory vs Delay queue - I feel that it is simpler to reuse
> > > > > > > > > > > > > the existing purgatories for both delayed produce and fetch
> > > > > > > > > > > > > requests. IIUC, all we need for quotas is a minWait parameter for
> > > > > > > > > > > > > DelayedOperation (or something equivalent) since there is already
> > > > > > > > > > > > > a max wait. The completion criteria can check if minWait time has
> > > > > > > > > > > > > elapsed before declaring the operation complete. For this to
> > > > > > > > > > > > > impact performance, a significant number of clients may need to
> > > > > > > > > > > > > exceed their quota at the same time and even then I'm not very
> > > > > > > > > > > > > clear on the scope of the impact. Two layers of delays might add
> > > > > > > > > > > > > complexity to the implementation which I'm hoping to avoid.
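
The minWait idea could look roughly like this in Java. This is a sketch
under assumptions: it is not the DelayedOperation class, the names are
hypothetical, and using the larger of minWait and max wait for expiry is
one possible reading rather than something stated above.

```java
/** Hypothetical delayed operation with a quota-driven minimum wait. */
class MinWaitOperationSketch {
    private final long createdAtMs;
    private final long minWaitMs;  // quota penalty (assumed parameter name)
    private final long maxWaitMs;  // the existing timeout

    MinWaitOperationSketch(long createdAtMs, long minWaitMs, long maxWaitMs) {
        this.createdAtMs = createdAtMs;
        this.minWaitMs = minWaitMs;
        this.maxWaitMs = maxWaitMs;
    }

    /** Completes early only if the criteria hold AND the minimum wait has elapsed. */
    boolean tryComplete(long nowMs, boolean criteriaSatisfied) {
        return criteriaSatisfied && (nowMs - createdAtMs) >= minWaitMs;
    }

    /** Forced completion on timeout, never before the minimum wait is served. */
    boolean isExpired(long nowMs) {
        return (nowMs - createdAtMs) >= Math.max(minWaitMs, maxWaitMs);
    }
}
```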
> > > > > > > > > > > >
> > > > > > > > > > > > Aditya
> > > > > > > > > > > >
> > > > > > > > > > > > ________________________________________
> > > > > > > > > > > > From: Joel Koshy [jjkosh...@gmail.com]
> > > > > > > > > > > > Sent: Friday, April 03, 2015 12:48 PM
> > > > > > > > > > > > To: dev@kafka.apache.org
> > > > > > > > > > > > Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas
> > > > > > > > > > > >
> > > > > > > > > > > > > Aditya, thanks for the updated KIP and Jay/Jun thanks for the
> > > > > > > > > > > > > comments. Couple of comments in-line:
> > > > > > > > > > > >
> > > > > > > > > > > > > > 2. I would advocate for adding the return flag when we next
> > > > > > > > > > > > > > bump the request format version just to avoid proliferation. I
> > > > > > > > > > > > > > agree this is a good thing to know about, but at the moment I
> > > > > > > > > > > > > > don't think we have a very well fleshed out idea of how the
> > > > > > > > > > > > > > client would actually make use of this info. I
> > > > > > > > > > > >
> > > > > > > > > > > > > I'm somewhat inclined to having something appropriate off the bat
> > > > > > > > > > > > > - mainly because (i) clients really should know that they have
> > > > > > > > > > > > > been throttled (ii) a smart producer/consumer implementation
> > > > > > > > > > > > > would want to know how much to back off. So perhaps this and
> > > > > > > > > > > > > config-management should be moved to a separate discussion, but
> > > > > > > > > > > > > it would be good to have this discussion going and incorporated
> > > > > > > > > > > > > into the first quota implementation.
> > > > > > > > > > > >
> > > > > > > > > > > > > > 3. Config--I think we need to generalize the topic stuff so we
> > > > > > > > > > > > > > can override at multiple levels. We have topic and client, but
> > > > > > > > > > > > > > I suspect "user" and "broker" will also be important. I
> > > > > > > > > > > > > > recommend we take config stuff out of this KIP since we really
> > > > > > > > > > > > > > need to fully think through a proposal that will cover all
> > > > > > > > > > > > > > these types of overrides.
> > > > > > > > > > > >
> > > > > > > > > > > > > +1 - it is definitely orthogonal to the core quota implementation
> > > > > > > > > > > > > (although necessary for its operability). Having a config-related
> > > > > > > > > > > > > discussion in this KIP would only draw out the discussion and
> > > > > > > > > > > > > vote even if the core quota design looks good to everyone.
> > > > > > > > > > > > >
> > > > > > > > > > > > > So basically I think we can remove the portions on dynamic config
> > > > > > > > > > > > > as well as the response format but I really think we should close
> > > > > > > > > > > > > on those while the implementation is in progress and before
> > > > > > > > > > > > > quotas are officially released.
> > > > > > > > > > > >
> > > > > > > > > > > > > > 4. Instead of using purgatories to implement the delay would it
> > > > > > > > > > > > > > make more sense to just use a delay queue? I think all the
> > > > > > > > > > > > > > additional stuff in the purgatory other than the delay queue
> > > > > > > > > > > > > > doesn't make sense as the quota is a hard N ms penalty with no
> > > > > > > > > > > > > > chance of early eviction. If there is no perf penalty for the
> > > > > > > > > > > > > > full purgatory that may be fine (even good) to reuse, but I
> > > > > > > > > > > > > > haven't looked into that.
> > > > > > > > > > > >
> > > > > > > > > > > > > A simple delay queue sounds good - I think Aditya was also trying
> > > > > > > > > > > > > to avoid adding a new quota purgatory. i.e., it may be possible
> > > > > > > > > > > > > to use the existing purgatory instances to enforce quotas. That
> > > > > > > > > > > > > may be simpler, but would incur a slight perf penalty if too many
> > > > > > > > > > > > > clients are being throttled.
> > > > > > > > > > > >
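The delay-queue alternative discussed above can be sketched roughly as follows. This is a minimal illustration built on the JDK's java.util.concurrent.DelayQueue, not code from the Kafka broker; the ThrottledResponse wrapper, its fields, and the drain helper are hypothetical names introduced only for this example. Each throttled response carries a fixed penalty and is released only after that penalty has elapsed, with no early eviction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

final class ThrottleQueueSketch {
    /** Hypothetical wrapper: a response held back for a fixed quota penalty. */
    static final class ThrottledResponse implements Delayed {
        final String clientId;
        final long sendAtNanos;

        ThrottledResponse(String clientId, long delayMs) {
            this.clientId = clientId;
            this.sendAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMs);
        }

        @Override
        public long getDelay(TimeUnit unit) {
            // Remaining penalty; zero or negative means the response may be sent.
            return unit.convert(sendAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.NANOSECONDS),
                                other.getDelay(TimeUnit.NANOSECONDS));
        }
    }

    /** Takes `count` responses in the order their penalties expire. */
    static List<String> drain(DelayQueue<ThrottledResponse> queue, int count) {
        List<String> order = new ArrayList<>();
        try {
            for (int i = 0; i < count; i++) {
                order.add(queue.take().clientId); // blocks until the head has expired
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return order;
    }

    public static void main(String[] args) {
        // All throttled responses share one queue, whatever the request type.
        DelayQueue<ThrottledResponse> queue = new DelayQueue<>();
        queue.put(new ThrottledResponse("client-b", 200));
        queue.put(new ThrottledResponse("client-a", 50));
        System.out.println(drain(queue, 2)); // [client-a, client-b]
    }
}
```

A single sender thread looping on take() would be enough to flush responses in this sketch, which is part of why a plain delay queue looks simpler than a purgatory when no early completion is possible.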
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > Joel
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > -Jay
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Apr 3, 2015 at 10:45 AM, Aditya Auradkar <
> > > > > > > > > > > > > aaurad...@linkedin.com.invalid> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > >> Update, I added a proposal on doing dynamic client based configuration
> > > > > > > > > > > > > >> that can be used for quotas.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-13+-+Quotas
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> Please take a look and let me know if there are any concerns.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Thanks,
> > > > > > > > > > > > >> Aditya
> > > > > > > > > > > > >> ________________________________________
> > > > > > > > > > > > >> From: Aditya Auradkar
> > > > > > > > > > > > >> Sent: Friday, April 03, 2015 10:10 AM
> > > > > > > > > > > > >> To: dev@kafka.apache.org
> > > > > > > > > > > > >> Subject: RE: [KIP-DISCUSSION] KIP-13 Quotas
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Thanks Jun.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Some thoughts:
> > > > > > > > > > > > >>
> > > > > > > > > > > > > >> 10) I think it is better we throttle regardless of the produce/fetch
> > > > > > > > > > > > > >> version. This is a nice feature where clients can tell if they are
> > > > > > > > > > > > > >> being throttled or not. If we only throttle newer clients, then we
> > > > > > > > > > > > > >> have inconsistent behavior across clients in a multi-tenant cluster.
> > > > > > > > > > > > > >> Having quota metrics on the client side is also a nice incentive to
> > > > > > > > > > > > > >> upgrade client versions.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> 11) I think we can call metric.record(fetchSize) before adding the
> > > > > > > > > > > > > >> delayedFetch request into the purgatory. This will give us the
> > > > > > > > > > > > > >> estimated delay of the request up-front. The timeout on the
> > > > > > > > > > > > > >> DelayedFetch is the Max(maxWait, quotaDelay). The DelayedFetch
> > > > > > > > > > > > > >> completion criteria can change a little to accommodate quotas.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> - I agree the quota code should return the estimated delay time in
> > > > > > > > > > > > > >> QuotaViolationException.
> > > > > > > > > > > > >>
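One possible reading of point 11 above, as a sketch only: the timeout is the larger of maxWait and the quota delay, and the completion check gains one extra condition. The class, the method names, and in particular the exact completion rule (quota delay elapsed AND either minBytes satisfied or maxWait elapsed) are assumptions made for illustration; the thread does not spell out how the criteria would change.

```java
final class DelayedFetchTimeoutSketch {
    /** Timeout for the delayed fetch: Max(maxWait, quotaDelay), as stated in the thread. */
    static long timeoutMs(long maxWaitMs, long quotaDelayMs) {
        return Math.max(maxWaitMs, quotaDelayMs);
    }

    /**
     * Assumed completion rule: never complete before the quota delay has
     * elapsed; after that, the usual minBytes / maxWait conditions apply.
     */
    static boolean canComplete(long elapsedMs, long maxWaitMs,
                               long quotaDelayMs, boolean minBytesSatisfied) {
        boolean quotaServed = elapsedMs >= quotaDelayMs;
        boolean fetchReady = minBytesSatisfied || elapsedMs >= maxWaitMs;
        return quotaServed && fetchReady;
    }

    public static void main(String[] args) {
        // Throttled client: the quota delay (1200 ms) dominates maxWait (500 ms).
        System.out.println(timeoutMs(500, 1200));
        // Data is available after 100 ms, but the penalty has not been served yet.
        System.out.println(canComplete(100, 500, 1200, true));
        // Unthrottled client: behaves exactly as before.
        System.out.println(canComplete(100, 500, 0, true));
    }
}
```

With a quota delay of zero the rule reduces to the existing minBytes/maxWait behavior, so unthrottled clients would see no change under this assumption.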
> > > > > > > > > > > > >> Thanks,
> > > > > > > > > > > > >> Aditya
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> ________________________________________
> > > > > > > > > > > > >> From: Jun Rao [j...@confluent.io]
> > > > > > > > > > > > >> Sent: Friday, April 03, 2015 9:16 AM
> > > > > > > > > > > > >> To: dev@kafka.apache.org
> > > > > > > > > > > > >> Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Thanks for the update.
> > > > > > > > > > > > >>
> > > > > > > > > > > > > >> 10. About whether to return a new field in the response to indicate
> > > > > > > > > > > > > >> throttling. Earlier, the plan was to not change the response format
> > > > > > > > > > > > > >> and just have a metric on the broker to indicate whether a clientId is
> > > > > > > > > > > > > >> throttled or not. The issue is that we don't know whether a particular
> > > > > > > > > > > > > >> clientId instance is throttled or not (since there could be multiple
> > > > > > > > > > > > > >> clients with the same clientId). Your proposal of adding an
> > > > > > > > > > > > > >> isThrottled field in the response addresses that and seems better.
> > > > > > > > > > > > > >> Then, do we just throttle the new version of produce/fetch request or
> > > > > > > > > > > > > >> both the old and the new versions? Also, we probably still need a
> > > > > > > > > > > > > >> separate metric on the broker side to indicate whether a clientId is
> > > > > > > > > > > > > >> throttled or not.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> 11. Just to clarify. For fetch requests, when will
> > > > > > > > > > > > > >> metric.record(fetchSize) be called? Is it when we are ready to send
> > > > > > > > > > > > > >> the fetch response (after minBytes and maxWait are satisfied)?
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> As an implementation detail, it may be useful for the quota code to
> > > > > > > > > > > > > >> return an estimated delay time (to bring the measurement within the
> > > > > > > > > > > > > >> limit) in QuotaViolationException.
> > > > > > > > > > > > >>
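The "estimated delay time (to bring the measurement within the limit)" mentioned above can be illustrated with a small calculation. This sketch assumes a simple rate measured over a single window: if a client's observed rate O over a window of W ms exceeds the quota Q, then stretching the window by D ms brings the rate down to Q when O*W/(W+D) = Q, which gives D = W*(O-Q)/Q. The class, the exception type, and the formula itself are assumptions for illustration and are not taken from the Kafka metrics package.

```java
final class QuotaDelaySketch {
    /** Hypothetical stand-in for an exception that carries the estimated delay. */
    static final class QuotaViolationSketch extends RuntimeException {
        final long delayMs;

        QuotaViolationSketch(long delayMs) {
            super("quota violated; estimated delay " + delayMs + " ms");
            this.delayMs = delayMs;
        }
    }

    /** D = W * (O - Q) / Q, or 0 when the observed rate is within the quota. */
    static long estimateDelayMs(double observedRate, double quota, long windowMs) {
        if (observedRate <= quota) {
            return 0;
        }
        return Math.round(windowMs * (observedRate - quota) / quota);
    }

    /** Throws with the estimated delay attached when the quota is exceeded. */
    static void checkQuota(double observedRate, double quota, long windowMs) {
        long delayMs = estimateDelayMs(observedRate, quota, windowMs);
        if (delayMs > 0) {
            throw new QuotaViolationSketch(delayMs);
        }
    }

    public static void main(String[] args) {
        try {
            // 15 MB/s observed against a 10 MB/s quota over a 1000 ms window.
            checkQuota(15.0, 10.0, 1000);
        } catch (QuotaViolationSketch e) {
            System.out.println(e.delayMs); // 500
        }
    }
}
```

Carrying the delay in the exception lets the caller decide what to do with it (delay the response, report it to the client) without recomputing anything from the metric.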
> > > > > > > > > > > > >> Thanks,
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Jun
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> On Wed, Apr 1, 2015 at 3:27 PM, Aditya Auradkar <
> > > > > > > > > > > > >> aaurad...@linkedin.com.invalid> wrote:
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> > Hey everyone,
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > I've made changes to the KIP to capture our
> > > > discussions
> > > > > > > over
> > > > > > > > the
> > > > > > > > > > > last
> > > > > > > > > > > > >> > couple of weeks.
> > > > > > > > > > > > >> >
> > > > > > > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-13+-+Quotas
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > I'll start a voting thread after people have had
> a
> > > > > chance
> > > > > > > to
> > > > > > > > > > > > >> read/comment.
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > Thanks,
> > > > > > > > > > > > >> > Aditya
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > ________________________________________
> > > > > > > > > > > > >> > From: Steven Wu [stevenz...@gmail.com]
> > > > > > > > > > > > >> > Sent: Friday, March 20, 2015 9:14 AM
> > > > > > > > > > > > >> > To: dev@kafka.apache.org
> > > > > > > > > > > > >> > Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > +1 on Jun's suggestion of maintaining one
> set/style
> > > of
> > > > > > > metrics
> > > > > > > > > at
> > > > > > > > > > > > broker.
> > > > > > > > > > > > >> > In Netflix, we have to convert the yammer
> metrics to
> > > > > servo
> > > > > > > > > metrics
> > > > > > > > > > > at
> > > > > > > > > > > > >> > broker. it will be painful to know some metrics
> are
> > > > in a
> > > > > > > > > different
> > > > > > > > > > > > style
> > > > > > > > > > > > >> > and get to be handled differently.
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > On Fri, Mar 20, 2015 at 8:17 AM, Jun Rao <
> > > > > j...@confluent.io>
> > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > > Not so sure. People who use quota will
> definitely
> > > > > want to
> > > > > > > > > > monitor
> > > > > > > > > > > > the
> > > > > > > > > > > > >> new
> > > > > > > > > > > > >> > > metrics at the client id level. Then they will
> > > need
> > > > to
> > > > > > > deal
> > > > > > > > > with
> > > > > > > > > > > > those
> > > > > > > > > > > > >> > > metrics differently from the rest of the
> metrics.
> > > It
> > > > > > > would
> > > > > > > > be
> > > > > > > > > > > > better if
> > > > > > > > > > > > >> > we
> > > > > > > > > > > > >> > > can hide this complexity from the users.
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> > > Thanks,
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> > > Jun
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> > > On Thu, Mar 19, 2015 at 10:45 PM, Joel Koshy <
> > > > > > > > > > jjkosh...@gmail.com
> > > > > > > > > > > >
> > > > > > > > > > > > >> > wrote:
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> > > > Actually thinking again - since these will
> be a
> > > > few
> > > > > new
> > > > > > > > > > metrics
> > > > > > > > > > > at
> > > > > > > > > > > > >> the
> > > > > > > > > > > > >> > > > client id level (bytes in and bytes out to
> start
> > > > > with)
> > > > > > > > maybe
> > > > > > > > > > it
> > > > > > > > > > > is
> > > > > > > > > > > > >> fine
> > > > > > > > > > > > >> > > to
> > > > > > > > > > > > >> > > > have the two type of metrics coexist and we
> can
> > > > > migrate
> > > > > > > > the
> > > > > > > > > > > > existing
> > > > > > > > > > > > >> > > > metrics in parallel.
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > On Thursday, March 19, 2015, Joel Koshy <
> > > > > > > > > jjkosh...@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > > That is a valid concern but in that case I
> > > think
> > > > > it
> > > > > > > > would
> > > > > > > > > be
> > > > > > > > > > > > better
> > > > > > > > > > > > >> > to
> > > > > > > > > > > > >> > > > > just migrate completely to the new metrics
> > > > package
> > > > > > > > first.
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > >> > > > > On Thursday, March 19, 2015, Jun Rao <j...@confluent.io> wrote:
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > > >> Hmm, I was thinking a bit differently on
> the
> > > > > metrics
> > > > > > > > > > stuff. I
> > > > > > > > > > > > >> think
> > > > > > > > > > > > >> > it
> > > > > > > > > > > > >> > > > >> would be confusing to have some metrics
> > > defined
> > > > > in
> > > > > > > the
> > > > > > > > > new
> > > > > > > > > > > > metrics
> > > > > > > > > > > > >> > > > package
> > > > > > > > > > > > >> > > > >> while some others defined in Coda Hale.
> Those
> > > > > > > metrics
> > > > > > > > > will
> > > > > > > > > > > look
> > > > > > > > > > > > >> > > > different
> > > > > > > > > > > > >> > > > >> (e.g., rates in Coda Hale will have
> special
> > > > > > > attributes
> > > > > > > > > such
> > > > > > > > > > > as
> > > > > > > > > > > > >> > > > >> 1-min-average). People may need different
> > > ways
> > > > to
> > > > > > > > export
> > > > > > > > > > the
> > > > > > > > > > > > >> metrics
> > > > > > > > > > > > >> > > to
> > > > > > > > > > > > >> > > > >> external systems such as Graphite. So,
> > > instead
> > > > of
> > > > > > > using
> > > > > > > > > the
> > > > > > > > > > > new
> > > > > > > > > > > > >> > > metrics
> > > > > > > > > > > > >> > > > >> package on the broker, I was thinking
> that we
> > > > can
> > > > > > > just
> > > > > > > > > > > > implement a
> > > > > > > > > > > > >> > > > >> QuotaMetrics that wraps the Coda Hale
> > > metrics.
> > > > > The
> > > > > > > > > > > > implementation
> > > > > > > > > > > > >> > can
> > > > > > > > > > > > >> > > be
> > > > > > > > > > > > >> > > > >> the same as what's in the new metrics
> > > package.
> > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > >> > > > >> Thanks,
> > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > >> > > > >> Jun
> > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > >> > > > >> On Thu, Mar 19, 2015 at 8:09 PM, Jay
> Kreps <
> > > > > > > > > > > > jay.kr...@gmail.com>
> > > > > > > > > > > > >> > > wrote:
> > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > > >> > > > >> > Yeah, what I was saying was that we are blocked on picking an
> > > > > > > > > > > > > >> > > > >> > approach for metrics but not necessarily the full conversion.
> > > > > > > > > > > > > >> > > > >> > Clearly if we pick the new metrics package we would need to
> > > > > > > > > > > > > >> > > > >> > implement the two metrics we want to quota on. But the
> > > > > > > > > > > > > >> > > > >> > conversion of the remaining metrics can be done asynchronously.
> > > > > > > > > > > > >> > > > >> >
> > > > > > > > > > > > >> > > > >> > -Jay
> > > > > > > > > > > > >> > > > >> >
> > > > > > > > > > > > >> > > > >> > On Thu, Mar 19, 2015 at 5:56 PM, Joel
> > > Koshy <
> > > > > > > > > > > > >> jjkosh...@gmail.com>
> > > > > > > > > > > > >> > > > >> wrote:
> > > > > > > > > > > > >> > > > >> >
> > > > > > > > > > > > >> > > > >> > > > in KAFKA-1930). I agree that this
> KIP
> > > > > doesn't
> > > > > > > > need
> > > > > > > > > to
> > > > > > > > > > > > block
> > > > > > > > > > > > >> on
> > > > > > > > > > > > >> > > the
> > > > > > > > > > > > >> > > > >> > > > migration of the metrics package.
> > > > > > > > > > > > >> > > > >> > >
> > > > > > > > > > > > >> > > > >> > > Can you clarify the above? i.e., if
> we
> > > are
> > > > > going
> > > > > > > to
> > > > > > > > > > quota
> > > > > > > > > > > > on
> > > > > > > > > > > > >> > > > something
> > > > > > > > > > > > >> > > > >> > > then we would want to have migrated
> that
> > > > > metric
> > > > > > > > over
> > > > > > > > > > > > right? Or
> > > > > > > > > > > > >> > do
> > > > > > > > > > > > >> > > > you
> > > > > > > > > > > > >> > > > >> > > mean we don't need to complete the
> > > > migration
> > > > > of
> > > > > > > all
> > > > > > > > > > > > metrics to
> > > > > > > > > > > > >> > the
> > > > > > > > > > > > >> > > > >> > > metrics package right?
> > > > > > > > > > > > >> > > > >> > >
> > > > > > > > > > > > >> > > > >> > > I think most of us now feel that the
> > > delay
> > > > +
> > > > > no
> > > > > > > > error
> > > > > > > > > > is
> > > > > > > > > > > a
> > > > > > > > > > > > >> good
> > > > > > > > > > > > >> > > > >> > > approach, but it would be good to
> make
> > > sure
> > > > > > > > everyone
> > > > > > > > > is
> > > > > > > > > > > on
> > > > > > > > > > > > the
> > > > > > > > > > > > >> > > same
> > > > > > > > > > > > >> > > > >> > > page.
> > > > > > > > > > > > >> > > > >> > >
> > > > > > > > > > > > >> > > > >> > > As Aditya requested a couple of days
> ago
> > > I
> > > > > think
> > > > > > > we
> > > > > > > > > > > should
> > > > > > > > > > > > go
> > > > > > > > > > > > >> > over
> > > > > > > > > > > > >> > > > >> > > this at the next KIP hangout.
> > > > > > > > > > > > >> > > > >> > >
> > > > > > > > > > > > >> > > > >> > > Joel
> > > > > > > > > > > > >> > > > >> > >
> > > > > > > > > > > > >> > > > >> > > On Thu, Mar 19, 2015 at 09:24:09AM
> -0700,
> > > > Jun
> > > > > > > Rao
> > > > > > > > > > wrote:
> > > > > > > > > > > > >> > > > >> > > > 1. Delay + no error seems
> reasonable to
> > > > me.
> > > > > > > > > However,
> > > > > > > > > > I
> > > > > > > > > > > do
> > > > > > > > > > > > >> feel
> > > > > > > > > > > > >> > > > that
> > > > > > > > > > > > >> > > > >> we
> > > > > > > > > > > > >> > > > >> > > need
> > > > > > > > > > > > >> > > > >> > > > to give the client an indicator
> that
> > > it's
> > > > > > > being
> > > > > > > > > > > > throttled,
> > > > > > > > > > > > >> > > instead
> > > > > > > > > > > > >> > > > >> of
> > > > > > > > > > > > >> > > > >> > > doing
> > > > > > > > > > > > >> > > > >> > > > this silently. For that, we
> probably
> > > need
> > > > > to
> > > > > > > > evolve
> > > > > > > > > > the
> > > > > > > > > > > > >> > > > >> produce/fetch
> > > > > > > > > > > > >> > > > >> > > > protocol to include an extra status
> > > field
> > > > > in
> > > > > > > the
> > > > > > > > > > > > response.
> > > > > > > > > > > > >> We
> > > > > > > > > > > > >> > > > >> probably
> > > > > > > > > > > > >> > > > >> > > need
> > > > > > > > > > > > >> > > > >> > > > to think more about whether we just
> > > want
> > > > to
> > > > > > > > return
> > > > > > > > > a
> > > > > > > > > > > > simple
> > > > > > > > > > > > >> > > status
> > > > > > > > > > > > >> > > > >> code
> > > > > > > > > > > > >> > > > >> > > > (e.g., 1 = throttled) or a value
> that
> > > > > > > indicates
> > > > > > > > how
> > > > > > > > > > > much
> > > > > > > > > > > > is
> > > > > > > > > > > > >> > > being
> > > > > > > > > > > > >> > > > >> > > throttled.
> > > > > > > > > > > > >> > > > >> > > >
> > > > > > > > > > > > >> > > > >> > > > 2. We probably need to improve the
> > > > > histogram
> > > > > > > > > support
> > > > > > > > > > in
> > > > > > > > > > > > the
> > > > > > > > > > > > >> > new
> > > > > > > > > > > > >> > > > >> metrics
> > > > > > > > > > > > >> > > > >> > > > package before we can use it more
> > > widely
> > > > on
> > > > > > > the
> > > > > > > > > > server
> > > > > > > > > > > > side
> > > > > > > > > > > > >> > > (left
> > > > > > > > > > > > >> > > > a
> > > > > > > > > > > > >> > > > >> > > comment
> > > > > > > > > > > > >> > > > >> > > > in KAFKA-1930). I agree that this
> KIP
> > > > > doesn't
> > > > > > > > need
> > > > > > > > > to
> > > > > > > > > > > > block
> > > > > > > > > > > > >> on
> > > > > > > > > > > > >> > > the
> > > > > > > > > > > > >> > > > >> > > > migration of the metrics package.
> > > > > > > > > > > > >> > > > >> > > >
> > > > > > > > > > > > >> > > > >> > > > Thanks,
> > > > > > > > > > > > >> > > > >> > > >
> > > > > > > > > > > > >> > > > >> > > > Jun
> > > > > > > > > > > > >> > > > >> > > >
> > > > > > > > > > > > >> > > > >> > > > On Wed, Mar 18, 2015 at 4:02 PM,
> Aditya
> > > > > > > Auradkar
> > > > > > > > <
> > > > > > > > > > > > >> > > > >> > > > aaurad...@linkedin.com.invalid>
> wrote:
> > > > > > > > > > > > >> > > > >> > > >
> > > > > > > > > > > > >> > > > >> > > > > Hey everyone,
> > > > > > > > > > > > >> > > > >> > > > >
> > > > > > > > > > > > >> > > > >> > > > > Thanks for the great discussion.
> > > There
> > > > > are
> > > > > > > > > > currently
> > > > > > > > > > > a
> > > > > > > > > > > > few
> > > > > > > > > > > > >> > > > points
> > > > > > > > > > > > >> > > > >> on
> > > > > > > > > > > > >> > > > >> > > this
> > > > > > > > > > > > >> > > > >> > > > > KIP that need addressing and I
> want
> > > to
> > > > > make
> > > > > > > > sure
> > > > > > > > > we
> > > > > > > > > > > > are on
> > > > > > > > > > > > >> > the
> > > > > > > > > > > > >> > > > >> same
> > > > > > > > > > > > >> > > > >> > > page
> > > > > > > > > > > > >> > > > >> > > > > about those.
> > > > > > > > > > > > >> > > > >> > > > >
> > > > > > > > > > > > >> > > > >> > > > > 1. Append and delay response vs
> delay
> > > > and
> > > > > > > > return
> > > > > > > > > > > error
> > > > > > > > > > > > >> > > > >> > > > > - I think we've discussed the
> pros
> > > and
> > > > > cons
> > > > > > > of
> > > > > > > > > each
> > > > > > > > > > > > >> approach
> > > > > > > > > > > > >> > > but
> > > > > > > > > > > > >> > > > >> > > haven't
> > > > > > > > > > > > >> > > > >> > > > > chosen an approach yet. Where
> does
> > > > > everyone
> > > > > > > > stand
> > > > > > > > > > on
> > > > > > > > > > > > this
> > > > > > > > > > > > >> > > issue?
> > > > > > > > > > > > >> > > > >> > > > >
> > > > > > > > > > > > >> > > > >> > > > > 2. Metrics Migration and usage in
> > > > quotas
> > > > > > > > > > > > >> > > > >> > > > > - The metrics library in clients
> has
> > > a
> > > > > > > notion
> > > > > > > > of
> > > > > > > > > > > quotas
> > > > > > > > > > > > >> that
> > > > > > > > > > > > >> > > we
> > > > > > > > > > > > >> > > > >> > should
> > > > > > > > > > > > >> > > > >> > > > > reuse. For that to happen, we
> need to
> > > > > > > migrate
> > > > > > > > the
> > > > > > > > > > > > server
> > > > > > > > > > > > >> to
> > > > > > > > > > > > >> > > the
> > > > > > > > > > > > >> > > > >> new
> > > > > > > > > > > > >> > > > >> > > metrics
> > > > > > > > > > > > >> > > > >> > > > > package.
> > > > > > > > > > > > >> > > > >> > > > > - Need more clarification on how
> to
> > > > > compute
> > > > > > > > > > > throttling
> > > > > > > > > > > > >> time
> > > > > > > > > > > > >> > > and
> > > > > > > > > > > > >> > > > >> > > windowing
> > > > > > > > > > > > >> > > > >> > > > > for quotas.
> > > > > > > > > > > > >> > > > >> > > > >
> > > > > > > > > > > > >> > > > >> > > > > I'm going to start a new KIP to
> > > discuss
> > > > > > > metrics
> > > > > > > > > > > > migration
> > > > > > > > > > > > >> > > > >> separately.
> > > > > > > > > > > > >> > > > >> > > That
> > > > > > > > > > > > >> > > > >> > > > > will also contain a section on
> > > quotas.
> > > > > > > > > > > > >> > > > >> > > > >
> > > > > > > > > > > > >> > > > >> > > > > 3. Dynamic Configuration
> management -
> > > > > Being
> > > > > > > > > > discussed
> > > > > > > > > > > > in
> > > > > > > > > > > > >> > > KIP-5.
> > > > > > > > > > > > >> > > > >> > > Basically
> > > > > > > > > > > > >> > > > >> > > > > we need something that will model
> > > > default
> > > > > > > > quotas
> > > > > > > > > > and
> > > > > > > > > > > > allow
> > > > > > > > > > > > >> > > > >> per-client
> > > > > > > > > > > > >> > > > >> > > > > overrides.
> > > > > > > > > > > > >> > > > >> > > > >
> > > > > > > > > > > > >> > > > >> > > > > Is there something else that I'm
> > > > missing?
> > > > > > > > > > > > >> > > > >> > > > >
> > > > > > > > > > > > >> > > > >> > > > > Thanks,
> > > > > > > > > > > > >> > > > >> > > > > Aditya
> > > > > > > > > > > > >> > > > >> > > > >
> > > > > > > > > > > > > >> > > > >> > > > > ________________________________________
> > > > > > > > > > > > > >> > > > >> > > > > From: Jay Kreps [jay.kr...@gmail.com]
> > > > > > > > > > > > > >> > > > >> > > > > Sent: Wednesday, March 18, 2015 2:10 PM
> > > > > > > > > > > > > >> > > > >> > > > > To: dev@kafka.apache.org
> > > > > > > > > > > > > >> > > > >> > > > > Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas
> > > > > > > > > > > > >> > > > >> > > > >
> > > > > > > > > > > > >> > > > >> > > > > Hey Steven,
> > > > > > > > > > > > >> > > > >> > > > >
> > > > > > > > > > > > >> > > > >> > > > > The current proposal is actually
> to
> > > > > enforce
> > > > > > > > > quotas
> > > > > > > > > > at
> > > > > > > > > > > > the
> > > > > > > > > > > > >> > > > >> > > > > client/application level, NOT the
> > > topic
> > > > > > > level.
> > > > > > > > So
> > > > > > > > > > if
> > > > > > > > > > > > you
> > > > > > > > > > > > >> > have
> > > > > > > > > > > > >> > > a
> > > > > > > > > > > > >> > > > >> > service
> > > > > > > > > > > > >> > > > >> > > > > with a few dozen instances the
> quota
> > > is
> > > > > > > against
> > > > > > > > > all
> > > > > > > > > > > of
> > > > > > > > > > > > >> those
> > > > > > > > > > > > >> > > > >> > instances
> > > > > > > > > > > > >> > > > >> > > > > added up across all their topics.
> So
> > > > > > > actually
> > > > > > > > the
> > > > > > > > > > > > effect
> > > > > > > > > > > > >> > would
> > > > > > > > > > > > >> > > > be
> > > > > > > > > > > > >> > > > >> the
> > > > > > > > > > > > >> > > > >> > > same
> > > > > > > > > > > > >> > > > >> > > > > either way but throttling gives
> the
> > > > > producer
> > > > > > > > the
> > > > > > > > > > > > choice of
> > > > > > > > > > > > >> > > > either
> > > > > > > > > > > > >> > > > >> > > blocking
> > > > > > > > > > > > >> > > > >> > > > > or dropping.
> > > > > > > > > > > > >> > > > >> > > > >
> > > > > > > > > > > > >> > > > >> > > > > -Jay
> > > > > > > > > > > > >> > > > >> > > > >
> > > > > > > > > > > > >> > > > >> > > > > On Tue, Mar 17, 2015 at 10:08 AM,
> > > > Steven
> > > > > Wu
> > > > > > > <
> > > > > > > > > > > > >> > > > stevenz...@gmail.com
> > > > > > > > > > > > >> > > > >> >
> > > > > > > > > > > > >> > > > >> > > wrote:
> > > > > > > > > > > > >> > > > >> > > > >
> > > > > > > > > > > > >> > > > >> > > > > > Jay,
> > > > > > > > > > > > >> > > > >> > > > > >
> > > > > > > > > > > > >> > > > >> > > > > > let's say an app produces to 10
> > > > > different
> > > > > > > > > topics.
> > > > > > > > > > > > one of
> > > > > > > > > > > > >> > the
> > > > > > > > > > > > >> > > > >> topic
> > > > > > > > > > > > >> > > > >> > is
> > > > > > > > > > > > >> > > > >> > > > > sent
> > > > > > > > > > > > >> > > > >> > > > > > from a library. due to whatever
> > > > > > > > condition/bug,
> > > > > > > > > > this
> > > > > > > > > > > > lib
> > > > > > > > > > > > >> > > starts
> > > > > > > > > > > > >> > > > >> to
> > > > > > > > > > > > >> > > > >> > > send
> > > > > > > > > > > > >> > > > >> > > > > > messages over the quota. if we
> go
> > > > with
> > > > > the
> > > > > > > > > > delayed
> > > > > > > > > > > > >> > response
> > > > > > > > > > > > >> > > > >> > > approach, it
> > > > > > > > > > > > >> > > > >> > > > > > will cause the whole shared
> > > > > > > RecordAccumulator
> > > > > > > > > > > buffer
> > > > > > > > > > > > to
> > > > > > > > > > > > >> be
> > > > > > > > > > > > >> > > > >> filled
> > > > > > > > > > > > >> > > > >> > up.
> > > > > > > > > > > > >> > > > >> > > > > that
> > > > > > > > > > > > >> > > > >> > > > > > will penalize other 9 topics
> who
> > > are
> > > > > > > within
> > > > > > > > the
> > > > > > > > > > > > quota.
> > > > > > > > > > > > >> > that
> > > > > > > > > > > > >> > > is
> > > > > > > > > > > > >> > > > >> the
> > > > > > > > > > > > >> > > > >> > > > > > unfairness point that Ewen and
> I
> > > were
> > > > > > > trying
> > > > > > > > to
> > > > > > > > > > > make.
> > > > > > > > > > > > >> > > > >> > > > > >
> > > > > > > > > > > > > >> > > > >> > > > > > if the broker just drops the msg and returns an error/status
> > > > > > > > > > > > > >> > > > >> > > > > > code that indicates the drop and why, then the producer can
> > > > > > > > > > > > > >> > > > >> > > > > > just move on and accept the drop. the shared buffer won't be
> > > > > > > > > > > > > >> > > > >> > > > > > saturated and the other 9 topics won't be penalized.
> > > > > > > > > > > > >> > > > >> > > > > >
> > > > > > > > > > > > >> > > > >> > > > > > Thanks,
> > > > > > > > > > > > >> > > > >> > > > > > Steven
> > > > > > > > > > > > >> > > > >> > > > > >
> > > > > > > > > > > > >> > > > >> > > > > >
> > > > > > > > > > > > >> > > > >> > > > > >
> > > > > > > > > > > > >> > > > >> > > > > > On Tue, Mar 17, 2015 at 9:44
> AM,
> > > Jay
> > > > > Kreps
> > > > > > > <
> > > > > > > > > > > > >> > > > jay.kr...@gmail.com
> > > > > > > > > > > > >> > > > >> >
> > > > > > > > > > > > >> > > > >> > > wrote:
> > > > > > > > > > > > >> > > > >> > > > > >
> > > > > > > > > > > > >> > > > >> > > > > > > Hey Steven,
> > > > > > > > > > > > >> > > > >> > > > > > >
> > > > > > > > > > > > >> > > > >> > > > > > > It is true that hitting the
> quota
> > > > > will
> > > > > > > > cause
> > > > > > > > > > > > >> > back-pressure
> > > > > > > > > > > > >> > > > on
> > > > > > > > > > > > >> > > > >> the
> > > > > > > > > > > > >> > > > >> > > > > > producer.
> > > > > > > > > > > > >> > > > >> > > > > > > But the solution is simple, a
> > > > > producer
> > > > > > > that
> > > > > > > > > > wants
> > > > > > > > > > > > to
> > > > > > > > > > > > >> > avoid
> > > > > > > > > > > > >> > > > >> this
> > > > > > > > > > > > >> > > > >> > > should
> > > > > > > > > > > > >> > > > >> > > > > > stay
> > > > > > > > > > > > >> > > > >> > > > > > > under its quota. In other
> words
> > > > this
> > > > > is
> > > > > > > a
> > > > > > > > > > > contract
> > > > > > > > > > > > >> > between
> > > > > > > > > > > > >> > > > the
> > > > > > > > > > > > >> > > > >> > > cluster
> > > > > > > > > > > > >> > > > >> > > > > > and
> > > > > > > > > > > > >> > > > >> > > > > > > the client, with each side
> having
> > > > > > > something
> > > > > > > > > to
> > > > > > > > > > > > uphold.
> > > > > > > > > > > > >> > > Quite
> > > > > > > > > > > > >> > > > >> > > possibly
> > > > > > > > > > > > >> > > > >> > > > > the
> > > > > > > > > > > > >> > > > >> > > > > > > same thing will happen in the
> > > > > absence of
> > > > > > > a
> > > > > > > > > > > quota, a
> > > > > > > > > > > > >> > client
> > > > > > > > > > > > >> > > > >> that
> > > > > > > > > > > > >> > > > >> > > > > produces
> > > > > > > > > > > > >> > > > >> > > > > > an
> > > > > > > > > > > > >> > > > >> > > > > > > unexpected amount of load
> will
> > > hit
> > > > > the
> > > > > > > > limits
> > > > > > > > > > of
> > > > > > > > > > > > the
> > > > > > > > > > > > >> > > server
> > > > > > > > > > > > >> > > > >> and
> > > > > > > > > > > > >> > > > >> > > > > > experience
> > > > > > > > > > > > >> > > > >> > > > > > > backpressure. Quotas just
> allow
> > > you
> > > > > to
> > > > > > > set
> > > > > > > > > that
> > > > > > > > > > > > same
> > > > > > > > > > > > >> > limit
> > > > > > > > > > > > >> > > > at
> > > > > > > > > > > > >> > > > >> > > something
> > > > > > > > > > > > >> > > > >> > > > > > > lower than 100% of all
> resources
> > > on
> > > > > the
> > > > > > > > > server,
> > > > > > > > > > > > which
> > > > > > > > > > > > >> is
> > > > > > > > > > > > >> > > > >> useful
> > > > > > > > > > > > >> > > > >> > > for a
> > > > > > > > > > > > >> > > > >> > > > > > > shared cluster.
> > > > > > > > > > > > >> > > > >> > > > > > >
> > > > > > > > > > > > >> > > > >> > > > > > > -Jay
> > > > > > > > > > > > >> > > > >> > > > > > >
> > > > > > > > > > > > >> > > > >> > > > > > > On Mon, Mar 16, 2015 at 11:34
> PM,
> > > > > Steven
> > > > > > > > Wu <
> > > > > > > > > > > > >> > > > >> > stevenz...@gmail.com>
> > > > > > > > > > > > >> > > > >> > > > > > wrote:
> > > > > > > > > > > > >> > > > >> > > > > > >
> > > > > > > > > > > > >> > > > >> > > > > > > > wait. we create one kafka
> > > > producer
> > > > > for
> > > > > > > > each
> > > > > > > > > > > > cluster.
> > > > > > > > > > > > >> > > each
> > > > > > > > > > > > >> > > > >> > > cluster can
> > > > > > > > > > > > >> > > > >> > > > > > > have
> > > > > > > > > > > > >> > > > >> > > > > > > > many topics. if producer
> buffer
> > > > got
> > > > > > > > filled
> > > > > > > > > up
> > > > > > > > > > > > due to
> > > > > > > > > > > > >> > > > delayed
> > > > > > > > > > > > >> > > > >> > > response
> > > > > > > > > > > > >> > > > >> > > > > > for
> > > > > > > > > > > > >> > > > >> > > > > > > > one throttled topic, won't
> that
> > > > > > > penalize
> > > > > > > > > > other
> > > > > > > > > > > > >> topics
> > > > > > > > > > > > >> > > > >> unfairly?
> > > > > > > > > > > > >> > > > >> > > it
> > > > > > > > > > > > >> > > > >> > > > > > seems
> > > > > > > > > > > > >> > > > >> > > > > > > to
> > > > > > > > > > > > >> > > > >> > > > > > > > me that broker should just
> > > return
> > > > > > > error
> > > > > > > > > > without
> > > > > > > > > > > > >> delay.
> > > > > > > > > > > > >> > > > >> > > > > > > >
> > > > > > > > > > > > >> > > > >> > > > > > > > sorry that I am chatting to
> > > > myself
> > > > > :)
> > > > > > > > > > > > >> > > > >> > > > > > > >
> > > > > > > > > > > > >> > > > >> > > > > > > > On Mon, Mar 16, 2015 at
> 11:29
> > > PM,
> > > > > > > Steven
> > > > > > > > > Wu <
> > > > > > > > > > > > >> > > > >> > > stevenz...@gmail.com>
> > > > > > > > > > > > >> > > > >> > > > > > > wrote:
> > > > > > > > > > > > >> > > > >> > > > > > > >
> > > > > > > > > > > > >> > > > >> > > > > > > > > I think I can answer my
> own
> > > > > > > question.
> > > > > > > > > > delayed
> > > > > > > > > > > > >> > response
> > > > > > > > > > > > >> > > > >> will
> > > > > > > > > > > > >> > > > >> > > cause
> > > > > > > > > > > > >> > > > >> > > > > the
> > > > > > > > > > > > >> > > > >> > > > > > > > > producer buffer to be
> full,
> > > > which
> > > > > > > then
> > > > > > > > > > result
> > > > > > > > > > > > in
> > > > > > > > > > > > >> > > either
> > > > > > > > > > > > >> > > > >> > thread
> > > > > > > > > > > > >> > > > >> > > > > > blocking
> > > > > > > > > > > > >> > > > >> > > > > > > > or
> > > > > > > > > > > > >> > > > >> > > > > > > > > message drop.
> > > > > > > > > > > > >> > > > >> > > > > > > > >
> > > > > > > > > > > > >> > > > >> > > > > > > > > On Mon, Mar 16, 2015 at
> 11:24
> > > > PM,
> > > > > > > > Steven
> > > > > > > > > > Wu <
> > > > > > > > > > > > >> > > > >> > > stevenz...@gmail.com>
> > > > > > > > > > > > >> > > > >> > > > > > > > wrote:
> > > > > > > > > > > > >> > > > >> > > > > > > > >
> > > > > > > > > > > > >> > > > >> > > > > > > > >> please correct me if I
> am
> > > > > missing
> > > > > > > sth
> > > > > > > > > > here.
> > > > > > > > > > > I
> > > > > > > > > > > > am
> > > > > > > > > > > > >> > not
> > > > > > > > > > > > >> > > > >> > > understanding
> > > > > > > > > > > > >> > > > >> > > > > > how
> > > > > > > > > > > > >> > > > >> > > > > > > > >> would throttle work
> without
> > > > > > > > > > > > cooperation/back-off
> > > > > > > > > > > > >> > from
> > > > > > > > > > > > >> > > > >> > > producer.
> > > > > > > > > > > > >> > > > >> > > > > new
> > > > > > > > > > > > >> > > > >> > > > > > > Java
> > > > > > > > > > > > >> > > > >> > > > > > > > >> producer supports
> > > non-blocking
> > > > > API.
> > > > > > > > why
> > > > > > > > > > > would
> > > > > > > > > > > > >> > delayed
> > > > > > > > > > > > >> > > > >> > > response be
> > > > > > > > > > > > >> > > > >> > > > > > able
> > > > > > > > > > > > >> > > > >> > > > > > > > to
> > > > > > > > > > > > >> > > > >> > > > > > > > >> slow down producer?
> producer
> > > > > will
> > > > > > > > > continue
> > > > > > > > > > > to
> > > > > > > > > > > > >> fire
> > > > > > > > > > > > >> > > > async
> > > > > > > > > > > > >> > > > >> > > sends.
> > > > > > > > > > > > >> > > > >> > > > > > > > >>
> > > > > > > > > > > > >> > > > >> > > > > > > > >> On Mon, Mar 16, 2015 at
> > > 10:58
> > > > > PM,
> > > > > > > > > Guozhang
> > > > > > > > > > > > Wang <
> > > > > > > > > > > > >> > > > >> > > > > wangg...@gmail.com
> > > > > > > > > > > > >> > > > >> > > > > > >
> > > > > > > > > > > > >> > > > >> > > > > > > > >> wrote:
> > > > > > > > > > > > >> > > > >> > > > > > > > >>
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> I think we are really
> > > > > discussing
> > > > > > > two
> > > > > > > > > > > separate
> > > > > > > > > > > > >> > issues
> > > > > > > > > > > > >> > > > >> here:
> > > > > > > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>> 1. Whether we should a) append-then-block-then-returnOKButThrottled or b)
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>> block-then-returnFailDuetoThrottled for quota actions on produce requests.
> > > > > > > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> Both these approaches
> > > assume
> > > > > some
> > > > > > > > kind
> > > > > > > > > of
> > > > > > > > > > > > >> > > > >> well-behaveness
> > > > > > > > > > > > >> > > > >> > of
> > > > > > > > > > > > >> > > > >> > > the
> > > > > > > > > > > > >> > > > >> > > > > > > > clients:
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> option a) assumes the
> > > client
> > > > > sets
> > > > > > > a
> > > > > > > > > > proper
> > > > > > > > > > > > >> > timeout
> > > > > > > > > > > > >> > > > >> value
> > > > > > > > > > > > >> > > > >> > > while
> > > > > > > > > > > > >> > > > >> > > > > can
> > > > > > > > > > > > >> > > > >> > > > > > > > just
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> ignore "OKButThrottled"
> > > > > response,
> > > > > > > > while
> > > > > > > > > > > > option
> > > > > > > > > > > > >> b)
> > > > > > > > > > > > >> > > > >> assumes
> > > > > > > > > > > > >> > > > >> > the
> > > > > > > > > > > > >> > > > >> > > > > > client
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> handles the
> > > > > "FailDuetoThrottled"
> > > > > > > > > > > > appropriately.
> > > > > > > > > > > > >> > For
> > > > > > > > > > > > >> > > > any
> > > > > > > > > > > > >> > > > >> > > malicious
> > > > > > > > > > > > >> > > > >> > > > > > > > clients
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> that, for example, just
> > > keep
> > > > > > > retrying
> > > > > > > > > > > either
> > > > > > > > > > > > >> > > > >> intentionally
> > > > > > > > > > > > >> > > > >> > or
> > > > > > > > > > > > >> > > > >> > > > > not,
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> neither
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> of these approaches are
> > > > > actually
> > > > > > > > > > effective.
> > > > > > > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> 2. For "OKButThrottled"
> and
> > > > > > > > > > > > "FailDuetoThrottled"
> > > > > > > > > > > > >> > > > >> responses,
> > > > > > > > > > > > >> > > > >> > > shall
> > > > > > > > > > > > >> > > > >> > > > > > we
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> encode
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> them as error codes or
> > > > augment
> > > > > the
> > > > > > > > > > protocol
> > > > > > > > > > > > to
> > > > > > > > > > > > >> > use a
> > > > > > > > > > > > >> > > > >> > separate
> > > > > > > > > > > > >> > > > >> > > > > field
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> indicating "status
> codes".
> > > > > > > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> Today we have already
> > > > > incorporated
> > > > > > > > some
> > > > > > > > > > > > status
> > > > > > > > > > > > >> > code
> > > > > > > > > > > > >> > > as
> > > > > > > > > > > > >> > > > >> > error
> > > > > > > > > > > > >> > > > >> > > > > codes
> > > > > > > > > > > > >> > > > >> > > > > > in
> > > > > > > > > > > > >> > > > >> > > > > > > > the
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> responses, e.g.
> > > > > > > ReplicaNotAvailable
> > > > > > > > in
> > > > > > > > > > > > >> > > > MetadataResponse,
> > > > > > > > > > > > >> > > > >> > the
> > > > > > > > > > > > >> > > > >> > > pros
> > > > > > > > > > > > >> > > > >> > > > > > of
> > > > > > > > > > > > >> > > > >> > > > > > > > this
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> is of course using a
> single
> > > > > field
> > > > > > > for
> > > > > > > > > > > > response
> > > > > > > > > > > > >> > > status
> > > > > > > > > > > > >> > > > >> like
> > > > > > > > > > > > >> > > > >> > > the
> > > > > > > > > > > > >> > > > >> > > > > HTTP
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> status
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> codes, while the cons
> is
> > > that
> > > > > it
> > > > > > > > > requires
> > > > > > > > > > > > >> clients
> > > > > > > > > > > > >> > to
> > > > > > > > > > > > >> > > > >> handle
> > > > > > > > > > > > >> > > > >> > > the
> > > > > > > > > > > > >> > > > >> > > > > > error
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> codes
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> carefully.
> > > > > > > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> I think maybe we can
> > > actually
> > > > > > > extend
> > > > > > > > > the
> > > > > > > > > > > > >> > single-code
> > > > > > > > > > > > >> > > > >> > > approach to
> > > > > > > > > > > > >> > > > >> > > > > > > > overcome
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> its drawbacks, that is,
> > > wrap
> > > > > the
> > > > > > > > error
> > > > > > > > > > > codes
> > > > > > > > > > > > >> > > semantics
> > > > > > > > > > > > >> > > > >> to
> > > > > > > > > > > > >> > > > >> > the
> > > > > > > > > > > > >> > > > >> > > > > users
> > > > > > > > > > > > >> > > > >> > > > > > > so
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> that
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> users do not need to
> handle
> > > > the
> > > > > > > codes
> > > > > > > > > > > > >> one-by-one.
> > > > > > > > > > > > >> > > More
> > > > > > > > > > > > >> > > > >> > > > > concretely,
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> following Jay's example
> the
> > > > > client
> > > > > > > > > could
> > > > > > > > > > > > write
> > > > > > > > > > > > >> > sth.
> > > > > > > > > > > > >> > > > like
> > > > > > > > > > > > >> > > > >> > > this:
> > > > > > > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> -----------------
> > > > > > > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>> if(error.isOK())
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>>    // status code is good or the code can be simply ignored for this request type, process the request
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>> else if(error.needsRetry())
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>>    // throttled, transient error, etc: retry
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>> else if(error.isFatal())
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>>    // non-retriable errors, etc: notify / terminate / other handling
> > > > > > > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> -----------------
> > > > > > > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> Only when the clients
> > > really
> > > > > want
> > > > > > > to
> > > > > > > > > > > handle,
> > > > > > > > > > > > for
> > > > > > > > > > > > >> > > > example
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> FailDuetoThrottled
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> status code
> specifically,
> > > it
> > > > > needs
> > > > > > > > to:
> > > > > > > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>> if(error.isOK())
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>>    // status code is good or the code can be simply ignored for this request type, process the request
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>> else if(error == FailDuetoThrottled)
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>>    // throttled: log it
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>> else if(error.needsRetry())
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>>    // transient error, etc: retry
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>> else if(error.isFatal())
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>>    // non-retriable errors, etc: notify / terminate / other handling
> > > > > > > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> -----------------
> > > > > > > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> And for implementation
> we
> > > can
> > > > > > > > probably
> > > > > > > > > > > group
> > > > > > > > > > > > the
> > > > > > > > > > > > >> > > codes
> > > > > > > > > > > > >> > > > >> > > > > accordingly
> > > > > > > > > > > > >> > > > >> > > > > > > like
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> HTTP status code such
> that
> > > we
> > > > > can
> > > > > > > do:
> > > > > > > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>> boolean Error.isOK() {
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>>   return code < 300 && code >= 200;
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>> }
> > > > > > > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> Guozhang
> > > > > > > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> On Mon, Mar 16, 2015 at
> > > 10:24
> > > > > PM,
> > > > > > > > Ewen
> > > > > > > > > > > > >> > > > Cheslack-Postava
> > > > > > > > > > > > >> > > > >> <
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> e...@confluent.io>
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> wrote:
> > > > > > > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > Agreed that trying to
> > > > > shoehorn
> > > > > > > > > > non-error
> > > > > > > > > > > > codes
> > > > > > > > > > > > >> > > into
> > > > > > > > > > > > >> > > > >> the
> > > > > > > > > > > > >> > > > >> > > error
> > > > > > > > > > > > >> > > > >> > > > > > field
> > > > > > > > > > > > >> > > > >> > > > > > > > is
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> a
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > bad idea. It makes it
> > > *way*
> > > > > too
> > > > > > > > easy
> > > > > > > > > to
> > > > > > > > > > > > write
> > > > > > > > > > > > >> > code
> > > > > > > > > > > > >> > > > >> that
> > > > > > > > > > > > >> > > > >> > > looks
> > > > > > > > > > > > >> > > > >> > > > > > (and
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> should
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > be) correct but is
> > > actually
> > > > > > > > > incorrect.
> > > > > > > > > > If
> > > > > > > > > > > > >> > > > necessary, I
> > > > > > > > > > > > >> > > > >> > > think
> > > > > > > > > > > > >> > > > >> > > > > it's
> > > > > > > > > > > > >> > > > >> > > > > > > > much
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>> > better to spend a
> > > couple
> > > > > of
> > > > > > > > extra
> > > > > > > > > > > bytes
> > > > > > > > > > > > to
> > > > > > > > > > > > >> > > encode
> > > > > > > > > > > > >> > > > >> that
> > > > > > > > > > > > >> > > > >> > > > > > > information
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > separately (a
> "status" or
> > > > > > > "warning"
> > > > > > > > > > > > section of
> > > > > > > > > > > > >> > the
> > > > > > > > > > > > >> > > > >> > > response).
> > > > > > > > > > > > >> > > > >> > > > > An
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> indication
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > that throttling is
> > > > occurring
> > > > > is
> > > > > > > > > > something
> > > > > > > > > > > > I'd
> > > > > > > > > > > > >> > > expect
> > > > > > > > > > > > >> > > > >> to
> > > > > > > > > > > > >> > > > >> > be
> > > > > > > > > > > > >> > > > >> > > > > > > indicated
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> by a
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > bit flag in the
> response
> > > > > rather
> > > > > > > > than
> > > > > > > > > as
> > > > > > > > > > > an
> > > > > > > > > > > > >> error
> > > > > > > > > > > > >> > > > code.
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> >
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > Gwen - I think an
> error
> > > > code
> > > > > > > makes
> > > > > > > > > > sense
> > > > > > > > > > > > when
> > > > > > > > > > > > >> > the
> > > > > > > > > > > > >> > > > >> request
> > > > > > > > > > > > >> > > > >> > > > > > actually
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> failed.
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > Option B, which Jun
> was
> > > > > > > advocating,
> > > > > > > > > > would
> > > > > > > > > > > > have
> > > > > > > > > > > > >> > > > >> appended
> > > > > > > > > > > > >> > > > >> > the
> > > > > > > > > > > > >> > > > >> > > > > > > messages
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > successfully. If the
> > > > > > > rate-limiting
> > > > > > > > > case
> > > > > > > > > > > > you're
> > > > > > > > > > > > >> > > > talking
> > > > > > > > > > > > >> > > > >> > > about
> > > > > > > > > > > > >> > > > >> > > > > had
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > successfully
> committed
> > > the
> > > > > > > > messages,
> > > > > > > > > I
> > > > > > > > > > > > would
> > > > > > > > > > > > >> say
> > > > > > > > > > > > >> > > > >> that's
> > > > > > > > > > > > >> > > > >> > > also a
> > > > > > > > > > > > >> > > > >> > > > > > bad
> > > > > > > > > > > > >> > > > >> > > > > > > > use
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> of
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > error codes.
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> >
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> >
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > On Mon, Mar 16, 2015
> at
> > > > 10:16
> > > > > > > PM,
> > > > > > > > > Gwen
> > > > > > > > > > > > >> Shapira <
> > > > > > > > > > > > >> > > > >> > > > > > > > gshap...@cloudera.com>
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > wrote:
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> >
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > We discussed an
> error
> > > > code
> > > > > for
> > > > > > > > > > > > rate-limiting
> > > > > > > > > > > > >> > > > (which
> > > > > > > > > > > > >> > > > >> I
> > > > > > > > > > > > >> > > > >> > > think
> > > > > > > > > > > > >> > > > >> > > > > > made
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > sense), isn't it a
> > > > similar
> > > > > > > case?
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > >
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > On Mon, Mar 16,
> 2015 at
> > > > > 10:10
> > > > > > > PM,
> > > > > > > > > Jay
> > > > > > > > > > > > Kreps
> > > > > > > > > > > > >> <
> > > > > > > > > > > > >> > > > >> > > > > > jay.kr...@gmail.com
> > > > > > > > > > > > >> > > > >> > > > > > > >
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> wrote:
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > > My concern is
> that as
> > > > > soon
> > > > > > > as
> > > > > > > > you
> > > > > > > > > > > start
> > > > > > > > > > > > >> > > encoding
> > > > > > > > > > > > >> > > > >> > > non-error
> > > > > > > > > > > > >> > > > >> > > > > > > > response
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > > information into
> > > error
> > > > > codes
> > > > > > > > the
> > > > > > > > > > next
> > > > > > > > > > > > >> > question
> > > > > > > > > > > > >> > > > is
> > > > > > > > > > > > >> > > > >> > what
> > > > > > > > > > > > >> > > > >> > > to
> > > > > > > > > > > > >> > > > >> > > > > do
> > > > > > > > > > > > >> > > > >> > > > > > if
> > > > > > > > > > > > >> > > > >> > > > > > > > two
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > such
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > > codes apply (i.e.
> you
> > > > > have a
> > > > > > > > > > replica
> > > > > > > > > > > > down
> > > > > > > > > > > > >> > and
> > > > > > > > > > > > >> > > > the
> > > > > > > > > > > > >> > > > >> > > response
> > > > > > > > > > > > >> > > > >> > > > > is
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > quota'd). I
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > > think I am trying
> to
> > > > > argue
> > > > > > > that
> > > > > > > > > > error
> > > > > > > > > > > > >> should
> > > > > > > > > > > > >> > > > mean
> > > > > > > > > > > > >> > > > >> > "why
> > > > > > > > > > > > >> > > > >> > > we
> > > > > > > > > > > > >> > > > >> > > > > > > failed
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> your
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > > request", for
> which
> > > > there
> > > > > > > will
> > > > > > > > > > really
> > > > > > > > > > > > only
> > > > > > > > > > > > >> > be
> > > > > > > > > > > > >> > > > one
> > > > > > > > > > > > >> > > > >> > > reason,
> > > > > > > > > > > > >> > > > >> > > > > and
> > > > > > > > > > > > >> > > > >> > > > > > > any
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> other
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > > useful
> information we
> > > > > want
> > > > > > > to
> > > > > > > > > send
> > > > > > > > > > > > back is
> > > > > > > > > > > > >> > > just
> > > > > > > > > > > > >> > > > >> > another
> > > > > > > > > > > > >> > > > >> > > > > field
> > > > > > > > > > > > >> > > > >> > > > > > > in
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> the
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > > response.
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > > -Jay
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > > On Mon, Mar 16,
> 2015
> > > at
> > > > > 9:51
> > > > > > > > PM,
> > > > > > > > > > Gwen
> > > > > > > > > > > > >> > Shapira
> > > > > > > > > > > > >> > > <
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> gshap...@cloudera.com>
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > wrote:
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >> I think it's not
> too
> > > > > late to
> > > > > > > > > > reserve
> > > > > > > > > > > a
> > > > > > > > > > > > set
> > > > > > > > > > > > >> > of
> > > > > > > > > > > > >> > > > >> error
> > > > > > > > > > > > >> > > > >> > > codes
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> (200-299?)
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >> for "non-error"
> > > codes.
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >>
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >> It won't be
> backward
> > > > > > > > compatible
> > > > > > > > > > > (i.e.
> > > > > > > > > > > > >> > clients
> > > > > > > > > > > > >> > > > >> that
> > > > > > > > > > > > >> > > > >> > > > > currently
> > > > > > > > > > > > >> > > > >> > > > > > > do
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> "else
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >> throw" will
> throw on
> > > > > > > > > non-errors),
> > > > > > > > > > > but
> > > > > > > > > > > > >> > perhaps
> > > > > > > > > > > > > >> > > > it's
> > > > > > > > > > > > >> > > > >> > > > > > worthwhile.
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >>
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >> On Mon, Mar 16,
> 2015
> > > > at
> > > > > > > 9:42
> > > > > > > > PM,
> > > > > > > > > > Jay
> > > > > > > > > > > > >> Kreps
> > > > > > > > > > > > >> > <
> > > > > > > > > > > > >> > > > >> > > > > > > jay.kr...@gmail.com
> > > > > > > > > > > > >> > > > >> > > > > > > > >
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > wrote:
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >> > Hey Jun,
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >> >
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >> > I'd really
> really
> > > > > really
> > > > > > > > like
> > > > > > > > > to
> > > > > > > > > > > > avoid
> > > > > > > > > > > > >> > > that.
> > > > > > > > > > > > >> > > > >> > Having
> > > > > > > > > > > > >> > > > >> > > just
> > > > > > > > > > > > >> > > > >> > > > > > > > spent a
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > bunch of
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >> > time on the
> > > clients,
> > > > > > > using
> > > > > > > > the
> > > > > > > > > > > error
> > > > > > > > > > > > >> > codes
> > > > > > > > > > > > >> > > to
> > > > > > > > > > > > >> > > > >> > encode
> > > > > > > > > > > > >> > > > >> > > > > other
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > information
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >> > about the
> response
> > > > is
> > > > > > > super
> > > > > > > > > > > > dangerous.
> > > > > > > > > > > > >> > The
> > > > > > > > > > > > >> > > > >> error
> > > > > > > > > > > > >> > > > >> > > > > handling
> > > > > > > > > > > > >> > > > >> > > > > > is
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> one of
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > the
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >> > hardest parts
> of
> > > the
> > > > > > > client
> > > > > > > > > > > > (Guozhang
> > > > > > > > > > > > >> > chime
> > > > > > > > > > > > >> > > > in
> > > > > > > > > > > > >> > > > >> > > here).
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >> >
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >> > Generally the
> > > error
> > > > > > > handling
> > > > > > > > > > looks
> > > > > > > > > > > > like
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >> > if(error == none)
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >> >    // good, process the request
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >> > else if(error == KNOWN_ERROR_1)
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >> >    // handle known error 1
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >> > else if(error == KNOWN_ERROR_2)
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >> >    // handle known error 2
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >> > else
> > > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >> >    throw Errors.forCode(error).exception(); // or some other default behavior
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >> >
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >> > This works
> because
> > > > we
> > > > > > > have a
> > > > > > > > > > > > convention
> > > > > > > > > > > > >> > > that
> > > > > > > > > > > > > >> > > > >> an
> > > > > > > > > > > > >> > > > >> > > error
> > > > > > > > > > > > >> > > > >> > > > > is
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> something
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > that
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >> > prevented your
> > > > getting
> > > > > > > the
> > > > > > > > > > > response
> > > > > > > > > > > > so
> > > > > > > > > > > > >> > the
> > > > > > > > > > > > >> > > > >> default
> > > > > > > > > > > > >> > > > >> > > > > > handling
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> case is
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > sane
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >> > and forward
> > > > > compatible.
> > > > > > > It
> > > > > > > > is
> > > > > > > > > > > > tempting
> > > > > > > > > > > > >> to
> > > > > > > > > > > > >> > > use
> > > > > > > > > > > > >> > > > >> the
> > > > > > > > > > > > >> > > > >> > > error
> > > > > > > > > > > > >> > > > >> > > > > > code
> > > > > > > > > > > > >> > > > >> > > > > > > > to
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > convey
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >> > information in
> the
> > > > > > > success
> > > > > > > > > case.
> > > > > > > > > > > For
> > > > > > > > > > > > >> > > example
> > > > > > > > > > > > >> > > > we
> > > > > > > > > > > > >> > > > >> > > could
> > > > > > > > > > > > >> > > > >> > > > > use
> > > > > > > > > > > > >> > > > >> > > > > > > > error
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > codes
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > to
> > > > > > > > > > > > >> > > > >> > > > > > > > >>> > > >> > encode whether
> > > > quotas
> > > > > > > were
> > > > > > > > > > > enforced,
> > whether the request was served out of cache, whether the stock market is
> > up today, or whatever. The problem is that since these are not errors as
> > far as the client is concerned it should not throw an exception but
> > process the response, but now we created an explicit requirement that
> > that error be handled explicitly since it is different. I really think
> > that this kind of information is not an error, it is just information,
> > and if we want it in the response we should do the right thing and add a
> > new field to the response.
> >
> > I think you saw the Samza bug that was literally an example of this
> > happening and leading to an infinite retry loop.
> >
> > Further more I really want to emphasize that hitting your quota in the
> > design that Adi has proposed is actually not an error condition at all.
> > It is totally reasonable in any bootstrap situation to intentionally want
> > to run at the limit the system imposes on you.
> >
> > -Jay
> >
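Jay's point above, that throttling should travel as an extra field in the response rather than as an error code, might look roughly like the sketch below. It is only an illustration under stated assumptions: `FetchResponse`, `error_code`, `throttle_time_ms`, and `handle_response` are hypothetical names chosen for this example and are not taken from the KIP or from any Kafka client API.

```python
from dataclasses import dataclass

# All names here are hypothetical and for illustration only; they are not
# Kafka's actual wire protocol or client API.
NO_ERROR = 0


@dataclass
class FetchResponse:
    error_code: int            # NO_ERROR when the request itself succeeded
    records: list
    throttle_time_ms: int = 0  # informational: how long the broker delayed the response


def handle_response(resp, metrics):
    """Process a response; throttling is recorded but never treated as a failure."""
    if resp.error_code != NO_ERROR:
        # Only genuine errors take the exception/retry path.
        raise RuntimeError(f"request failed with error code {resp.error_code}")
    if resp.throttle_time_ms > 0:
        # Record the delay for monitoring and alerting, then carry on with the data.
        metrics["throttled_responses"] = metrics.get("throttled_responses", 0) + 1
        metrics["throttle_time_ms"] = metrics.get("throttle_time_ms", 0) + resp.throttle_time_ms
    return resp.records


metrics = {}
records = handle_response(
    FetchResponse(NO_ERROR, ["a", "b"], throttle_time_ms=150), metrics
)
```

Because the throttled response carries no error code, a client written this way has no reason to retry it, which is the failure mode described in the Samza example; the client can still see that it was throttled through the recorded metrics.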
> > On Mon, Mar 16, 2015 at 4:27 PM, Jun Rao <j...@confluent.io> wrote:
> >
> > > It's probably useful for a client to know whether its requests are
> > > throttled or not (e.g., for monitoring and alerting). From that
> > > perspective, option B (delay the requests and return an error) seems
> > > better.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Wed, Mar 4, 2015 at 3:51 PM, Aditya Auradkar <
> > > aaurad...@linkedin.com.invalid> wrote:
> > >
> > > > Posted a KIP for quotas in kafka.
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-13+-+Quotas
> > > >
> > > > Appreciate any feedback.
> > > >
> > > > Aditya
>
> --
> Thanks,
> Ewen
>
> --
> -- Guozhang
>
> --
> -- Guozhang

