See some responses inline below.

Tong Li
OpenStack & Kafka Community Development
Building 501/B205
liton...@us.ibm.com
Jay Kreps <jay.kr...@gmail.com> wrote on 04/07/2015 10:41:19 AM:

> From: Jay Kreps <jay.kr...@gmail.com>
> To: "dev@kafka.apache.org" <dev@kafka.apache.org>
> Date: 04/07/2015 10:44 AM
> Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas
>
> Totally. But is that the only use? What I wanted to flesh out was whether
> the goal was:
> 1. Expose throttling in the client metrics
> 2. Enable programmatic response (i.e. stop sending stuff or something like
> that)
>
> I think I kind of understand (1) but let's get specific on the metric we
> would be adding and what exactly you would expose in a dashboard. For
> example if the goal is just monitoring do I really want a boolean flag for
> is_throttled or do I want to know how much I am being throttled (i.e.
> throttle_pct might indicate the percent of your request time that was due
> to throttling or something like that)? If I am 1% throttled that may be
> irrelevant but 99% throttled would be quite relevant? Not sure I agree,
> just throwing that out there...
>

Jay, great point. I think Kafka should really just send metrics; how to judge whether a system is throttled should be someone else's job. I think this comes down to design principles: if we follow the principle of "separation of concerns", then this should not really be part of Kafka. I have been working on monitoring systems for a while. The system being monitored normally just sends facts about itself, such as CPU usage, network usage, and disk usage, to the monitoring system. The monitoring system then runs various algorithms to decide whether a system is throttled, by setting up thresholds and other measures. The monitoring system also sends out notifications/alarms if things turn bad. To make this discussion even easier: a set of general-purpose agents that collect this data has been developed and is available as part of a monitoring system named Monasca. If you are interested, I can provide more information.
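To make the split concrete, here is a minimal sketch of what the monitoring side could do with raw facts reported by a client. Everything in it is an assumption for illustration: the `RequestSample` type, its two fields, and the 50% default threshold are hypothetical and are not part of Kafka, KIP-13, or Monasca. It computes something like the throttle_pct Jay describes and leaves the "is it throttled?" judgment to a threshold owned by the monitoring system.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RequestSample:
    """One raw measurement as a client might report it (hypothetical shape)."""
    total_ms: float     # end-to-end request time
    throttle_ms: float  # part of that time spent delayed because of a quota


def throttle_pct(samples: Iterable[RequestSample]) -> float:
    """Percent of total request time that was due to throttling."""
    samples = list(samples)
    total = sum(s.total_ms for s in samples)
    if total == 0:
        return 0.0
    return 100.0 * sum(s.throttle_ms for s in samples) / total


def is_throttled(samples: Iterable[RequestSample], threshold_pct: float = 50.0) -> bool:
    """The judgment call lives here, in the monitoring system, not in the client."""
    return throttle_pct(samples) >= threshold_pct
```

With this shape, the client only reports the two numbers per request; whether 1% or 99% counts as "throttled" is a policy the operator can change without touching the client or the broker.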
For Kafka to have features such as judging whether the system is being throttled seems to be a move away from its core values. Just my 2 cents, of course.

> For (2) the prior discussion seemed to kind of allude to this but I can't
> really come up with a use case. Is there one?
>
> If it is just (1) I think the question is whether it really helps much to
> have the metric on the client vs the server. I suppose this is a bit
> environment specific. If you have a central metrics system it shouldn't
> make any difference, but if you don't I suppose it does.
>
> -Jay
>
> On Mon, Apr 6, 2015 at 7:57 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
>
> > Here's a wild guess:
> >
> > An app developer included a Kafka Producer in his app, and is not happy
> > with the throughput. He doesn't have visibility into the brokers since they
> > are owned by a different team. Obviously the first instinct of a developer
> > who knows that throttling exists is to blame throttling for any slowdown in
> > the app.
> > If he doesn't have a way to know from the responses whether or not his app
> > is throttled, he may end up calling Aditya at 4am asking "Hey, is my app
> > throttled?".
> >
> > I assume Aditya is trying to avoid this scenario.
> >
> > On Mon, Apr 6, 2015 at 7:47 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> >
> > > Hey Aditya,
> > >
> > > 2. I kind of buy it, but I'd really like to understand the details of the use
> > > case before we make protocol changes. What changes are you proposing in the
> > > clients for monitoring and how would that be used?
> > >
> > > -Jay
> > >
> > > On Mon, Apr 6, 2015 at 10:36 AM, Aditya Auradkar <
> > > aaurad...@linkedin.com.invalid> wrote:
> > >
> > > > Hi Jay,
> > > >
> > > > 2. At this time, the proposed response format changes are only for
> > > > monitoring/informing clients.
> > > > As Jun mentioned, we get instance level
> > > > monitoring in this case since each instance that got throttled will have a
> > > > metric confirming the same. Without client level monitoring for this, it's
> > > > hard for application developers to find if they are being throttled since
> > > > they will also have to be aware of all the brokers in the cluster. This is
> > > > quite problematic for large clusters.
> > > >
> > > > It seems nice for app developers to not have to think about kafka internal
> > > > metrics and only focus on the metrics exposed on their instances. Analogous
> > > > to having client-side request latency metrics. Basically, we want an easy
> > > > way for clients to be aware if they are being throttled.
> > > >
> > > > 4. For purgatory v delay queue, I think we are on the same page. I feel it
> > > > is nicer to use the purgatory but I'm happy to use a DelayQueue if there
> > > > are performance implications. I don't know enough about the current and
> > > > Yasuhiro's new implementation to be sure one way or the other.
> > > >
> > > > Stepping back, I think these two things are the only remaining points of
> > > > discussion within the current proposal. Any concerns if I started a voting
> > > > thread on the proposal after the KIP discussion tomorrow? (assuming we
> > > > reach consensus on these items)
> > > >
> > > > Thanks,
> > > > Aditya
> > > > ________________________________________
> > > > From: Jay Kreps [jay.kr...@gmail.com]
> > > > Sent: Saturday, April 04, 2015 1:36 PM
> > > > To: dev@kafka.apache.org
> > > > Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas
> > > >
> > > > Hey Aditya,
> > > >
> > > > 2. For the return flag I'm not terribly particular. If we want to add it
> > > > let's fully think through how it will be used. The only concern I have is
> > > > adding to the protocol without really thinking through the use cases.
> > > > So let's work out the APIs we want to add to the Java consumer and producer
> > > > and the use cases for how clients will make use of these. For my part I
> > > > actually don't see much use other than monitoring since it isn't an error
> > > > condition to be at your quota. And if it is just monitoring I don't see a
> > > > big enough difference between having the monitoring on the server-side
> > > > versus in the clients to justify putting it in the protocol. But I think
> > > > you guys may have other use cases in mind of how a client would make some
> > > > use of this? Let's work that out. I also don't feel strongly about it--it
> > > > wouldn't be *bad* to have the monitoring available on the client, just
> > > > doesn't seem that much better.
> > > >
> > > > 4. For the purgatory vs delay queue I think it is arguably nicer to reuse the
> > > > purgatory we just have to be ultra-conscious of efficiency. I think our
> > > > goal is to turn quotas on across the board, so at LinkedIn that would mean
> > > > potentially every request will need a small delay. I haven't worked out the
> > > > efficiency implications of this choice, so as long as we do that I'm happy.
> > > >
> > > > -Jay
> > > >
> > > > On Fri, Apr 3, 2015 at 1:10 PM, Aditya Auradkar <
> > > > aaurad...@linkedin.com.invalid> wrote:
> > > >
> > > > > Some responses to Jay's points.
> > > > >
> > > > > 1. Using commas - Cool.
> > > > >
> > > > > 2. Adding return flag - I'm inclined to agree with Joel that this is good
> > > > > to have in the initial implementation.
> > > > >
> > > > > 3. Config - +1. I'll remove it from the KIP. We can discuss this in
> > > > > parallel.
> > > > >
> > > > > 4. Purgatory vs Delay queue - I feel that it is simpler to reuse the
> > > > > existing purgatories for both delayed produce and fetch requests.
> > IIUC, > > > > all > > > > > we need for quotas is a minWait parameter for DelayedOperation (or > > > > > something equivalent) since there is already a max wait. The > > completion > > > > > criteria can check if minWait time has elapsed before declaring the > > > > > operation complete. For this to impact performance, a significant > > > number > > > > of > > > > > clients may need to exceed their quota at the same time and even then > > > I'm > > > > > not very clear on the scope of the impact. Two layers of delays might > > > add > > > > > complexity to the implementation which I'm hoping to avoid. > > > > > > > > > > Aditya > > > > > > > > > > ________________________________________ > > > > > From: Joel Koshy [jjkosh...@gmail.com] > > > > > Sent: Friday, April 03, 2015 12:48 PM > > > > > To: dev@kafka.apache.org > > > > > Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas > > > > > > > > > > Aditya, thanks for the updated KIP and Jay/Jun thanks for the > > > > > comments. Couple of comments in-line: > > > > > > > > > > > 2. I would advocate for adding the return flag when we next bump > > the > > > > > > request format version just to avoid proliferation. I agree this > > is a > > > > > good > > > > > > thing to know about, but at the moment I don't think we have a very > > > > well > > > > > > flushed out idea of how the client would actually make use of this > > > > info. > > > > > I > > > > > > > > > > I'm somewhat inclined to having something appropriate off the bat - > > > > > mainly because (i) clients really should know that they have been > > > > > throttled (ii) a smart producer/consumer implementation would want to > > > > > know how much to back off. So perhaps this and config-management > > > > > should be moved to a separate discussion, but it would be good to > > have > > > > > this discussion going and incorporated into the first quota > > > > > implementation. > > > > > > > > > > > 3. 
Config--I think we need to generalize the topic stuff so we can > > > > > override > > > > > > at multiple levels. We have topic and client, but I suspect "user" > > > and > > > > > > "broker" will also be important. I recommend we take config stuff > > out > > > > of > > > > > > this KIP since we really need to fully think through a proposal > > that > > > > will > > > > > > cover all these types of overrides. > > > > > > > > > > +1 - it is definitely orthogonal to the core quota implementation > > > > > (although necessary for its operability). Having a config-related > > > > > discussion in this KIP would only draw out the discussion and vote > > > > > even if the core quota design looks good to everyone. > > > > > > > > > > So basically I think we can remove the portions on dynamic config as > > > > > well as the response format but I really think we should close on > > > > > those while the implementation is in progress and before quotas is > > > > > officially released. > > > > > > > > > > > 4. Instead of using purgatories to implement the delay would it > > make > > > > more > > > > > > sense to just use a delay queue? I think all the additional stuff > > in > > > > the > > > > > > purgatory other than the delay queue doesn't make sense as the > > quota > > > > is a > > > > > > hard N ms penalty with no chance of early eviction. If there is no > > > perf > > > > > > penalty for the full purgatory that may be fine (even good) to > > reuse, > > > > > but I > > > > > > haven't looked into that. > > > > > > > > > > A simple delay queue sounds good - I think Aditya was also trying to > > > > > avoid adding a new quota purgatory. i.e., it may be possible to use > > > > > the existing purgatory instances to enforce quotas. That may be > > > > > simpler, but would be incur a slight perf penalty if too many clients > > > > > are being throttled. 
> > > > > > > > > > Thanks, > > > > > > > > > > Joel > > > > > > > > > > > > > > > > > -Jay > > > > > > > > > > > > On Fri, Apr 3, 2015 at 10:45 AM, Aditya Auradkar < > > > > > > aaurad...@linkedin.com.invalid> wrote: > > > > > > > > > > > >> Update, I added a proposal on doing dynamic client based > > > configuration > > > > > >> that can be used for quotas. > > > > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-13 +-+Quotas > > > > > >> > > > > > >> Please take a look and let me know if there are any concerns. > > > > > >> > > > > > >> Thanks, > > > > > >> Aditya > > > > > >> ________________________________________ > > > > > >> From: Aditya Auradkar > > > > > >> Sent: Friday, April 03, 2015 10:10 AM > > > > > >> To: dev@kafka.apache.org > > > > > >> Subject: RE: [KIP-DISCUSSION] KIP-13 Quotas > > > > > >> > > > > > >> Thanks Jun. > > > > > >> > > > > > >> Some thoughts: > > > > > >> > > > > > >> 10) I think it is better we throttle regardless of the > > produce/fetch > > > > > >> version. This is a nice feature where clients can tell if they are > > > > being > > > > > >> throttled or not. If we only throttle newer clients, then we have > > > > > >> inconsistent behavior across clients in a multi-tenant cluster. > > > Having > > > > > >> quota metrics on the client side is also a nice incentive to > > upgrade > > > > > client > > > > > >> versions. > > > > > >> > > > > > >> 11) I think we can call metric.record(fetchSize) before adding the > > > > > >> delayedFetch request into the purgatory. This will give us the > > > > estimated > > > > > >> delay of the request up-front. The timeout on the DelayedFetch is > > > the > > > > > >> Max(maxWait, quotaDelay). The DelayedFetch completion criteria can > > > > > change a > > > > > >> little to accomodate quotas. > > > > > >> > > > > > >> - I agree the quota code should return the estimated delay time in > > > > > >> QuotaViolationException. 
> > > > > >> > > > > > >> Thanks, > > > > > >> Aditya > > > > > >> > > > > > >> ________________________________________ > > > > > >> From: Jun Rao [j...@confluent.io] > > > > > >> Sent: Friday, April 03, 2015 9:16 AM > > > > > >> To: dev@kafka.apache.org > > > > > >> Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas > > > > > >> > > > > > >> Thanks for the update. > > > > > >> > > > > > >> 10. About whether to return a new field in the response to > > indicate > > > > > >> throttling. Earlier, the plan was to not change the response > > format > > > > and > > > > > >> just have a metric on the broker to indicate whether a clientId is > > > > > >> throttled or not. The issue is that we don't know whether a > > > particular > > > > > >> clientId instance is throttled or not (since there could be > > multiple > > > > > >> clients with the same clientId). Your proposal of adding an > > > > isThrottled > > > > > >> field in the response addresses and seems better. Then, do we just > > > > > throttle > > > > > >> the new version of produce/fetch request or both the old and the > > new > > > > > >> versions? Also, we probably still need a separate metric on the > > > broker > > > > > side > > > > > >> to indicate whether a clientId is throttled or not. > > > > > >> > > > > > >> 11. Just to clarify. For fetch requests, when will > > > > > metric.record(fetchSize) > > > > > >> be called? Is it when we are ready to send the fetch response > > (after > > > > > >> minBytes and maxWait are satisfied)? > > > > > >> > > > > > >> As an implementation detail, it may be useful for the quota code > > to > > > > > return > > > > > >> an estimated delay time (to bring the measurement within the > > limit) > > > in > > > > > >> QuotaViolationException. 
> > > > > >> > > > > > >> Thanks, > > > > > >> > > > > > >> Jun > > > > > >> > > > > > >> On Wed, Apr 1, 2015 at 3:27 PM, Aditya Auradkar < > > > > > >> aaurad...@linkedin.com.invalid> wrote: > > > > > >> > > > > > >> > Hey everyone, > > > > > >> > > > > > > >> > I've made changes to the KIP to capture our discussions over the > > > > last > > > > > >> > couple of weeks. > > > > > >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-13+-+Quotas > > > > > >> > > > > > > >> > I'll start a voting thread after people have had a chance to > > > > > >> read/comment. > > > > > >> > > > > > > >> > Thanks, > > > > > >> > Aditya > > > > > >> > > > > > > >> > ________________________________________ > > > > > >> > From: Steven Wu [stevenz...@gmail.com] > > > > > >> > Sent: Friday, March 20, 2015 9:14 AM > > > > > >> > To: dev@kafka.apache.org > > > > > >> > Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas > > > > > >> > > > > > > >> > +1 on Jun's suggestion of maintaining one set/style of metrics > > at > > > > > broker. > > > > > >> > In Netflix, we have to convert the yammer metrics to servo > > metrics > > > > at > > > > > >> > broker. it will be painful to know some metrics are in a > > different > > > > > style > > > > > >> > and get to be handled differently. > > > > > >> > > > > > > >> > On Fri, Mar 20, 2015 at 8:17 AM, Jun Rao <j...@confluent.io> > > > wrote: > > > > > >> > > > > > > >> > > Not so sure. People who use quota will definitely want to > > > monitor > > > > > the > > > > > >> new > > > > > >> > > metrics at the client id level. Then they will need to deal > > with > > > > > those > > > > > >> > > metrics differently from the rest of the metrics. It would be > > > > > better if > > > > > >> > we > > > > > >> > > can hide this complexity from the users. 
> > > > > >> > > > > > > > >> > > Thanks, > > > > > >> > > > > > > > >> > > Jun > > > > > >> > > > > > > > >> > > On Thu, Mar 19, 2015 at 10:45 PM, Joel Koshy < > > > jjkosh...@gmail.com > > > > > > > > > > >> > wrote: > > > > > >> > > > > > > > >> > > > Actually thinking again - since these will be a few new > > > metrics > > > > at > > > > > >> the > > > > > >> > > > client id level (bytes in and bytes out to start with) maybe > > > it > > > > is > > > > > >> fine > > > > > >> > > to > > > > > >> > > > have the two type of metrics coexist and we can migrate the > > > > > existing > > > > > >> > > > metrics in parallel. > > > > > >> > > > > > > > > >> > > > On Thursday, March 19, 2015, Joel Koshy < > > jjkosh...@gmail.com> > > > > > wrote: > > > > > >> > > > > > > > > >> > > > > That is a valid concern but in that case I think it would > > be > > > > > better > > > > > >> > to > > > > > >> > > > > just migrate completely to the new metrics package first. > > > > > >> > > > > > > > > > >> > > > > On Thursday, March 19, 2015, Jun Rao <j...@confluent.io > > > > > >> > > > > <javascript:_e(%7B%7D,'cvml','j...@confluent.io');>> > > wrote: > > > > > >> > > > > > > > > > >> > > > >> Hmm, I was thinking a bit differently on the metrics > > > stuff. I > > > > > >> think > > > > > >> > it > > > > > >> > > > >> would be confusing to have some metrics defined in the > > new > > > > > metrics > > > > > >> > > > package > > > > > >> > > > >> while some others defined in Coda Hale. Those metrics > > will > > > > look > > > > > >> > > > different > > > > > >> > > > >> (e.g., rates in Coda Hale will have special attributes > > such > > > > as > > > > > >> > > > >> 1-min-average). People may need different ways to export > > > the > > > > > >> metrics > > > > > >> > > to > > > > > >> > > > >> external systems such as Graphite. 
So, instead of using > > the > > > > new > > > > > >> > > metrics > > > > > >> > > > >> package on the broker, I was thinking that we can just > > > > > implement a > > > > > >> > > > >> QuotaMetrics that wraps the Coda Hale metrics. The > > > > > implementation > > > > > >> > can > > > > > >> > > be > > > > > >> > > > >> the same as what's in the new metrics package. > > > > > >> > > > >> > > > > > >> > > > >> Thanks, > > > > > >> > > > >> > > > > > >> > > > >> Jun > > > > > >> > > > >> > > > > > >> > > > >> On Thu, Mar 19, 2015 at 8:09 PM, Jay Kreps < > > > > > jay.kr...@gmail.com> > > > > > >> > > wrote: > > > > > >> > > > >> > > > > > >> > > > >> > Yeah I was saying was that we are blocked on picking an > > > > > approach > > > > > >> > for > > > > > >> > > > >> > metrics but not necessarily the full conversion. > > Clearly > > > if > > > > > we > > > > > >> > pick > > > > > >> > > > the > > > > > >> > > > >> new > > > > > >> > > > >> > metrics package we would need to implement the two > > > metrics > > > > we > > > > > >> want > > > > > >> > > to > > > > > >> > > > >> quota > > > > > >> > > > >> > on. But the conversion of the remaining metrics can be > > > done > > > > > >> > > > >> asynchronously. > > > > > >> > > > >> > > > > > > >> > > > >> > -Jay > > > > > >> > > > >> > > > > > > >> > > > >> > On Thu, Mar 19, 2015 at 5:56 PM, Joel Koshy < > > > > > >> jjkosh...@gmail.com> > > > > > >> > > > >> wrote: > > > > > >> > > > >> > > > > > > >> > > > >> > > > in KAFKA-1930). I agree that this KIP doesn't need > > to > > > > > block > > > > > >> on > > > > > >> > > the > > > > > >> > > > >> > > > migration of the metrics package. > > > > > >> > > > >> > > > > > > > >> > > > >> > > Can you clarify the above? i.e., if we are going to > > > quota > > > > > on > > > > > >> > > > something > > > > > >> > > > >> > > then we would want to have migrated that metric over > > > > > right? 
Or > > > > > >> > do > > > > > >> > > > you > > > > > >> > > > >> > > mean we don't need to complete the migration of all > > > > > metrics to > > > > > >> > the > > > > > >> > > > >> > > metrics package right? > > > > > >> > > > >> > > > > > > > >> > > > >> > > I think most of us now feel that the delay + no error > > > is > > > > a > > > > > >> good > > > > > >> > > > >> > > approach, but it would be good to make sure everyone > > is > > > > on > > > > > the > > > > > >> > > same > > > > > >> > > > >> > > page. > > > > > >> > > > >> > > > > > > > >> > > > >> > > As Aditya requested a couple of days ago I think we > > > > should > > > > > go > > > > > >> > over > > > > > >> > > > >> > > this at the next KIP hangout. > > > > > >> > > > >> > > > > > > > >> > > > >> > > Joel > > > > > >> > > > >> > > > > > > > >> > > > >> > > On Thu, Mar 19, 2015 at 09:24:09AM -0700, Jun Rao > > > wrote: > > > > > >> > > > >> > > > 1. Delay + no error seems reasonable to me. > > However, > > > I > > > > do > > > > > >> feel > > > > > >> > > > that > > > > > >> > > > >> we > > > > > >> > > > >> > > need > > > > > >> > > > >> > > > to give the client an indicator that it's being > > > > > throttled, > > > > > >> > > instead > > > > > >> > > > >> of > > > > > >> > > > >> > > doing > > > > > >> > > > >> > > > this silently. For that, we probably need to evolve > > > the > > > > > >> > > > >> produce/fetch > > > > > >> > > > >> > > > protocol to include an extra status field in the > > > > > response. > > > > > >> We > > > > > >> > > > >> probably > > > > > >> > > > >> > > need > > > > > >> > > > >> > > > to think more about whether we just want to return > > a > > > > > simple > > > > > >> > > status > > > > > >> > > > >> code > > > > > >> > > > >> > > > (e.g., 1 = throttled) or a value that indicates how > > > > much > > > > > is > > > > > >> > > being > > > > > >> > > > >> > > throttled. > > > > > >> > > > >> > > > > > > > > >> > > > >> > > > 2. 
We probably need to improve the histogram > > support > > > in > > > > > the > > > > > >> > new > > > > > >> > > > >> metrics > > > > > >> > > > >> > > > package before we can use it more widely on the > > > server > > > > > side > > > > > >> > > (left > > > > > >> > > > a > > > > > >> > > > >> > > comment > > > > > >> > > > >> > > > in KAFKA-1930). I agree that this KIP doesn't need > > to > > > > > block > > > > > >> on > > > > > >> > > the > > > > > >> > > > >> > > > migration of the metrics package. > > > > > >> > > > >> > > > > > > > > >> > > > >> > > > Thanks, > > > > > >> > > > >> > > > > > > > > >> > > > >> > > > Jun > > > > > >> > > > >> > > > > > > > > >> > > > >> > > > On Wed, Mar 18, 2015 at 4:02 PM, Aditya Auradkar < > > > > > >> > > > >> > > > aaurad...@linkedin.com.invalid> wrote: > > > > > >> > > > >> > > > > > > > > >> > > > >> > > > > Hey everyone, > > > > > >> > > > >> > > > > > > > > > >> > > > >> > > > > Thanks for the great discussion. There are > > > currently > > > > a > > > > > few > > > > > >> > > > points > > > > > >> > > > >> on > > > > > >> > > > >> > > this > > > > > >> > > > >> > > > > KIP that need addressing and I want to make sure > > we > > > > > are on > > > > > >> > the > > > > > >> > > > >> same > > > > > >> > > > >> > > page > > > > > >> > > > >> > > > > about those. > > > > > >> > > > >> > > > > > > > > > >> > > > >> > > > > 1. Append and delay response vs delay and return > > > > error > > > > > >> > > > >> > > > > - I think we've discussed the pros and cons of > > each > > > > > >> approach > > > > > >> > > but > > > > > >> > > > >> > > haven't > > > > > >> > > > >> > > > > chosen an approach yet. Where does everyone stand > > > on > > > > > this > > > > > >> > > issue? > > > > > >> > > > >> > > > > > > > > > >> > > > >> > > > > 2. 
Metrics Migration and usage in quotas > > > > > >> > > > >> > > > > - The metrics library in clients has a notion of > > > > quotas > > > > > >> that > > > > > >> > > we > > > > > >> > > > >> > should > > > > > >> > > > >> > > > > reuse. For that to happen, we need to migrate the > > > > > server > > > > > >> to > > > > > >> > > the > > > > > >> > > > >> new > > > > > >> > > > >> > > metrics > > > > > >> > > > >> > > > > package. > > > > > >> > > > >> > > > > - Need more clarification on how to compute > > > > throttling > > > > > >> time > > > > > >> > > and > > > > > >> > > > >> > > windowing > > > > > >> > > > >> > > > > for quotas. > > > > > >> > > > >> > > > > > > > > > >> > > > >> > > > > I'm going to start a new KIP to discuss metrics > > > > > migration > > > > > >> > > > >> separately. > > > > > >> > > > >> > > That > > > > > >> > > > >> > > > > will also contain a section on quotas. > > > > > >> > > > >> > > > > > > > > > >> > > > >> > > > > 3. Dynamic Configuration management - Being > > > discussed > > > > > in > > > > > >> > > KIP-5. > > > > > >> > > > >> > > Basically > > > > > >> > > > >> > > > > we need something that will model default quotas > > > and > > > > > allow > > > > > >> > > > >> per-client > > > > > >> > > > >> > > > > overrides. > > > > > >> > > > >> > > > > > > > > > >> > > > >> > > > > Is there something else that I'm missing? 
> > > > > >> > > > >> > > > > > > > > > >> > > > >> > > > > Thanks, > > > > > >> > > > >> > > > > Aditya > > > > > >> > > > >> > > > > ________________________________________ > > > > > >> > > > >> > > > > From: Jay Kreps [jay.kr...@gmail.com] > > > > > >> > > > >> > > > > Sent: Wednesday, March 18, 2015 2:10 PM > > > > > >> > > > >> > > > > To: dev@kafka.apache.org > > > > > >> > > > >> > > > > Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas > > > > > >> > > > >> > > > > > > > > > >> > > > >> > > > > Hey Steven, > > > > > >> > > > >> > > > > > > > > > >> > > > >> > > > > The current proposal is actually to enforce > > quotas > > > at > > > > > the > > > > > >> > > > >> > > > > client/application level, NOT the topic level. So > > > if > > > > > you > > > > > >> > have > > > > > >> > > a > > > > > >> > > > >> > service > > > > > >> > > > >> > > > > with a few dozen instances the quota is against > > all > > > > of > > > > > >> those > > > > > >> > > > >> > instances > > > > > >> > > > >> > > > > added up across all their topics. So actually the > > > > > effect > > > > > >> > would > > > > > >> > > > be > > > > > >> > > > >> the > > > > > >> > > > >> > > same > > > > > >> > > > >> > > > > either way but throttling gives the producer the > > > > > choice of > > > > > >> > > > either > > > > > >> > > > >> > > blocking > > > > > >> > > > >> > > > > or dropping. > > > > > >> > > > >> > > > > > > > > > >> > > > >> > > > > -Jay > > > > > >> > > > >> > > > > > > > > > >> > > > >> > > > > On Tue, Mar 17, 2015 at 10:08 AM, Steven Wu < > > > > > >> > > > stevenz...@gmail.com > > > > > >> > > > >> > > > > > > >> > > > >> > > wrote: > > > > > >> > > > >> > > > > > > > > > >> > > > >> > > > > > Jay, > > > > > >> > > > >> > > > > > > > > > > >> > > > >> > > > > > let's say an app produces to 10 different > > topics. 
> > > > > one of > > > > > >> > the > > > > > >> > > > >> topic > > > > > >> > > > >> > is > > > > > >> > > > >> > > > > sent > > > > > >> > > > >> > > > > > from a library. due to whatever condition/bug, > > > this > > > > > lib > > > > > >> > > starts > > > > > >> > > > >> to > > > > > >> > > > >> > > send > > > > > >> > > > >> > > > > > messages over the quota. if we go with the > > > delayed > > > > > >> > response > > > > > >> > > > >> > > approach, it > > > > > >> > > > >> > > > > > will cause the whole shared RecordAccumulator > > > > buffer > > > > > to > > > > > >> be > > > > > >> > > > >> filled > > > > > >> > > > >> > up. > > > > > >> > > > >> > > > > that > > > > > >> > > > >> > > > > > will penalize other 9 topics who are within the > > > > > quota. > > > > > >> > that > > > > > >> > > is > > > > > >> > > > >> the > > > > > >> > > > >> > > > > > unfairness point that Ewen and I were trying to > > > > make. > > > > > >> > > > >> > > > > > > > > > > >> > > > >> > > > > > if broker just drop the msg and return an > > > > > error/status > > > > > >> > code > > > > > >> > > > >> > > indicates the > > > > > >> > > > >> > > > > > drop and why. then producer can just move on > > and > > > > > accept > > > > > >> > the > > > > > >> > > > >> drop. > > > > > >> > > > >> > > shared > > > > > >> > > > >> > > > > > buffer won't be saturated and other 9 topics > > > won't > > > > be > > > > > >> > > > penalized. 
> > > > > >> > > > >> > > > > > > > > > > >> > > > >> > > > > > Thanks, > > > > > >> > > > >> > > > > > Steven > > > > > >> > > > >> > > > > > > > > > > >> > > > >> > > > > > > > > > > >> > > > >> > > > > > > > > > > >> > > > >> > > > > > On Tue, Mar 17, 2015 at 9:44 AM, Jay Kreps < > > > > > >> > > > jay.kr...@gmail.com > > > > > >> > > > >> > > > > > > >> > > > >> > > wrote: > > > > > >> > > > >> > > > > > > > > > > >> > > > >> > > > > > > Hey Steven, > > > > > >> > > > >> > > > > > > > > > > > >> > > > >> > > > > > > It is true that hitting the quota will cause > > > > > >> > back-pressure > > > > > >> > > > on > > > > > >> > > > >> the > > > > > >> > > > >> > > > > > producer. > > > > > >> > > > >> > > > > > > But the solution is simple, a producer that > > > wants > > > > > to > > > > > >> > avoid > > > > > >> > > > >> this > > > > > >> > > > >> > > should > > > > > >> > > > >> > > > > > stay > > > > > >> > > > >> > > > > > > under its quota. In other words this is a > > > > contract > > > > > >> > between > > > > > >> > > > the > > > > > >> > > > >> > > cluster > > > > > >> > > > >> > > > > > and > > > > > >> > > > >> > > > > > > the client, with each side having something > > to > > > > > uphold. > > > > > >> > > Quite > > > > > >> > > > >> > > possibly > > > > > >> > > > >> > > > > the > > > > > >> > > > >> > > > > > > same thing will happen in the absence of a > > > > quota, a > > > > > >> > client > > > > > >> > > > >> that > > > > > >> > > > >> > > > > produces > > > > > >> > > > >> > > > > > an > > > > > >> > > > >> > > > > > > unexpected amount of load will hit the limits > > > of > > > > > the > > > > > >> > > server > > > > > >> > > > >> and > > > > > >> > > > >> > > > > > experience > > > > > >> > > > >> > > > > > > backpressure. 
Quotas just allow you to set > > that > > > > > same > > > > > >> > limit > > > > > >> > > > at > > > > > >> > > > >> > > something > > > > > >> > > > >> > > > > > > lower than 100% of all resources on the > > server, > > > > > which > > > > > >> is > > > > > >> > > > >> useful > > > > > >> > > > >> > > for a > > > > > >> > > > >> > > > > > > shared cluster. > > > > > >> > > > >> > > > > > > > > > > > >> > > > >> > > > > > > -Jay > > > > > >> > > > >> > > > > > > > > > > > >> > > > >> > > > > > > On Mon, Mar 16, 2015 at 11:34 PM, Steven Wu < > > > > > >> > > > >> > stevenz...@gmail.com> > > > > > >> > > > >> > > > > > wrote: > > > > > >> > > > >> > > > > > > > > > > > >> > > > >> > > > > > > > wait. we create one kafka producer for each > > > > > cluster. > > > > > >> > > each > > > > > >> > > > >> > > cluster can > > > > > >> > > > >> > > > > > > have > > > > > >> > > > >> > > > > > > > many topics. if producer buffer got filled > > up > > > > > due to > > > > > >> > > > delayed > > > > > >> > > > >> > > response > > > > > >> > > > >> > > > > > for > > > > > >> > > > >> > > > > > > > one throttled topic, won't that penalize > > > other > > > > > >> topics > > > > > >> > > > >> unfairly? > > > > > >> > > > >> > > it > > > > > >> > > > >> > > > > > seems > > > > > >> > > > >> > > > > > > to > > > > > >> > > > >> > > > > > > > me that broker should just return error > > > without > > > > > >> delay. > > > > > >> > > > >> > > > > > > > > > > > > >> > > > >> > > > > > > > sorry that I am chatting to myself :) > > > > > >> > > > >> > > > > > > > > > > > > >> > > > >> > > > > > > > On Mon, Mar 16, 2015 at 11:29 PM, Steven > > Wu < > > > > > >> > > > >> > > stevenz...@gmail.com> > > > > > >> > > > >> > > > > > > wrote: > > > > > >> > > > >> > > > > > > > > > > > > >> > > > >> > > > > > > > > I think I can answer my own question. 
> Delayed responses will cause the producer buffer to fill up, which then
> results in either thread blocking or message drops.
>
> On Mon, Mar 16, 2015 at 11:24 PM, Steven Wu <stevenz...@gmail.com> wrote:
>
> Please correct me if I am missing something here. I do not understand
> how throttling would work without cooperation/back-off from the
> producer. The new Java producer supports a non-blocking API. Why would a
> delayed response be able to slow down the producer? The producer will
> continue to fire async sends.
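
To make Steven's observation concrete, here is a minimal sketch of why delayed responses eventually stall even a sender that never blocks on its own. It is a toy model and not the Kafka producer: `ToyProducer`, `trySend`, `blockingSend`, and `onBrokerAck` are hypothetical names, and the bounded queue merely stands in for the producer's buffer of unacknowledged records.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy model only (assumption): a bounded queue stands in for the
// producer's buffer of records that the broker has not acknowledged yet.
class ToyProducer {
    private final BlockingQueue<String> buffer;

    ToyProducer(int capacity) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
    }

    // Non-blocking send: returns false (record dropped) when the buffer is full.
    boolean trySend(String record) {
        return buffer.offer(record);
    }

    // Blocking send: the calling thread waits until space frees up.
    void blockingSend(String record) throws InterruptedException {
        buffer.put(record);
    }

    // Invoked when the broker acknowledges the oldest record.
    // A throttled broker simply invokes this later than usual.
    void onBrokerAck() {
        buffer.poll();
    }

    int pending() {
        return buffer.size();
    }

    public static void main(String[] args) {
        ToyProducer producer = new ToyProducer(2);
        System.out.println("send a accepted: " + producer.trySend("a"));
        System.out.println("send b accepted: " + producer.trySend("b"));
        // No ack has arrived yet (delayed response), so the buffer is full.
        System.out.println("send c accepted: " + producer.trySend("c"));
        producer.onBrokerAck(); // the delayed ack finally arrives
        System.out.println("send d accepted: " + producer.trySend("d"));
    }
}
```

In this model the third send is rejected until an acknowledgement frees a slot, which is the "thread blocking or message drop" choice described above. Because the buffer is shared, records for unrelated topics would be rejected or blocked as well, which is the fairness concern raised in the 11:34 PM message.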
>
> On Mon, Mar 16, 2015 at 10:58 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> I think we are really discussing two separate issues here:
>
> 1. Whether we should a) append-then-block-then-returnOKButThrottled or
> b) block-then-returnFailDuetoThrottled for quota actions on produce
> requests.
>
> Both these approaches assume some kind of well-behavedness of the
> clients: option a) assumes the client sets a proper timeout value and
> can just ignore the "OKButThrottled" response, while option b) assumes
> the client handles the "FailDuetoThrottled" response appropriately.
> For any malicious clients that, for example, just keep retrying, either
> intentionally or not, neither of these approaches is actually effective.
>
> 2. For "OKButThrottled" and "FailDuetoThrottled" responses, shall we
> encode them as error codes or augment the protocol to use a separate
> field indicating "status codes"?
>
> Today we have already incorporated some status codes as error codes in
> the responses, e.g.
> ReplicaNotAvailable in MetadataResponse. The pro of this is of course
> using a single field for response status, like the HTTP status codes,
> while the con is that it requires clients to handle the error codes
> carefully.
>
> I think maybe we can actually extend the single-code approach to
> overcome its drawbacks, that is, wrap the error code semantics for the
> users so that users do not need to handle the codes one-by-one. More
> concretely, following Jay's example, the client could write something
> like this:
>
> -----------------
>
> if(error.isOK())
>    // status code is good or the code can be simply ignored for this
>    // request type, process the request
> else if(error.needsRetry())
>    // throttled, transient error, etc: retry
> else if(error.isFatal())
>    // non-retriable errors, etc: notify / terminate / other handling
>
> -----------------
>
> Only when the clients really want to handle, for example, the
> FailDuetoThrottled status code specifically, it needs to:
>
> if(error.isOK())
>    // status code is good or the code can be simply ignored for this
>    // request type, process the request
> else if(error == FailDuetoThrottled)
>    // throttled: log it
> else if(error.needsRetry())
>    // transient error, etc: retry
> else if(error.isFatal())
>    // non-retriable errors, etc: notify / terminate / other handling
>
> -----------------
>
> And for implementation we can probably group the codes accordingly like
> HTTP status codes such that we can do:
>
> boolean Error.isOK() {
>   return code < 300 && code >= 200;
> }
>
> Guozhang
>
> On Mon, Mar 16, 2015 at 10:24 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
>
> Agreed that trying to shoehorn non-error codes into the error field is a
> bad idea.
> It makes it *way* too easy to write code that looks (and should be)
> correct but is actually incorrect. If necessary, I think it's much
> better to spend a couple of extra bytes to encode that information
> separately (a "status" or "warning" section of the response). An
> indication that throttling is occurring is something I'd expect to be
> indicated by a bit flag in the response rather than as an error code.
>
> Gwen - I think an error code makes sense when the request actually
> failed. Option B, which Jun was advocating, would have appended the
> messages successfully.
> If the rate-limiting case you're talking about had successfully
> committed the messages, I would say that's also a bad use of error
> codes.
>
> On Mon, Mar 16, 2015 at 10:16 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
>
> We discussed an error code for rate-limiting (which I think made
> sense), isn't it a similar case?
>
> On Mon, Mar 16, 2015 at 10:10 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
> My concern is that as soon as you start encoding non-error response
> information into error codes the next question is what to do if two
> such codes apply (i.e. you have a replica down and the response is
> quota'd).
> I think I am trying to argue that error should mean "why we failed your
> request", for which there will really only be one reason, and any other
> useful information we want to send back is just another field in the
> response.
>
> -Jay
>
> On Mon, Mar 16, 2015 at 9:51 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
>
> I think it's not too late to reserve a set of error codes (200-299?)
> for "non-error" codes.
>
> It won't be backward compatible (i.e.
> clients that currently do "else throw" will throw on non-errors), but
> perhaps it's worthwhile.
>
> On Mon, Mar 16, 2015 at 9:42 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
> Hey Jun,
>
> I'd really really really like to avoid that. Having just spent a bunch
> of time on the clients, using the error codes to encode other
> information about the response is super dangerous. The error handling
> is one of the hardest parts of the client (Guozhang chime in here).
>
> Generally the error handling looks like
>    if(error == none)
>       // good, process the request
>    else if(error == KNOWN_ERROR_1)
>       // handle known error 1
>    else if(error == KNOWN_ERROR_2)
>       // handle known error 2
>    else
>       throw Errors.forCode(error).exception(); // or some other default behavior
>
> This works because we have a convention that an error is something that
> prevented your getting the response so the default handling case is
> sane and forward compatible.
> It is tempting to use the error code to convey information in the
> success case. For example we could use error codes to encode whether
> quotas were enforced, whether the request was served out of cache,
> whether the stock market is up today, or whatever.
> The problem is that since these are not errors as far as the client is
> concerned, it should not throw an exception but process the response,
> but now we have created an explicit requirement that that error be
> handled explicitly since it is different. I really think that this kind
> of information is not an error, it is just information, and if we want
> it in the response we should do the right thing and add a new field to
> the response.
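
The forward-compatibility argument above can be sketched in code. This is a hypothetical illustration, not Kafka's protocol or client code: the `Response` class, its `throttled` field, and the numeric codes are assumptions made up for the example. It shows a client written before throttling existed receiving the throttling signal either as a new error code or as a separate field.

```java
// Hypothetical sketch: class names, fields, and codes are illustrative only.
class QuotaResponseSketch {
    static final short NONE = 0;
    static final short KNOWN_ERROR_1 = 1;
    // Assumed code introduced on the server after this client was written.
    static final short THROTTLED_CODE = 42;

    static final class Response {
        final short errorCode;   // "why we failed your request", if we did
        final boolean throttled; // separate, purely informational field

        Response(short errorCode, boolean throttled) {
            this.errorCode = errorCode;
            this.throttled = throttled;
        }
    }

    // An older client: it knows NONE and KNOWN_ERROR_1 and throws on anything else.
    static String handle(Response response) {
        if (response.errorCode == NONE) {
            return "processed";
        } else if (response.errorCode == KNOWN_ERROR_1) {
            return "handled known error 1";
        } else {
            throw new IllegalStateException("Unexpected error code " + response.errorCode);
        }
    }

    public static void main(String[] args) {
        // Throttling reported in its own field: the old client processes the response.
        System.out.println(handle(new Response(NONE, true)));

        // Throttling encoded as an error code: the old client throws even
        // though the request itself succeeded.
        try {
            handle(new Response(THROTTLED_CODE, false));
        } catch (IllegalStateException e) {
            System.out.println("old client threw: " + e.getMessage());
        }
    }
}
```

With the separate field, the default "else throw" branch stays reserved for genuine failures, and a client that does not care about throttling needs no changes.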
>
> I think you saw the Samza bug that was literally an example of this
> happening and leading to an infinite retry loop.
>
> Furthermore I really want to emphasize that hitting your quota in the
> design that Adi has proposed is actually not an error condition at all.
> It is totally reasonable in any bootstrap situation to intentionally
> want to run at the limit the system imposes on you.
>
> -Jay
>
> On Mon, Mar 16, 2015 at 4:27 PM, Jun Rao <j...@confluent.io> wrote:
>
> It's probably useful for a client to know whether its requests are
> throttled or not (e.g., for monitoring and alerting). From that
> perspective, option B (delay the requests and return an error) seems
> better.
>
> Thanks,
>
> Jun
>
> On Wed, Mar 4, 2015 at 3:51 PM, Aditya Auradkar <aaurad...@linkedin.com.invalid> wrote:
>
> Posted a KIP for quotas in kafka.
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-13+-+Quotas
>
> Appreciate any feedback.
>
> Aditya
>
> --
> Thanks,
> Ewen
>
> --
> -- Guozhang