[ https://issues.apache.org/jira/browse/KAFKA-656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100840#comment-14100840 ]
Jay Kreps commented on KAFKA-656:
---------------------------------

Hey [~ab10anand], a couple of suggestions:
1. You could imagine a fairly complete quota system would involve all kinds of granularities at which you could enforce the quota (at the IP level, at the user level, etc). There are also all kinds of things we can quota: requests, bytes in, bytes out, etc. However, for now let's keep it simple and just start with a per-topic bytes-written quota.
2. The quota should be specified at the topic level but enforced at the partition level. I.e. if you specify 10MB/sec on a topic with 10 partitions then what we will enforce would be 1MB/sec per partition.
3. We should make use of the topic-level configs to implement this. I.e. add a new configuration in LogConfig that defaults to an infinite quota.
4. One piece of work that was done in anticipation of quotas was to combine the metrics and quota systems. This metrics package is in use on the clients now, but not yet on the server (it is under clients/src/main/org/apache/kafka/common/metrics I think). At a high level the idea is to be able to enforce quotas on exactly the same things we monitor with metrics, to make the reporting side of things easier. This code may actually do most of what the QuotaManager would have done, i.e. it will maintain all the metrics, and each metric can have an optional quota associated; if the metric exceeds the quota it will throw an exception. Check this out and see if it makes sense in the way you were thinking of using it.

> Add Quotas to Kafka
> -------------------
>
>                 Key: KAFKA-656
>                 URL: https://issues.apache.org/jira/browse/KAFKA-656
>             Project: Kafka
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 0.8.1
>            Reporter: Jay Kreps
>              Labels: project
>
> It would be nice to implement a quota system in Kafka to improve our support
> for highly multi-tenant usage.
> The goal of this system would be to prevent
> one naughty user from accidentally overloading the whole cluster.
> There are several quantities we would want to track:
> 1. Requests per second
> 2. Bytes written per second
> 3. Bytes read per second
> There are two reasonable groupings we would want to aggregate and enforce
> these thresholds at:
> 1. Topic level
> 2. Client level (e.g. by client id from the request)
> When a request hits one of these limits we will simply reject it with a
> QUOTA_EXCEEDED exception.
> To avoid suddenly breaking things without warning, we should ideally support
> two thresholds: a soft threshold at which we produce some kind of warning and
> a hard threshold at which we give the error. The soft threshold could just be
> defined as 80% (or whatever) of the hard threshold.
> There are nuances to getting this right. If you measure second-by-second a
> single burst may exceed the threshold, so we need a sustained measurement
> over a period of time.
> Likewise when do we stop giving this error? To make this work right we likely
> need to charge against the quota for request *attempts* not just successful
> requests. Otherwise a client that is overloading the server will just flap on
> and off--i.e. we would disable them for a period of time but when we
> re-enabled them they would likely still be abusing us.
> It would be good to have a wiki design on how this would all work as a starting
> point for discussion.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
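
To make the ideas in this thread concrete, here is a rough sketch of a per-partition bytes-written quota measured over a sliding window, with a soft threshold at 80% of the hard one and with rejected attempts still charged against the quota. It is written in Python only for brevity. Every name in it (per_partition_quota, WindowedByteQuota, QuotaExceeded, record_attempt) is hypothetical and is not the API of the Kafka metrics package; the 10-second window is likewise just an assumed value.

```python
from collections import deque


class QuotaExceeded(Exception):
    """Hypothetical stand-in for the QUOTA_EXCEEDED error in the issue."""


def per_partition_quota(topic_quota_bytes_per_sec, num_partitions):
    """Split a topic-level quota evenly across the topic's partitions."""
    return topic_quota_bytes_per_sec / num_partitions


class WindowedByteQuota:
    """Sliding-window byte-rate quota (illustrative sketch only)."""

    def __init__(self, hard_limit, window_secs=10.0, soft_fraction=0.8):
        self.hard_limit = hard_limit                  # bytes/sec
        self.soft_limit = hard_limit * soft_fraction  # warning threshold
        self.window_secs = window_secs                # assumed window length
        self._events = deque()                        # (timestamp, nbytes)
        self._total = 0                               # bytes inside window

    def _expire(self, now):
        # Drop attempts that have fallen out of the measurement window.
        cutoff = now - self.window_secs
        while self._events and self._events[0][0] <= cutoff:
            _, nbytes = self._events.popleft()
            self._total -= nbytes

    def rate(self, now):
        """Average bytes/sec over the window ending at `now`."""
        self._expire(now)
        return self._total / self.window_secs

    def record_attempt(self, nbytes, now):
        """Charge an attempted write, whether or not it ends up accepted.

        Returns "ok" or "warn"; raises QuotaExceeded past the hard limit.
        """
        self._expire(now)
        # Charge first, so a rejected client keeps paying for its attempts
        # and does not flap back on while it is still overloading us.
        self._events.append((now, nbytes))
        self._total += nbytes
        current = self._total / self.window_secs
        if current > self.hard_limit:
            raise QuotaExceeded(
                f"{current:.1f} B/s exceeds quota of {self.hard_limit} B/s")
        return "warn" if current > self.soft_limit else "ok"
```

With these assumptions, per_partition_quota(10_000_000, 10) gives the 1MB/sec per-partition figure from the example above. Because the rate is averaged over the window, a single 500-byte burst against a 100 B/s quota is accepted, while sustained writes push the average past the soft and then the hard threshold.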