[ https://issues.apache.org/jira/browse/KAFKA-656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111339#comment-14111339 ]
Jay Kreps commented on KAFKA-656:
---------------------------------

Another alternative which might make things simpler would be to make the quota configuration per partition. This would avoid having to adapt it if the partition count changed. This is not ideal, since I think people naturally think about data size at the topic level, but at the moment all our size-based retention is done at the partition level, so you could argue that this is more consistent.

I think the other alternative, which is a little bit more work, is to store the per-topic quota in ZK and calculate quota/part-count to get the per-partition limit. This per-partition limit would have to be updated when either the quota changed or the partition count changed. This might require a small bit of refactoring, which I would be happy to walk you through if you end up going that route.

> Add Quotas to Kafka
> -------------------
>
>                 Key: KAFKA-656
>                 URL: https://issues.apache.org/jira/browse/KAFKA-656
>             Project: Kafka
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 0.8.1
>            Reporter: Jay Kreps
>              Labels: project
>
> It would be nice to implement a quota system in Kafka to improve our support
> for highly multi-tenant usage. The goal of this system would be to prevent
> one naughty user from accidentally overloading the whole cluster.
> There are several quantities we would want to track:
> 1. Requests per second
> 2. Bytes written per second
> 3. Bytes read per second
> There are two reasonable groupings we would want to aggregate and enforce
> these thresholds at:
> 1. Topic level
> 2. Client level (e.g. by client id from the request)
> When a request hits one of these limits we will simply reject it with a
> QUOTA_EXCEEDED exception.
> To avoid suddenly breaking things without warning, we should ideally support
> two thresholds: a soft threshold at which we produce some kind of warning and
> a hard threshold at which we give the error.
> The soft threshold could just be defined as 80% (or whatever) of the hard
> threshold.
> There are nuances to getting this right. If you measure second-by-second a
> single burst may exceed the threshold, so we need a sustained measurement
> over a period of time.
> Likewise when do we stop giving this error? To make this work right we likely
> need to charge against the quota for request *attempts* not just successful
> requests. Otherwise a client that is overloading the server will just flap on
> and off--i.e. we would disable them for a period of time but when we
> re-enabled them they would likely still be abusing us.
> It would be good to have a wiki design on how this would all work as a
> starting point for discussion.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
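
For illustration only, the second alternative in the comment above (store a per-topic quota, derive the per-partition limit as quota/part-count, and recompute it whenever either input changes) might be sketched roughly as follows. The class and method names here are hypothetical and are not part of Kafka; the ZK storage is left out, and rounding down is an assumption, since the comment does not say how a remainder should be handled.

```python
from dataclasses import dataclass


def _require_positive(name: str, value: int) -> int:
    """Reject zero or negative inputs before they are stored."""
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {value}")
    return value


@dataclass
class TopicQuota:
    """Hypothetical holder for a per-topic byte quota (not a Kafka class)."""

    topic_quota_bytes: int  # quota configured for the whole topic
    partition_count: int    # current number of partitions for the topic

    def per_partition_limit(self) -> int:
        # quota / part-count; integer division rounds down (an assumption)
        _require_positive("partition_count", self.partition_count)
        return self.topic_quota_bytes // self.partition_count

    def on_quota_changed(self, new_quota_bytes: int) -> int:
        # The per-partition limit must be recomputed when the quota changes
        self.topic_quota_bytes = _require_positive("quota", new_quota_bytes)
        return self.per_partition_limit()

    def on_partitions_changed(self, new_count: int) -> int:
        # ...and also when the partition count changes
        self.partition_count = _require_positive("partition_count", new_count)
        return self.per_partition_limit()


quota = TopicQuota(topic_quota_bytes=1000, partition_count=4)
limit_initial = quota.per_partition_limit()
limit_after_repartition = quota.on_partitions_changed(8)
limit_after_quota_change = quota.on_quota_changed(1600)
```

With a purely per-partition configuration (the first alternative), neither update hook would be needed, which is the simplification the comment points out.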