Hi David and Jun, I wanted to add to the discussion about using requests/sec vs. time on server threads (similar to request quota) for expressing quota for topic ops.
I think request quota does not protect the brokers from overload by itself -- it still requires tuning and sometimes re-tuning, because it depends on the workload behavior of all users (for example, the relative share of requests exempt from throttling). This makes it not that easy to set. Let me give you more details:

1. The amount of work a user can get from the request quota depends on the load from other users. We measure and enforce the user's clock time on threads -- the time between two timestamps, one taken when the operation starts and one when it ends. If the user is the only load on the broker, their operation is less likely to be interrupted by the kernel to switch to another thread; time spent switched away from the thread still counts against the quota.
   1. Pros: this makes the quota more work-conserving; the user is less limited when more resources are available.
   2. Cons: it is harder for the user to capacity plan, and it can be confusing when the broker suddenly stops supporting a load it was supporting before.
2. For the above reason, it makes most sense to maximize the user's quota and set it as a percentage of the maximum thread capacity (1100 with the default broker config, i.e. (num.io.threads + num.network.threads) * 100).
3. However, the actual maximum thread capacity is not really 1100:
   1. Some of it will be taken by requests exempt from throttling, and that amount depends on the workload. We have seen (somewhat rare) cases where exempt requests take about 2/3 of the time on threads.
   2. We have also seen an overloaded cluster (full queues, timeouts, etc.) due to a high request rate while the time used on threads was well below the max (1100), like 600 or 700 (total exempt + non-exempt usage). Basically, when a broker is close to 100% CPU, it takes more and more time for the "unaccounted" work, such as a thread getting a chance to pick up a request from the queue and take a timestamp.
4. As a result, some tuning is needed to decide on a safe value for total thread capacity, from which users can carve out their quotas. Changes in users' workloads may require re-tuning, for example if they dramatically change the relative share of non-exempt load.

I think request quota works well for client request load in the sense that it ensures different users get a fair/proportional share of resources during high broker load. If a user can no longer get enough resources from their quota to support their request rate, they can monitor their load and expand the cluster (or rebalance) if needed.

However, I think using time on threads for topic ops could be even more difficult than a simple request rate (as proposed):

1. I understand that we don't only care about topic requests tying up the controller thread; we also care that they do not create a large extra load on the cluster due to LeaderAndIsr and other related requests (this is more important for small clusters).
2. For that reason, tuning a quota expressed as time on threads can be harder, because there is no easy way to say how that quota would translate into a number of operations (that would depend on the other broker load).

Since tuning would be required anyway, I see the following workflow if we express the controller quota in terms of partition mutations per second:

1. Run a topic workload in isolation (the most expensive one, e.g. create topic vs. add partitions) and see how much load it adds as a function of the incoming rate. Choose the quota depending on how much extra load your cluster can sustain in addition to its normal load (see the sketch after this list).
2. It could be useful to publish some experimental results to give some ballpark numbers and make this sizing easier.
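To make step 1 above a bit more concrete, here is a minimal back-of-envelope sketch in Java. All of the numbers are hypothetical placeholders (not measurements), and the procedure is just my reading of the workflow, not anything prescribed by the KIP:

public class MutationQuotaSizing {
    public static void main(String[] args) {
        // Measured in isolation for the most expensive operation (e.g. topic creation):
        // extra broker load added per partition mutation per second.
        double loadPerMutationPerSec = 0.5;  // hypothetical: 0.5% of request-handler capacity

        // Headroom the cluster can spare on top of its normal workload.
        double availableHeadroom = 100.0;    // hypothetical: 100%, i.e. one idle thread

        // Safety factor to absorb variance (ZK retries, cluster size, co-located load).
        double safetyFactor = 0.7;

        double quota = safetyFactor * availableHeadroom / loadPerMutationPerSec;
        System.out.printf("partition mutation quota ~= %.0f mutations/sec%n", quota);
    }
}

With these made-up inputs the quota comes out around 140 partition mutations/sec; the point is only that the quota falls out of two measurable quantities (cost per mutation and available headroom).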
I am interested to see whether you agree with the assumptions listed here. I may have missed something, especially if there is an easier workflow for setting a quota based on time on threads.

Thanks,
Anna

On Thu, Apr 30, 2020 at 8:13 AM Tom Bentley <tbent...@redhat.com> wrote:

> Hi David,
>
> Thanks for the KIP.
>
> If I understand the proposed throttling algorithm, an initial request would be allowed (possibly making K negative) and only subsequent requests (before K became positive) would receive the QUOTA_VIOLATED. That would mean it was still possible to block the controller from handling other events -- you just need to do so by making one big request.
>
> While the reasons for rejecting execution throttling make sense given the RPCs we have today, that seems to be at the cost of still allowing harm to the cluster, or did I misunderstand?
>
> Kind regards,
>
> Tom
>
> On Tue, Apr 28, 2020 at 1:49 AM Jun Rao <j...@confluent.io> wrote:
>
> > Hi, David,
> >
> > Thanks for the reply. A few more comments.
> >
> > 1. I am actually not sure if a quota based on request rate is easier for the users. For context, in KIP-124 we started with a request rate quota, but ended up not choosing it. The main issues are (a) requests are not equal; some are more expensive than others; and (b) the users typically don't know how expensive each type of request is. For example, a big part of the controller cost is ZK writes. To create a new topic with 1 partition, the number of ZK writes is 4 (1 for each segment in /brokers/topics/[topic]/partitions/[partitionId]/state). Adding one partition to an existing topic requires 2 ZK writes, and deleting a topic with 1 partition requires 6 to 7 ZK writes. It's unlikely for a user to know the exact cost associated with those implementation details. If users don't know the cost, it's not clear if they can set the rate properly.
> >
> > 2. I think that depends on the goal. To me, the common problem is that you have many applications running on a shared Kafka cluster and one of the applications abuses the broker by issuing too many requests. In this case, a global quota will end up throttling every application. However, what we really want in this case is to only throttle the application that causes the problem. A user-level quota solves this problem more effectively. We may still need some sort of global quota when the total usage from all applications exceeds the broker resources. But that seems to be secondary, since it's uncommon for all applications' usage to go up at the same time. We can also solve this problem by reducing the per-user quota for every application if there is a user-level quota.
> >
> > 3. Not sure that I fully understand the difference in burst balance. The current throttling logic works as follows. Each quota is measured over a number of time windows. Suppose the quota is X/sec. If time passes and the quota is not being used, we are accumulating credit at the rate of X/sec. If a quota is being used, we are reducing that credit based on the usage. The credit expires when the corresponding window rolls out. The max credit that can be accumulated is X * number of windows * window size. So, in some sense, the current logic also supports burst and a way to cap the burst. Could you explain the difference with Token Bucket a bit more? Also, the current quota system always executes the current request even if it's being throttled. It just informs the client to back off a throttled amount of time before sending another request.
> >
> > Jun
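As an aside for readers following the thread, the windowed behaviour Jun describes above could be sketched roughly like this. It is a simplified illustration, not Kafka's actual Rate/SampledStat code, and the back-off formula is only an approximation of the real throttle-time calculation:

import java.util.ArrayDeque;
import java.util.Deque;

class WindowedRateQuota {
    private final double quotaPerSec;  // X
    private final int numWindows;      // number of samples retained
    private final long windowMs;       // size of each sample window
    private final Deque<double[]> samples = new ArrayDeque<>();  // {windowStartMs, usage}

    WindowedRateQuota(double quotaPerSec, int numWindows, long windowMs) {
        this.quotaPerSec = quotaPerSec;
        this.numWindows = numWindows;
        this.windowMs = windowMs;
    }

    // Record usage and return a suggested back-off in ms (0 if within quota).
    long record(long nowMs, double usage) {
        // "The credit expires when the corresponding window rolls out."
        while (!samples.isEmpty() && samples.peekFirst()[0] <= nowMs - (long) numWindows * windowMs) {
            samples.pollFirst();
        }
        if (samples.isEmpty() || nowMs - (long) samples.peekLast()[0] >= windowMs) {
            samples.addLast(new double[] {nowMs, 0.0});
        }
        samples.peekLast()[1] += usage;

        double total = samples.stream().mapToDouble(s -> s[1]).sum();
        double spanSec = numWindows * windowMs / 1000.0;  // max credit = X * numWindows * windowMs
        if (total <= quotaPerSec * spanSec) {
            return 0;  // still within the accumulated credit
        }
        // Back off long enough for the average rate over the span to drop back to X.
        return (long) ((total / quotaPerSec - spanSec) * 1000);
    }
}

The unused credit is implicit here: whatever is left of quotaPerSec * spanSec after the recorded usage, capped by the retained window span rather than by an explicit burst parameter.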
> > On Mon, Apr 27, 2020 at 5:15 AM David Jacot <dja...@confluent.io> wrote:
> >
> > > Hi Jun,
> > >
> > > Thank you for the feedback.
> > >
> > > 1. You are right. In the end, we do care about the percentage of time that an operation ties up the controller thread. I thought about this but I was not entirely convinced by it for the following reasons:
> > >
> > > 1.1. While I do agree that, for the administrator of the cluster, setting up a rate and a burst is a bit harder than allocating a percentage, I believe that a rate and a burst are way easier to understand for the users of the cluster.
> > >
> > > 1.2. Measuring the time that a request ties up the controller thread is not as straightforward as it sounds, because the controller reacts to ZK TopicChange and TopicDeletion events in lieu of handling requests directly. These events carry neither the client id nor the user information, so the best option would be to refactor the controller to accept requests instead of reacting to the events. This will be possible with KIP-590. It obviously has other side effects in the controller (e.g. batching).
> > >
> > > I leaned towards the current proposal mainly due to 1.1., as 1.2. can be (or will be) fixed. Does 1.1. sound like a reasonable trade-off to you?
> > >
> > > 2. It is not in the current proposal. I thought that a global quota would be enough to start with. We can definitely make it work like the other quotas.
> > >
> > > 3. The main difference is that the Token Bucket algorithm defines an explicit burst B while guaranteeing an average rate R, whereas our existing quota guarantees an average rate R as well but starts to throttle as soon as the rate goes above the defined quota.
> > >
> > > Creating and deleting topics is bursty by nature. Applications create or delete topics occasionally, usually by sending one request with multiple topics. The reasoning behind allowing a burst is to let such requests of a reasonable size pass without being throttled, whereas our current quota mechanism would reject any topics as soon as the rate goes above the quota, requiring the applications to send subsequent requests to create or delete all the topics.
> > >
> > > Best,
> > > David
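A rough sketch of the token bucket David describes: tokens accumulate at rate R up to a burst B, a batch of N mutations is admitted even if it overdraws the bucket, and later requests are throttled until the balance is positive again. This is illustrative only, not the KIP-599 implementation:

class TokenBucket {
    private final double rate;   // R: partition mutations per second
    private final double burst;  // B: maximum number of accumulated tokens
    private double tokens;       // K: current balance, may go negative
    private long lastUpdateMs;

    TokenBucket(double rate, double burst, long nowMs) {
        this.rate = rate;
        this.burst = burst;
        this.tokens = burst;     // start full
        this.lastUpdateMs = nowMs;
    }

    // Admit a batch of n mutations; return the throttle time in ms (0 if none).
    synchronized long admit(long nowMs, int n) {
        refill(nowMs);
        if (tokens <= 0) {
            // Still in debt from a previous burst: throttle until the balance is positive.
            return (long) Math.ceil(-tokens / rate * 1000);
        }
        tokens -= n;             // accepted as a whole, even if this overdraws the bucket
        return 0;
    }

    private void refill(long nowMs) {
        tokens = Math.min(burst, tokens + (nowMs - lastUpdateMs) / 1000.0 * rate);
        lastUpdateMs = nowMs;
    }
}

This also matches Tom's observation earlier in the thread: a single large request can drive the balance far negative, and only the follow-up requests are throttled.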
> > > On Fri, Apr 24, 2020 at 9:03 PM Jun Rao <j...@confluent.io> wrote:
> > >
> > > > Hi, David,
> > > >
> > > > Thanks for the KIP. A few quick comments.
> > > >
> > > > 1. About quota.partition.mutations.rate. I am not sure if it's very easy for the user to set the quota as a rate. For example, each partition mutation could take a different number of ZK operations (depending on things like retries). The time to process each ZK operation may also vary from cluster to cluster. An alternative way to model this is to do something similar to the request (CPU) quota, which exposes the quota as a percentage of the server threads that can be used. The current request quota doesn't include the controller thread. We could add something that measures/exposes the percentage of time that a request ties up the controller thread, which seems to be what we really care about.
> > > >
> > > > 2. Is the new quota per user? Intuitively, we want to only penalize applications that overuse the broker resources, but not others. Also, in existing types of quotas (request, bandwidth), there is a hierarchy among clientId vs user and default vs customized (see https://cwiki.apache.org/confluence/display/KAFKA/KIP-55%3A+Secure+Quotas+for+Authenticated+Users). Does the new quota fit into the existing hierarchy?
> > > >
> > > > 3. It seems that you are proposing a new quota mechanism based on the Token Bucket algorithm. Could you describe its trade-off with the existing quota mechanism? Ideally, it would be better if we had a single quota mechanism within Kafka.
> > > >
> > > > Jun
> > > >
> > > > On Fri, Apr 24, 2020 at 9:52 AM David Jacot <dja...@confluent.io> wrote:
> > > >
> > > > > Hi folks,
> > > > >
> > > > > I'd like to start the discussion for KIP-599:
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations
> > > > >
> > > > > It proposes to introduce quotas for the create topics, create partitions and delete topics operations. Let me know what you think, thanks.
> > > > >
> > > > > Best,
> > > > > David