Hi, David,

Another thing. Should we add controller.quota.window.size.seconds and controller.quota.window.num, or just reuse the existing quota.window.size.seconds and quota.window.num that are used for other types of quotas?

Thanks,
Jun
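As an illustrative aside, the two options would look roughly like this in a broker's server.properties. The controller.* names are only the style proposed in this thread (nothing is finalized), and the values shown are simply the current defaults of the existing settings:

    # Option A: dedicated sampling settings for the new controller quota
    # (proposed names, not finalized)
    controller.quota.window.num=11
    controller.quota.window.size.seconds=1

    # Option B: reuse the sampling settings shared by the other quota types
    quota.window.num=11
    quota.window.size.seconds=1

Option A would let the controller quota be sampled independently of the other quotas, at the cost of two more knobs for operators.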
On Tue, Jun 9, 2020 at 10:30 AM Jun Rao <j...@confluent.io> wrote:

> Hi, David,
>
> Thanks for the KIP. The name QUOTA_VIOLATED sounds reasonable to me. +1 on the KIP.
>
> Jun
>
> On Tue, Jun 9, 2020 at 5:07 AM David Jacot <dja...@confluent.io> wrote:
>
>> Hi Colin,
>>
>> Thank you for your feedback.
>>
>> Jun has summarized the situation pretty well. Thanks, Jun! I would like to complement it with the following points:
>>
>> 1. Indeed, when the quota is exceeded, the broker will reject the topic creations, partition creations, and topic deletions that exceed it with the new QUOTA_VIOLATED error. The ThrottleTimeMs field will be populated accordingly to let the client know how long it must wait.
>>
>> 2. I do agree that we actually want a mechanism to apply back pressure to the clients. The KIP basically proposes a mechanism to control and limit the rate of operations before they enter the controller. I think that it is similar to your thinking but is enforced based on a defined quota instead of relying on the number of pending items in the controller.
>>
>> 3. You mentioned an alternative idea in your comments that, if I understood correctly, would bound the queue to limit the overload and reject work if the queue is full. I have been thinking about this as well, but I don't think that it works well in our case.
>> - The first reason is the one mentioned by Jun. We actually want to be able to limit specific clients (the misbehaving ones) in a multi-tenant environment.
>> - The second reason is that, at least in our current implementation, the length of the queue is not really a good characteristic for estimating the load. Coming back to your example of the CreateTopicsRequest: it creates a path in ZK for each newly created topic, which triggers a ChangeTopic event in the controller. That single event could cover a single topic in some cases or a thousand topics in others.
>> These two reasons aside, bounding the queue also introduces a knob which requires some tuning and thus suffers from all the points you mentioned as well, doesn't it? The value may be right for one version but not for another.
>>
>> 4. Regarding the integration with KIP-500. The implementation of this KIP will span the ApiLayer and the AdminManager. To be honest, we can influence the implementation to work best with what you have in mind for the future controller. If you could shed some more light on the future internal architecture, I can take this into account during the implementation.
>>
>> 5. Regarding KIP-590. For the create topics request, create partitions request, and delete topics request, we are lucky enough to have directed all of them to the controller already, so we are fine with doing the admission in the broker which hosts the controller. If we were to throttle more operations in the future, I believe that we can continue to do it centrally while leveraging the forwarded principal to account for the right tenant. The reason why I would like to keep it central is that we are talking about sparse (or bursty) workloads here, so distributing the quota among all the brokers makes little sense.
>>
>> 6. Regarding the naming of the new error code. BUSY sounds too generic to me, so I would rather prefer a specific error code, the main reason being that we may be able to reuse it in the future for other quotas. That being said, we can find another name. QUOTA_EXHAUSTED? I don't feel too strongly about the naming. I wonder what the others think about this.
>>
>> Voilà. I hope that I have addressed all your questions/points, and I am sorry for the lengthy email.
>>
>> Regards,
>> David
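To make point 1 concrete, a client receiving the new error would be expected to back off for the returned throttle time before retrying. The sketch below is purely hypothetical; the error-code value and the retry hook are placeholders, not definitions from the KIP:

    // Hypothetical sketch of the client-side behaviour described in point 1:
    // on QUOTA_VIOLATED, wait for the broker-supplied ThrottleTimeMs, then retry.
    public class QuotaViolatedRetrySketch {
        static final short QUOTA_VIOLATED = 89; // placeholder code, not assigned by the KIP

        static void onControllerMutationResponse(short errorCode, int throttleTimeMs, Runnable retry)
                throws InterruptedException {
            if (errorCode == QUOTA_VIOLATED) {
                Thread.sleep(throttleTimeMs); // ThrottleTimeMs tells us how long to wait
                retry.run();                  // re-send CreateTopics/CreatePartitions/DeleteTopics
            }
        }
    }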
>> On Tue, Jun 9, 2020 at 2:13 AM Colin McCabe <cmcc...@apache.org> wrote:
>>
>> > On Mon, Jun 8, 2020, at 14:41, Jun Rao wrote:
>> > > Hi, Colin,
>> > >
>> > > Thanks for the comment. You brought up several points.
>> > >
>> > > 1. Should we set up a per-user quota? To me, it does seem we need some sort of a quota. When the controller runs out of resources, ideally we only want to penalize the badly behaving applications, instead of every application. To do that, we will need to know what each application is entitled to, and the per-user quota is intended to capture that.
>> > >
>> > > 2. How easy is it to configure a quota? The following is how an admin typically sets up a quota in our existing systems. Pick a generous default per-user quota that works for most applications. For the few resource-intensive applications, customize a higher quota for them. Reserve enough resources in anticipation that a single application (or a few) will exceed the quota at a given time.
>> >
>> > Hi Jun,
>> >
>> > Thanks for the response.
>> >
>> > Maybe I was too pessimistic about the ability of admins to configure a useful quota here. I do agree that it would be nice to have the ability to set different quotas for different users, as you mentioned.
>> >
>> > > 3. How should the quota be defined? In the discussion thread, we debated between a usage-based model vs. a rate-based model. Dave and Anna argued for the rate-based model mostly because it's simpler to implement.
>> >
>> > I'm trying to think more about how this integrates with our plans for KIP-500. When we get rid of ZK, we will have to handle this in the controller itself, rather than in the AdminManager. That implies we'll have to rewrite the code. Maybe this is worth it if we want this feature now, though.
>> >
>> > Another wrinkle here is that, as we discussed in KIP-590, controller operations will land on a random broker first and only then be forwarded to the active controller. This implies that admission control should either happen on all brokers (needing some kind of distributed quota scheme), or be done on the controller after we've already done the work of forwarding the message. The second approach might not be that bad, but it would be nice to figure this out.
>> >
>> > > 4. If a quota is exceeded, how is that enforced? My understanding of the KIP is that, if a quota is exceeded, the broker immediately sends back a QUOTA_VIOLATED error and a throttle time to the client, and the client will wait for the throttle time before issuing the next request. This seems to be the same as the BUSY error code you mentioned.
>> >
>> > Yes, I agree, it sounds like we're thinking along the same lines. However, rather than QUOTA_VIOLATED, how about naming the error code BUSY? Then the error text could indicate the quota that we violated. This would be more generally useful as an error code and also avoid being confusingly similar to POLICY_VIOLATION.
>> >
>> > best,
>> > Colin
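To make Jun's points 1 and 2 a bit more concrete: under the per-user model, customizing a higher quota for one resource-intensive application could look roughly like the sketch below, assuming the client-quota Admin API from KIP-546. The quota key name "controller_mutation_rate" and the values are only illustrative guesses, not something the KIP has settled on:

    // Sketch: give one heavy user a higher controller mutation quota, while other
    // users fall back to a generous default configured elsewhere.
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.common.quota.ClientQuotaAlteration;
    import org.apache.kafka.common.quota.ClientQuotaEntity;

    public class ControllerQuotaExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            try (Admin admin = Admin.create(props)) {
                ClientQuotaEntity heavyUser =
                    new ClientQuotaEntity(Map.of(ClientQuotaEntity.USER, "heavy-app"));
                ClientQuotaAlteration alteration = new ClientQuotaAlteration(
                    heavyUser,
                    // "controller_mutation_rate" is a hypothetical quota key for this KIP.
                    List.of(new ClientQuotaAlteration.Op("controller_mutation_rate", 20.0)));
                admin.alterClientQuotas(List.of(alteration)).all().get();
            }
        }
    }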
>> > > I will let David chime in more on that.
>> > >
>> > > Thanks,
>> > >
>> > > Jun
>> > >
>> > > On Sun, Jun 7, 2020 at 2:30 PM Colin McCabe <cmcc...@apache.org> wrote:
>> > >
>> > > > Hi David,
>> > > >
>> > > > Thanks for the KIP.
>> > > >
>> > > > I thought about this for a while, and I actually think this approach is not quite right. The problem that I see here is that using an explicitly set quota here requires careful tuning by the cluster operator. Even worse, this tuning might be invalidated by changes in overall conditions or even more efficient controller software.
>> > > >
>> > > > For example, if we empirically find that the controller can do 1000 topics in a minute (or whatever), this tuning might actually be wrong if the next version of the software can do 2000 topics in a minute because of efficiency upgrades. Or, the broker that the controller is located on might be experiencing heavy load from its non-controller operations, and so it can only do 500 topics in a minute during this period.
>> > > >
>> > > > So the system administrator gets a very obscure tunable (it's not clear to a non-Kafka-developer what "controller mutations" are or why they should care). And even worse, they will have to significantly "sandbag" the value that they set it to, so that even under the heaviest load and the oldest deployed version of the software, the controller can still function. Even worse, this new quota adds a lot of complexity to the controller.
>> > > >
>> > > > What we really want is backpressure when the controller is overloaded. I believe this is the alternative you discuss in "Rejected Alternatives" under "Throttle the Execution instead of the Admission". Your reason for rejecting it is that the client error handling does not work well in this case. But actually, this is an artifact of our current implementation, rather than a fundamental issue with backpressure.
>> > > >
>> > > > Consider the example of a CreateTopicsRequest. The controller could return a special error code if the load was too high, and take the create topics event off the controller queue. Let's call that error code BUSY. Additionally, the controller could immediately refuse new events if the queue had reached its maximum length, and simply return BUSY in that case as well.
>> > > >
>> > > > Basically, the way we handle RPC timeouts in the controller right now is not very good. As you know, we time out the RPC, so the client gets TimeoutException, but then we keep the event on the queue, so that it eventually gets executed! There's no reason why we have to do that. We could take the event off the queue if there is a timeout. This would reduce load and mostly avoid the paradoxical situations you describe (getting TopicExistsException for a CreateTopicsRequest retry, etc.).
>> > > >
>> > > > I say "mostly" because there are still cases where retries could go astray (for example, if we execute the topic creation but a network problem prevents the response from being sent to the client).
>> > > > But this would still be a very big improvement over what we have now.
>> > > >
>> > > > Sorry for commenting so late on this, but I got distracted by some other things. I hope we can figure this one out -- I feel like there is a chance to significantly simplify this.
>> > > >
>> > > > best,
>> > > > Colin
>> > > >
>> > > > On Fri, May 29, 2020, at 07:57, David Jacot wrote:
>> > > > > Hi folks,
>> > > > >
>> > > > > I'd like to start the vote for KIP-599, which proposes a new quota to throttle create topic, create partition, and delete topic operations to protect the Kafka controller:
>> > > > >
>> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations
>> > > > >
>> > > > > Please let me know what you think.
>> > > > >
>> > > > > Cheers,
>> > > > > David
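For reference, the bounded-queue backpressure alternative Colin sketches above (reject new controller events with BUSY once the queue is full, rather than letting them pile up and time out) could look roughly like the following. All names here are hypothetical; this is not the actual controller code:

    // Hypothetical sketch of the back-pressure alternative: a bounded controller
    // event queue that rejects new work with BUSY when it is at capacity.
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class BoundedControllerQueueSketch {
        enum Result { ACCEPTED, BUSY }

        // The capacity is itself a knob that needs tuning, which is part of
        // David's objection in point 3 above.
        private final BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(1000);

        Result submit(Runnable controllerEvent) {
            // offer() fails immediately when the queue is full; this is where the
            // broker would answer the pending RPC with the BUSY error code.
            return queue.offer(controllerEvent) ? Result.ACCEPTED : Result.BUSY;
        }
    }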