Hi all,

Just a quick update. We have made good progress on the implementation of
this KIP. The major parts are already in trunk, modulo the new "rate"
implementation, which is still under development.

I would like to change the type of the `controller_mutations_rate` from a
Long to a Double. I have run various experiments, and the rate can be quite
low, especially in clusters with a lot of tenants. Using a Long is quite
limiting here for fine-tuning small rates. I hope that this change is fine
for everyone.
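
For illustration, here is roughly what setting such a fractional rate could
look like through the quota Admin API from KIP-546 (just a sketch; the
entity name and the value are made up):

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.common.quota.ClientQuotaAlteration;
    import org.apache.kafka.common.quota.ClientQuotaEntity;

    public class SetMutationRate {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            try (Admin admin = Admin.create(props)) {
                // 0.2 mutations/sec, i.e. one mutation every five seconds
                // on average. A Long-typed quota cannot express a rate
                // below 1.
                ClientQuotaEntity entity = new ClientQuotaEntity(
                    Map.of(ClientQuotaEntity.USER, "tenant-42"));
                ClientQuotaAlteration alteration = new ClientQuotaAlteration(
                    entity,
                    List.of(new ClientQuotaAlteration.Op(
                        "controller_mutations_rate", 0.2)));
                admin.alterClientQuotas(List.of(alteration)).all().get();
            }
        }
    }

The same rate can of course be set with the kafka-configs tool; the "Kafka
Config Command" chapter in the KIP shows the exact syntax.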

Best,
David

On Mon, Jun 15, 2020 at 9:26 AM David Jacot <dja...@confluent.io> wrote:

> Hi all,
>
> The vote has passed with 5 binding votes (Gwen, Rajini, Mickael, Jun and
> Colin) and 2 non-binding votes (Tom, Anna).
>
> Thank you all for the fruitful discussion! I'd like to particularly thank
> Anna, who has heavily contributed to the design of this KIP.
>
> Regards,
> David
>
> On Fri, Jun 12, 2020 at 10:08 PM Colin McCabe <cmcc...@apache.org> wrote:
>
>> +1. Thanks, David!
>>
>> best,
>> Colin
>>
>> On Thu, Jun 11, 2020, at 23:51, David Jacot wrote:
>> > Colin, Jun,
>> >
>> > Do the proposed error code and the updated KIP look good to you guys?
>> > I'd like to wrap up and close the vote.
>> >
>> > Thanks,
>> > David
>> >
>> > On Wed, Jun 10, 2020 at 2:50 PM David Jacot <dja...@confluent.io>
>> > wrote:
>> >
>> > > Hi Colin and Jun,
>> > >
>> > > I have no problem if we have to rewrite part of it when the new
>> > > controller comes out. I will be more than happy to help out.
>> > >
>> > > Regarding KIP-590, I think that we can cope with a principal as a
>> > > string when the time comes. The user entity name is defined with a
>> > > string already.
>> > >
>> > > Regarding the name of the error, you have made a good point. I do
>> > > agree that it is important to differentiate the two cases. I propose
>> > > the following two errors:
>> > > - THROTTLING_QUOTA_EXCEEDED - Throttling is slightly better than
>> > >   rate, as we have quotas which are not rates (e.g. the request
>> > >   quota). This one is retryable once the throttle time has passed.
>> > > - LIMIT_QUOTA_EXCEEDED - This one would indicate that the limit has
>> > >   been reached and is a final error.
>> > > We only need the former in this KIP. What do you think?
>> > >
>> > > Jun, I have added a few examples in the KIP. The new name works
>> > > exactly like the existing ones once it is added to the accepted
>> > > dynamic configs for the user and the client entities. I have added a
>> > > "Kafka Config Command" chapter in the KIP. I will also open a Jira
>> > > so we don't forget to update the AK documentation once the KIP gets
>> > > merged.
>> > >
>> > > Thanks,
>> > > David
>> > >
>> > > On Wed, Jun 10, 2020 at 3:03 AM Jun Rao <j...@confluent.io> wrote:
>> > >
>> > >> Hi, Colin,
>> > >>
>> > >> Good point. Maybe something like THROTTLING_QUOTA_VIOLATED will
>> > >> make this clear.
>> > >>
>> > >> Hi, David,
>> > >>
>> > >> We added a new quota name in the KIP. You chose not to bump up the
>> > >> version of DESCRIBE_CLIENT_QUOTAS and ALTER_CLIENT_QUOTAS, which
>> > >> seems ok since the quota name is represented as a string. However,
>> > >> the new quota name can be used in client tools for setting and
>> > >> listing the quota (https://kafka.apache.org/documentation/#quotas).
>> > >> Could you document how the new name will be used in those tools?
>> > >>
>> > >> Thanks,
>> > >>
>> > >> Jun
>> > >>
>> > >> On Tue, Jun 9, 2020 at 3:37 PM Colin McCabe <cmcc...@apache.org>
>> > >> wrote:
>> > >>
>> > >> > On Tue, Jun 9, 2020, at 05:06, David Jacot wrote:
>> > >> > > Hi Colin,
>> > >> > >
>> > >> > > Thank you for your feedback.
>> > >> > >
>> > >> > > Jun has summarized the situation pretty well. Thanks Jun! I
>> > >> > > would like to complement it with the following points:
>> > >> > >
>> > >> > > 1. Indeed, when the quota is exceeded, the broker will reject
>> > >> > > the topic creations, partition creations and topic deletions
>> > >> > > that exceed it with the new QUOTA_VIOLATED error. The
>> > >> > > ThrottleTimeMs field will be populated accordingly to let the
>> > >> > > client know how long it must wait.
>> > >> > >
>> > >> > > 2. I do agree that we actually want a mechanism to apply back
>> > >> > > pressure to the clients. The KIP basically proposes a mechanism
>> > >> > > to control and to limit the rate of operations before they
>> > >> > > enter the controller. I think that it is similar to your
>> > >> > > thinking but is enforced based on a defined quota instead of
>> > >> > > relying on the number of pending items in the controller.
>> > >> > >
>> > >> > > 3. You mentioned an alternative idea in your comments that, if
>> > >> > > I understood correctly, would bound the queue to limit the
>> > >> > > overload and reject work if the queue is full. I have been
>> > >> > > thinking about this as well but I don't think that it works
>> > >> > > well in our case.
>> > >> > > - The first reason is the one mentioned by Jun. We actually
>> > >> > >   want to be able to limit specific clients (the misbehaving
>> > >> > >   ones) in a multi-tenant environment.
>> > >> > > - The second reason is that, at least in our current
>> > >> > >   implementation, the length of the queue is not really a good
>> > >> > >   characteristic for estimating the load. Coming back to your
>> > >> > >   example of the CreateTopicsRequest: it creates a path in ZK
>> > >> > >   for each newly created topic, which triggers a ChangeTopic
>> > >> > >   event in the controller. That single event could be for a
>> > >> > >   single topic in some cases or for a thousand topics in
>> > >> > >   others.
>> > >> > > These two reasons aside, bounding the queue also introduces a
>> > >> > > knob which requires some tuning and thus suffers from all the
>> > >> > > points you mentioned as well, doesn't it? The value may be
>> > >> > > right for one version but not for another.
>> > >> > >
>> > >> > > 4. Regarding the integration with KIP-500. The implementation
>> > >> > > of this KIP will span the ApiLayer and the AdminManager. To be
>> > >> > > honest, we can influence the implementation to work best with
>> > >> > > what you have in mind for the future controller. If you could
>> > >> > > shed some more light on the future internal architecture, I can
>> > >> > > take this into account during the implementation.
>> > >> > >
>> > >> >
>> > >> > Hi David,
>> > >> >
>> > >> > Good question. The approach we are thinking of is that in the
>> > >> > future, topic creation will be a controller RPC. In other words,
>> > >> > rather than changing ZK and waiting for the controller code to
>> > >> > notice, we'll go through the controller code (which may change
>> > >> > ZK, or may do something else in the ZK-free environment).
>> > >> >
>> > >> > I don't think there is a good way to write this in the short term
>> > >> > that avoids having to rewrite in the long term. That's probably
>> > >> > OK though, as long as we keep in mind that we'll need to.
>> > >> >
>> > >> > >
>> > >> > > 5. Regarding KIP-590. For the create topics request, create
>> > >> > > partitions request, and delete topics request, we are lucky
>> > >> > > enough to have directed all of them to the controller already,
>> > >> > > so we are fine with doing the admission in the broker which
>> > >> > > hosts the controller. If we were to throttle more operations in
>> > >> > > the future, I believe that we can continue to do it centrally
>> > >> > > while leveraging the principal that is forwarded to account for
>> > >> > > the right tenant. The reason why I would like to keep it
>> > >> > > central is that we are talking about sparse (or bursty)
>> > >> > > workloads here, so distributing the quota among all the brokers
>> > >> > > makes little sense.
>> > >> > >
>> > >> >
>> > >> > Right. The main requirement here is that the quota must operate
>> > >> > based on principal names rather than KafkaPrincipal objects. We
>> > >> > had a long discussion about KIP-590 and eventually concluded that
>> > >> > we didn't want to make KafkaPrincipal serializable (at least not
>> > >> > yet), so what would be forwarded is just a string, not the
>> > >> > principal object.
>> > >> >
>> > >> > >
>> > >> > > 6. Regarding the naming of the new error code. BUSY sounds too
>> > >> > > generic to me, so I would rather prefer a specific error code,
>> > >> > > the main reason being that we may be able to reuse it in the
>> > >> > > future for other quotas. That being said, we can find another
>> > >> > > name. QUOTA_EXHAUSTED? I don't feel too strongly about the
>> > >> > > naming. I wonder what the others think about this.
>> > >> > >
>> > >> >
>> > >> > Think about if you're someone writing software that uses the
>> > >> > Kafka client. Let's say you try to create some topics and get
>> > >> > back QUOTA_VIOLATED. What does it mean? Maybe it means that you
>> > >> > tried to create too many topics in too short a time (violating
>> > >> > the controller throttle). Or maybe it means that you exceeded
>> > >> > your quota on the number of partitions that you are allowed to
>> > >> > create (let's assume that someone did a follow-on KIP that reuses
>> > >> > this error code for that case).
>> > >> >
>> > >> > But now you have a dilemma. If the controller was just busy, then
>> > >> > trying again is the right thing to do. If there is some other
>> > >> > quota you violated (number of partitions, number of topics,
>> > >> > etc.), then retrying is wasteful and will not accomplish
>> > >> > anything. Of course, the Kafka client software itself can tell
>> > >> > the two cases apart since it has access to throttleTimeMs. But
>> > >> > the end user can't (and in certain modes, the exception is
>> > >> > exposed directly to the user here).
>> > >> >
>> > >> > That's why "you violated some quota, not sure which" is not a
>> > >> > good error code. So I think we should distinguish the two cases
>> > >> > by having two separate error codes. Maybe something like
>> > >> > RATE_QUOTA_VIOLATED and RESOURCE_QUOTA_VIOLATED. This KIP would
>> > >> > only need the former one.
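>> > >> >
>> > >> > To make the difference concrete, here is a rough sketch of what a
>> > >> > caller could do with two separate codes (the names are
>> > >> > placeholders, since we haven't settled on them):
>> > >> >
>> > >> >     // Hypothetical sketch: placeholder names for whichever error
>> > >> >     // codes the KIP finally adopts.
>> > >> >     enum QuotaError { RATE_QUOTA_VIOLATED, RESOURCE_QUOTA_VIOLATED }
>> > >> >
>> > >> >     final class QuotaRetryPolicy {
>> > >> >         // How long to back off before retrying, or -1 to give up.
>> > >> >         static long backoffMs(QuotaError error, long throttleTimeMs) {
>> > >> >             switch (error) {
>> > >> >                 case RATE_QUOTA_VIOLATED:
>> > >> >                     // The controller was just busy: retrying after
>> > >> >                     // the throttle time is the right thing to do.
>> > >> >                     return throttleTimeMs;
>> > >> >                 case RESOURCE_QUOTA_VIOLATED:
>> > >> >                 default:
>> > >> >                     // A hard limit was hit: retrying will never
>> > >> >                     // accomplish anything.
>> > >> >                     return -1L;
>> > >> >             }
>> > >> >         }
>> > >> >     }
>> > >> >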
>> > >> > Another possibility is that the user could distinguish the two
>> > >> > cases by the error string. But typically we want error strings to
>> > >> > be used to give extra debugging information, not to make big
>> > >> > decisions about what the client should do. So I think that the
>> > >> > error code should actually be a little bit more specific, or at
>> > >> > least tell the end user what to do with it (that's why I
>> > >> > suggested "busy").
>> > >> >
>> > >> > >
>> > >> > > Voilà. I hope that I have addressed all your questions/points
>> > >> > > and I am sorry for the lengthy email.
>> > >> > >
>> > >> >
>> > >> > Thanks, David. It looks good to me overall. And I thought your
>> > >> > email was very clear-- not too long at all. Let's just figure out
>> > >> > the error code thing.
>> > >> >
>> > >> > best,
>> > >> > Colin
>> > >> >
>> > >> > >
>> > >> > > Regards,
>> > >> > > David
>> > >> > >
>> > >> > > On Tue, Jun 9, 2020 at 2:13 AM Colin McCabe <cmcc...@apache.org>
>> > >> > > wrote:
>> > >> > >
>> > >> > > > On Mon, Jun 8, 2020, at 14:41, Jun Rao wrote:
>> > >> > > > > Hi, Colin,
>> > >> > > > >
>> > >> > > > > Thanks for the comment. You brought up several points.
>> > >> > > > >
>> > >> > > > > 1. Should we set up a per-user quota? To me, it does seem
>> > >> > > > > we need some sort of a quota. When the controller runs out
>> > >> > > > > of resources, ideally, we only want to penalize the badly
>> > >> > > > > behaving applications, instead of every application. To do
>> > >> > > > > that, we will need to know what each application is
>> > >> > > > > entitled to, and the per-user quota is intended to capture
>> > >> > > > > that.
>> > >> > > > >
>> > >> > > > > 2. How easy is it to configure a quota? The following is
>> > >> > > > > how an admin typically sets up a quota in our existing
>> > >> > > > > systems. Pick a generous default per-user quota that works
>> > >> > > > > for most applications. For the few resource-intensive
>> > >> > > > > applications, customize a higher quota for them. Reserve
>> > >> > > > > enough resources in anticipation that a single (or a few)
>> > >> > > > > application will exceed the quota at a given time.
>> > >> > > > >
>> > >> > > >
>> > >> > > > Hi Jun,
>> > >> > > >
>> > >> > > > Thanks for the response.
>> > >> > > >
>> > >> > > > Maybe I was too pessimistic about the ability of admins to
>> > >> > > > configure a useful quota here. I do agree that it would be
>> > >> > > > nice to have the ability to set different quotas for
>> > >> > > > different users, as you mentioned.
>> > >> > > >
>> > >> > > > >
>> > >> > > > > 3. How should the quota be defined? In the discussion
>> > >> > > > > thread, we debated between a usage-based model vs. a
>> > >> > > > > rate-based model. Dave and Anna argued for the rate-based
>> > >> > > > > model, mostly because it's simpler to implement.
>> > >> > > > >
>> > >> > > >
>> > >> > > > I'm trying to think more about how this integrates with our
>> > >> > > > plans for KIP-500. When we get rid of ZK, we will have to
>> > >> > > > handle this in the controller itself, rather than in the
>> > >> > > > AdminManager. That implies we'll have to rewrite the code.
>> > >> > > > Maybe this is worth it if we want this feature now, though.
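>> > >> > > >
>> > >> > > > As a point of reference for how simple the rate-based model
>> > >> > > > can be, here is a minimal token-bucket sketch (the names and
>> > >> > > > parameters are made up, not the KIP's implementation):
>> > >> > > >
>> > >> > > >     final class MutationRateLimiter {
>> > >> > > >         private final double ratePerSec; // e.g. 0.2 mutations/sec
>> > >> > > >         private final double burst;      // max accumulated credit
>> > >> > > >         private double tokens;
>> > >> > > >         private long lastNanos = System.nanoTime();
>> > >> > > >
>> > >> > > >         MutationRateLimiter(double ratePerSec, double burst) {
>> > >> > > >             this.ratePerSec = ratePerSec;
>> > >> > > >             this.burst = burst;
>> > >> > > >             this.tokens = burst;
>> > >> > > >         }
>> > >> > > >
>> > >> > > >         // Try to admit `mutations` operations, e.g. the number
>> > >> > > >         // of partitions a CreateTopicsRequest would create.
>> > >> > > >         synchronized boolean tryAcquire(int mutations) {
>> > >> > > >             long now = System.nanoTime();
>> > >> > > >             tokens = Math.min(burst,
>> > >> > > >                 tokens + (now - lastNanos) / 1e9 * ratePerSec);
>> > >> > > >             lastNanos = now;
>> > >> > > >             if (tokens < mutations) {
>> > >> > > >                 return false; // reject; caller computes throttle time
>> > >> > > >             }
>> > >> > > >             tokens -= mutations;
>> > >> > > >             return true;
>> > >> > > >         }
>> > >> > > >     }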
>> > >> > > >
>> > >> > > > Another wrinkle here is that, as we discussed in KIP-590,
>> > >> > > > controller operations will land on a random broker first, and
>> > >> > > > only then be forwarded to the active controller. This implies
>> > >> > > > that either admission control should happen on all brokers
>> > >> > > > (needing some kind of distributed quota scheme), or be done
>> > >> > > > on the controller after we've already done the work of
>> > >> > > > forwarding the message. The second approach might not be that
>> > >> > > > bad, but it would be nice to figure this out.
>> > >> > > >
>> > >> > > > >
>> > >> > > > > 4. If a quota is exceeded, how is that enforced? My
>> > >> > > > > understanding of the KIP is that, if a quota is exceeded,
>> > >> > > > > the broker immediately sends a QUOTA_VIOLATED error and a
>> > >> > > > > throttle time back to the client, and the client will wait
>> > >> > > > > for the throttle time before issuing the next request. This
>> > >> > > > > seems to be the same as the BUSY error code you mentioned.
>> > >> > > > >
>> > >> > > >
>> > >> > > > Yes, I agree, it sounds like we're thinking along the same
>> > >> > > > lines. However, rather than QUOTA_VIOLATED, how about naming
>> > >> > > > the error code BUSY? Then the error text could indicate the
>> > >> > > > quota that we violated. This would be more generally useful
>> > >> > > > as an error code and also avoid being confusingly similar to
>> > >> > > > POLICY_VIOLATION.
>> > >> > > >
>> > >> > > > best,
>> > >> > > > Colin
>> > >> > > >
>> > >> > > > >
>> > >> > > > > I will let David chime in more on that.
>> > >> > > > >
>> > >> > > > > Thanks,
>> > >> > > > >
>> > >> > > > > Jun
>> > >> > > > >
>> > >> > > > > On Sun, Jun 7, 2020 at 2:30 PM Colin McCabe
>> > >> > > > > <cmcc...@apache.org> wrote:
>> > >> > > > >
>> > >> > > > > > Hi David,
>> > >> > > > > >
>> > >> > > > > > Thanks for the KIP.
>> > >> > > > > >
>> > >> > > > > > I thought about this for a while and I actually think
>> > >> > > > > > this approach is not quite right. The problem that I see
>> > >> > > > > > here is that using an explicitly set quota here requires
>> > >> > > > > > careful tuning by the cluster operator. Even worse, this
>> > >> > > > > > tuning might be invalidated by changes in overall
>> > >> > > > > > conditions or even more efficient controller software.
>> > >> > > > > >
>> > >> > > > > > For example, if we empirically find that the controller
>> > >> > > > > > can do 1000 topics in a minute (or whatever), this tuning
>> > >> > > > > > might actually be wrong if the next version of the
>> > >> > > > > > software can do 2000 topics in a minute because of
>> > >> > > > > > efficiency upgrades. Or, the broker that the controller
>> > >> > > > > > is located on might be experiencing heavy load from its
>> > >> > > > > > non-controller operations, and so it can only do 500
>> > >> > > > > > topics in a minute during this period.
>> > >> > > > > >
>> > >> > > > > > So the system administrator gets a very obscure tunable
>> > >> > > > > > (it's not clear to a non-Kafka-developer what "controller
>> > >> > > > > > mutations" are or why they should care).
>> > >> > > > > > And even worse, they will have to significantly
>> > >> > > > > > "sandbag" the value that they set it to, so that even
>> > >> > > > > > under the heaviest load and the oldest deployed version
>> > >> > > > > > of the software, the controller can still function. Worse
>> > >> > > > > > still, this new quota adds a lot of complexity to the
>> > >> > > > > > controller.
>> > >> > > > > >
>> > >> > > > > > What we really want is backpressure when the controller
>> > >> > > > > > is overloaded. I believe this is the alternative you
>> > >> > > > > > discuss in "Rejected Alternatives" under "Throttle the
>> > >> > > > > > Execution instead of the Admission". Your reason for
>> > >> > > > > > rejecting it is that the client error handling does not
>> > >> > > > > > work well in this case. But actually, this is an artifact
>> > >> > > > > > of our current implementation, rather than a fundamental
>> > >> > > > > > issue with backpressure.
>> > >> > > > > >
>> > >> > > > > > Consider the example of a CreateTopicsRequest. The
>> > >> > > > > > controller could return a special error code if the load
>> > >> > > > > > was too high, and take the create topics event off the
>> > >> > > > > > controller queue. Let's call that error code BUSY.
>> > >> > > > > > Additionally, the controller could immediately refuse new
>> > >> > > > > > events if the queue had reached its maximum length, and
>> > >> > > > > > simply return BUSY for that case as well.
>> > >> > > > > >
>> > >> > > > > > Basically, the way we handle RPC timeouts in the
>> > >> > > > > > controller right now is not very good. As you know, we
>> > >> > > > > > time out the RPC, so the client gets TimeoutException,
>> > >> > > > > > but we then keep the event on the queue, so that it
>> > >> > > > > > eventually gets executed! There's no reason why we have
>> > >> > > > > > to do that. We could take the event off the queue if
>> > >> > > > > > there is a timeout. This would reduce load and mostly
>> > >> > > > > > avoid the paradoxical situations you describe (getting
>> > >> > > > > > TopicExistsException for a CreateTopicsRequest retry,
>> > >> > > > > > etc.).
>> > >> > > > > >
>> > >> > > > > > I say "mostly" because there are still cases where
>> > >> > > > > > retries could go astray (for example, if we execute the
>> > >> > > > > > topic creation but a network problem prevents the
>> > >> > > > > > response from being sent to the client). But this would
>> > >> > > > > > still be a very big improvement over what we have now.
>> > >> > > > > >
>> > >> > > > > > Sorry for commenting so late on this but I got distracted
>> > >> > > > > > by some other things. I hope we can figure this one out--
>> > >> > > > > > I feel like there is a chance to significantly simplify
>> > >> > > > > > this.
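>> > >> > > > > >
>> > >> > > > > > To sketch the bounded-queue part of this (hypothetical
>> > >> > > > > > names only, not an API proposal):
>> > >> > > > > >
>> > >> > > > > >     import java.util.concurrent.ArrayBlockingQueue;
>> > >> > > > > >     import java.util.concurrent.BlockingQueue;
>> > >> > > > > >
>> > >> > > > > >     // A bounded event queue that rejects new work, which
>> > >> > > > > >     // the caller maps to BUSY, instead of letting events
>> > >> > > > > >     // pile up behind an overloaded controller.
>> > >> > > > > >     final class BoundedControllerQueue {
>> > >> > > > > >         private final BlockingQueue<Runnable> events =
>> > >> > > > > >             new ArrayBlockingQueue<>(1000);
>> > >> > > > > >
>> > >> > > > > >         // Returns false (i.e. BUSY) instead of blocking
>> > >> > > > > >         // when the queue is full.
>> > >> > > > > >         boolean tryEnqueue(Runnable event) {
>> > >> > > > > >             return events.offer(event);
>> > >> > > > > >         }
>> > >> > > > > >     }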
>> > >> > > > > >
>> > >> > > > > > best,
>> > >> > > > > > Colin
>> > >> > > > > >
>> > >> > > > > > On Fri, May 29, 2020, at 07:57, David Jacot wrote:
>> > >> > > > > > > Hi folks,
>> > >> > > > > > >
>> > >> > > > > > > I'd like to start the vote for KIP-599 which proposes a
>> > >> > > > > > > new quota to throttle create topic, create partition,
>> > >> > > > > > > and delete topic operations to protect the Kafka
>> > >> > > > > > > controller:
>> > >> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations
>> > >> > > > > > >
>> > >> > > > > > > Please, let me know what you think.
>> > >> > > > > > >
>> > >> > > > > > > Cheers,
>> > >> > > > > > > David