Hi all,

Just a quick update. We have made good progress on the implementation of
this KIP. The major parts are already in trunk, modulo the new "rate"
implementation, which is still under development.

I would like to change the type of the `controller_mutations_rate` from a
Long to a Double. I have run various experiments, and the rate can be quite
low, especially in clusters with a lot of tenants. Using a Long is quite
limiting here for fine-tuning small rates. I hope that this change is fine
for everyone.
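
For illustration, here is roughly what setting such a fractional rate could
look like through the quota Admin API from KIP-546 (just a sketch; the
entity name and the value are made up):

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.common.quota.ClientQuotaAlteration;
    import org.apache.kafka.common.quota.ClientQuotaEntity;

    public class SetMutationRate {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            try (Admin admin = Admin.create(props)) {
                // 0.2 mutations/sec, i.e. one mutation every five seconds
                // on average. A Long-typed quota cannot express a rate
                // below 1.
                ClientQuotaEntity entity = new ClientQuotaEntity(
                    Map.of(ClientQuotaEntity.USER, "tenant-42"));
                ClientQuotaAlteration alteration = new ClientQuotaAlteration(
                    entity,
                    List.of(new ClientQuotaAlteration.Op(
                        "controller_mutations_rate", 0.2)));
                admin.alterClientQuotas(List.of(alteration)).all().get();
            }
        }
    }

The same rate can of course be set with the kafka-configs tool; the "Kafka
Config Command" chapter in the KIP shows the exact syntax.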

Best,
David

On Mon, Jun 15, 2020 at 9:26 AM David Jacot <dja...@confluent.io> wrote:

> Hi all,
>
> The vote has passed with 5 binding votes (Gwen, Rajini, Mickael, Jun and
> Colin) and 2 non-binding votes (Tom, Anna).
>
> Thank you all for the fruitful discussion! I'd like to particularly thank
> Anna, who has heavily contributed to the design of this KIP.
>
> Regards,
> David
>
> On Fri, Jun 12, 2020 at 10:08 PM Colin McCabe <cmcc...@apache.org> wrote:
>
>> +1. Thanks, David!
>>
>> best,
>> Colin
>>
>> On Thu, Jun 11, 2020, at 23:51, David Jacot wrote:
>> > Colin, Jun,
>> >
>> > Do the proposed error code and the updated KIP look good to you guys?
>> > I'd like to wrap up and close the vote.
>> >
>> > Thanks,
>> > David
>> >
>> > On Wed, Jun 10, 2020 at 2:50 PM David Jacot <dja...@confluent.io>
>> > wrote:
>> >
>> > > Hi Colin and Jun,
>> > >
>> > > I have no problem if we have to rewrite part of it when the new
>> > > controller comes out. I will be more than happy to help out.
>> > >
>> > > Regarding KIP-590, I think that we can cope with a principal as a
>> > > string when the time comes. The user entity name is defined with a
>> > > string already.
>> > >
>> > > Regarding the name of the error, you have made a good point. I do
>> > > agree that it is important to differentiate the two cases. I propose
>> > > the following two errors:
>> > > - THROTTLING_QUOTA_EXCEEDED - Throttling is slightly better than
>> > >   rate, as we have quotas which are not rates (e.g. the request
>> > >   quota). This one is retryable once the throttle time has passed.
>> > > - LIMIT_QUOTA_EXCEEDED - This one would indicate that the limit has
>> > >   been reached and is a final error.
>> > > We only need the former in this KIP. What do you think?
>> > >
>> > > Jun, I have added a few examples in the KIP. The new name works
>> > > exactly like the existing ones once it is added to the accepted
>> > > dynamic configs for the user and the client entities. I have added a
>> > > "Kafka Config Command" chapter in the KIP. I will also open a Jira
>> > > so we don't forget to update the AK documentation once the KIP gets
>> > > merged.
>> > >
>> > > Thanks,
>> > > David
>> > >
>> > > On Wed, Jun 10, 2020 at 3:03 AM Jun Rao <j...@confluent.io> wrote:
>> > >
>> > >> Hi, Colin,
>> > >>
>> > >> Good point. Maybe something like THROTTLING_QUOTA_VIOLATED will
>> > >> make this clear.
>> > >>
>> > >> Hi, David,
>> > >>
>> > >> We added a new quota name in the KIP. You chose not to bump up the
>> > >> version of DESCRIBE_CLIENT_QUOTAS and ALTER_CLIENT_QUOTAS, which
>> > >> seems ok since the quota name is represented as a string. However,
>> > >> the new quota name can be used in client tools for setting and
>> > >> listing the quota (https://kafka.apache.org/documentation/#quotas).
>> > >> Could you document how the new name will be used in those tools?
>> > >>
>> > >> Thanks,
>> > >>
>> > >> Jun
>> > >>
>> > >> On Tue, Jun 9, 2020 at 3:37 PM Colin McCabe <cmcc...@apache.org>
>> > >> wrote:
>> > >>
>> > >> > On Tue, Jun 9, 2020, at 05:06, David Jacot wrote:
>> > >> > > Hi Colin,
>> > >> > >
>> > >> > > Thank you for your feedback.
>> > >> > >
>> > >> > > Jun has summarized the situation pretty well. Thanks Jun! I
>> > >> > > would like to complement it with the following points:
>> > >> > >
>> > >> > > 1. Indeed, when the quota is exceeded, the broker will reject
>> > >> > > the topic creations, partition creations and topic deletions
>> > >> > > that exceed it with the new QUOTA_VIOLATED error. The
>> > >> > > ThrottleTimeMs field will be populated accordingly to let the
>> > >> > > client know how long it must wait.
>> > >> > >
>> > >> > > 2. I do agree that we actually want a mechanism to apply back
>> > >> > > pressure to the clients. The KIP basically proposes a mechanism
>> > >> > > to control and to limit the rate of operations before they
>> > >> > > enter the controller. I think that it is similar to your
>> > >> > > thinking but is enforced based on a defined quota instead of
>> > >> > > relying on the number of pending items in the controller.
>> > >> > >
>> > >> > > 3. You mentioned an alternative idea in your comments that, if
>> > >> > > I understood correctly, would bound the queue to limit the
>> > >> > > overload and reject work if the queue is full. I have been
>> > >> > > thinking about this as well but I don't think that it works
>> > >> > > well in our case.
>> > >> > > - The first reason is the one mentioned by Jun. We actually
>> > >> > >   want to be able to limit specific clients (the misbehaving
>> > >> > >   ones) in a multi-tenant environment.
>> > >> > > - The second reason is that, at least in our current
>> > >> > >   implementation, the length of the queue is not really a good
>> > >> > >   characteristic for estimating the load. Coming back to your
>> > >> > >   example of the CreateTopicsRequest: it creates a path in ZK
>> > >> > >   for each newly created topic, which triggers a ChangeTopic
>> > >> > >   event in the controller. That single event could be for a
>> > >> > >   single topic in some cases or for a thousand topics in
>> > >> > >   others.
>> > >> > > These two reasons aside, bounding the queue also introduces a
>> > >> > > knob which requires some tuning and thus suffers from all the
>> > >> > > points you mentioned as well, doesn't it? The value may be
>> > >> > > right for one version but not for another.
>> > >> > >
>> > >> > > 4. Regarding the integration with KIP-500. The implementation
>> > >> > > of this KIP will span the ApiLayer and the AdminManager. To be
>> > >> > > honest, we can influence the implementation to work best with
>> > >> > > what you have in mind for the future controller. If you could
>> > >> > > shed some more light on the future internal architecture, I can
>> > >> > > take this into account during the implementation.
>> > >> > >
>> > >> >
>> > >> > Hi David,
>> > >> >
>> > >> > Good question. The approach we are thinking of is that in the
>> > >> > future, topic creation will be a controller RPC. In other words,
>> > >> > rather than changing ZK and waiting for the controller code to
>> > >> > notice, we'll go through the controller code (which may change
>> > >> > ZK, or may do something else in the ZK-free environment).
>> > >> >
>> > >> > I don't think there is a good way to write this in the short term
>> > >> > that avoids having to rewrite in the long term. That's probably
>> > >> > OK though, as long as we keep in mind that we'll need to.
>> > >> >
>> > >> > >
>> > >> > > 5. Regarding KIP-590. For the create topics request, create
>> > >> > > partitions request, and delete topics request, we are lucky
>> > >> > > enough to have directed all of them to the controller already,
>> > >> > > so we are fine with doing the admission in the broker which
>> > >> > > hosts the controller. If we were to throttle more operations in
>> > >> > > the future, I believe that we can continue to do it centrally
>> > >> > > while leveraging the principal that is forwarded to account for
>> > >> > > the right tenant. The reason why I would like to keep it
>> > >> > > central is that we are talking about sparse (or bursty)
>> > >> > > workloads here, so distributing the quota among all the brokers
>> > >> > > makes little sense.
>> > >> > >
>> > >> >
>> > >> > Right. The main requirement here is that the quota must operate
>> > >> > based on principal names rather than KafkaPrincipal objects. We
>> > >> > had a long discussion about KIP-590 and eventually concluded that
>> > >> > we didn't want to make KafkaPrincipal serializable (at least not
>> > >> > yet), so what would be forwarded is just a string, not the
>> > >> > principal object.
>> > >> >
>> > >> > >
>> > >> > > 6. Regarding the naming of the new error code. BUSY sounds too
>> > >> > > generic to me, so I would rather prefer a specific error code,
>> > >> > > the main reason being that we may be able to reuse it in the
>> > >> > > future for other quotas. That being said, we can find another
>> > >> > > name. QUOTA_EXHAUSTED? I don't feel too strongly about the
>> > >> > > naming. I wonder what the others think about this.
>> > >> > >
>> > >> >
>> > >> > Think about if you're someone writing software that uses the
>> > >> > Kafka client. Let's say you try to create some topics and get
>> > >> > back QUOTA_VIOLATED. What does it mean? Maybe it means that you
>> > >> > tried to create too many topics in too short a time (violating
>> > >> > the controller throttle). Or maybe it means that you exceeded
>> > >> > your quota on the number of partitions that you are allowed to
>> > >> > create (let's assume that someone did a follow-on KIP that reuses
>> > >> > this error code for that case).
>> > >> >
>> > >> > But now you have a dilemma. If the controller was just busy, then
>> > >> > trying again is the right thing to do. If there is some other
>> > >> > quota you violated (number of partitions, number of topics,
>> > >> > etc.), then retrying is wasteful and will not accomplish
>> > >> > anything. Of course, the Kafka client software itself can tell
>> > >> > the two cases apart since it has access to throttleTimeMs. But
>> > >> > the end user can't (and in certain modes, the exception is
>> > >> > exposed directly to the user here).
>> > >> >
>> > >> > That's why "you violated some quota, not sure which" is not a
>> > >> > good error code. So I think we should distinguish the two cases
>> > >> > by having two separate error codes. Maybe something like
>> > >> > RATE_QUOTA_VIOLATED and RESOURCE_QUOTA_VIOLATED. This KIP would
>> > >> > only need the former one.
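>> > >> >
>> > >> > To make the difference concrete, here is a rough sketch of what a
>> > >> > caller could do with two separate codes (the names are
>> > >> > placeholders, since we haven't settled on them):
>> > >> >
>> > >> >     // Hypothetical sketch: placeholder names for whichever error
>> > >> >     // codes the KIP finally adopts.
>> > >> >     enum QuotaError { RATE_QUOTA_VIOLATED, RESOURCE_QUOTA_VIOLATED }
>> > >> >
>> > >> >     final class QuotaRetryPolicy {
>> > >> >         // How long to back off before retrying, or -1 to give up.
>> > >> >         static long backoffMs(QuotaError error, long throttleTimeMs) {
>> > >> >             switch (error) {
>> > >> >                 case RATE_QUOTA_VIOLATED:
>> > >> >                     // The controller was just busy: retrying after
>> > >> >                     // the throttle time is the right thing to do.
>> > >> >                     return throttleTimeMs;
>> > >> >                 case RESOURCE_QUOTA_VIOLATED:
>> > >> >                 default:
>> > >> >                     // A hard limit was hit: retrying will never
>> > >> >                     // accomplish anything.
>> > >> >                     return -1L;
>> > >> >             }
>> > >> >         }
>> > >> >     }
>> > >> >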
>> > >> > Another possibility is that the user could distinguish the two
>> > >> > cases by the error string. But typically we want error strings to
>> > >> > be used to give extra debugging information, not to make big
>> > >> > decisions about what the client should do. So I think that the
>> > >> > error code should actually be a little bit more specific, or at
>> > >> > least tell the end user what to do with it (that's why I
>> > >> > suggested "busy").
>> > >> >
>> > >> > >
>> > >> > > Voilà. I hope that I have addressed all your questions/points
>> > >> > > and I am sorry for the lengthy email.
>> > >> > >
>> > >> >
>> > >> > Thanks, David. It looks good to me overall. And I thought your
>> > >> > email was very clear-- not too long at all. Let's just figure out
>> > >> > the error code thing.
>> > >> >
>> > >> > best,
>> > >> > Colin
>> > >> >
>> > >> > >
>> > >> > > Regards,
>> > >> > > David
>> > >> > >
>> > >> > > On Tue, Jun 9, 2020 at 2:13 AM Colin McCabe <cmcc...@apache.org>
>> > >> > > wrote:
>> > >> > >
>> > >> > > > On Mon, Jun 8, 2020, at 14:41, Jun Rao wrote:
>> > >> > > > > Hi, Colin,
>> > >> > > > >
>> > >> > > > > Thanks for the comment. You brought up several points.
>> > >> > > > >
>> > >> > > > > 1. Should we set up a per-user quota? To me, it does seem
>> > >> > > > > we need some sort of a quota. When the controller runs out
>> > >> > > > > of resources, ideally, we only want to penalize the badly
>> > >> > > > > behaving applications, instead of every application. To do
>> > >> > > > > that, we will need to know what each application is
>> > >> > > > > entitled to, and the per-user quota is intended to capture
>> > >> > > > > that.
>> > >> > > > >
>> > >> > > > > 2. How easy is it to configure a quota? The following is
>> > >> > > > > how an admin typically sets up a quota in our existing
>> > >> > > > > systems. Pick a generous default per-user quota that works
>> > >> > > > > for most applications. For the few resource-intensive
>> > >> > > > > applications, customize a higher quota for them. Reserve
>> > >> > > > > enough resources in anticipation that a single (or a few)
>> > >> > > > > application will exceed the quota at a given time.
>> > >> > > > >
>> > >> > > >
>> > >> > > > Hi Jun,
>> > >> > > >
>> > >> > > > Thanks for the response.
>> > >> > > >
>> > >> > > > Maybe I was too pessimistic about the ability of admins to
>> > >> > > > configure a useful quota here. I do agree that it would be
>> > >> > > > nice to have the ability to set different quotas for
>> > >> > > > different users, as you mentioned.
>> > >> > > >
>> > >> > > > >
>> > >> > > > > 3. How should the quota be defined? In the discussion
>> > >> > > > > thread, we debated between a usage-based model vs. a
>> > >> > > > > rate-based model. Dave and Anna argued for the rate-based
>> > >> > > > > model, mostly because it's simpler to implement.
>> > >> > > > >
>> > >> > > >
>> > >> > > > I'm trying to think more about how this integrates with our
>> > >> > > > plans for KIP-500. When we get rid of ZK, we will have to
>> > >> > > > handle this in the controller itself, rather than in the
>> > >> > > > AdminManager. That implies we'll have to rewrite the code.
>> > >> > > > Maybe this is worth it if we want this feature now, though.
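>> > >> > > >
>> > >> > > > As a point of reference for how simple the rate-based model
>> > >> > > > can be, here is a minimal token-bucket sketch (the names and
>> > >> > > > parameters are made up, not the KIP's implementation):
>> > >> > > >
>> > >> > > >     final class MutationRateLimiter {
>> > >> > > >         private final double ratePerSec; // e.g. 0.2 mutations/sec
>> > >> > > >         private final double burst;      // max accumulated credit
>> > >> > > >         private double tokens;
>> > >> > > >         private long lastNanos = System.nanoTime();
>> > >> > > >
>> > >> > > >         MutationRateLimiter(double ratePerSec, double burst) {
>> > >> > > >             this.ratePerSec = ratePerSec;
>> > >> > > >             this.burst = burst;
>> > >> > > >             this.tokens = burst;
>> > >> > > >         }
>> > >> > > >
>> > >> > > >         // Try to admit `mutations` operations, e.g. the number
>> > >> > > >         // of partitions a CreateTopicsRequest would create.
>> > >> > > >         synchronized boolean tryAcquire(int mutations) {
>> > >> > > >             long now = System.nanoTime();
>> > >> > > >             tokens = Math.min(burst,
>> > >> > > >                 tokens + (now - lastNanos) / 1e9 * ratePerSec);
>> > >> > > >             lastNanos = now;
>> > >> > > >             if (tokens < mutations) {
>> > >> > > >                 return false; // reject; caller computes throttle time
>> > >> > > >             }
>> > >> > > >             tokens -= mutations;
>> > >> > > >             return true;
>> > >> > > >         }
>> > >> > > >     }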
>> > >> > > >
>> > >> > > > Another wrinkle here is that, as we discussed in KIP-590,
>> > >> > > > controller operations will land on a random broker first, and
>> > >> > > > only then be forwarded to the active controller. This implies
>> > >> > > > that either admission control should happen on all brokers
>> > >> > > > (needing some kind of distributed quota scheme), or be done
>> > >> > > > on the controller after we've already done the work of
>> > >> > > > forwarding the message. The second approach might not be that
>> > >> > > > bad, but it would be nice to figure this out.
>> > >> > > >
>> > >> > > > >
>> > >> > > > > 4. If a quota is exceeded, how is that enforced? My
>> > >> > > > > understanding of the KIP is that, if a quota is exceeded,
>> > >> > > > > the broker immediately sends a QUOTA_VIOLATED error and a
>> > >> > > > > throttle time back to the client, and the client will wait
>> > >> > > > > for the throttle time before issuing the next request. This
>> > >> > > > > seems to be the same as the BUSY error code you mentioned.
>> > >> > > > >
>> > >> > > >
>> > >> > > > Yes, I agree, it sounds like we're thinking along the same
>> > >> > > > lines. However, rather than QUOTA_VIOLATED, how about naming
>> > >> > > > the error code BUSY? Then the error text could indicate the
>> > >> > > > quota that we violated. This would be more generally useful
>> > >> > > > as an error code and also avoid being confusingly similar to
>> > >> > > > POLICY_VIOLATION.
>> > >> > > >
>> > >> > > > best,
>> > >> > > > Colin
>> > >> > > >
>> > >> > > > >
>> > >> > > > > I will let David chime in more on that.
>> > >> > > > >
>> > >> > > > > Thanks,
>> > >> > > > >
>> > >> > > > > Jun
>> > >> > > > >
>> > >> > > > > On Sun, Jun 7, 2020 at 2:30 PM Colin McCabe
>> > >> > > > > <cmcc...@apache.org> wrote:
>> > >> > > > >
>> > >> > > > > > Hi David,
>> > >> > > > > >
>> > >> > > > > > Thanks for the KIP.
>> > >> > > > > >
>> > >> > > > > > I thought about this for a while and I actually think
>> > >> > > > > > this approach is not quite right. The problem that I see
>> > >> > > > > > here is that using an explicitly set quota here requires
>> > >> > > > > > careful tuning by the cluster operator. Even worse, this
>> > >> > > > > > tuning might be invalidated by changes in overall
>> > >> > > > > > conditions or even more efficient controller software.
>> > >> > > > > >
>> > >> > > > > > For example, if we empirically find that the controller
>> > >> > > > > > can do 1000 topics in a minute (or whatever), this tuning
>> > >> > > > > > might actually be wrong if the next version of the
>> > >> > > > > > software can do 2000 topics in a minute because of
>> > >> > > > > > efficiency upgrades. Or, the broker that the controller
>> > >> > > > > > is located on might be experiencing heavy load from its
>> > >> > > > > > non-controller operations, and so it can only do 500
>> > >> > > > > > topics in a minute during this period.
>> > >> > > > > >
>> > >> > > > > > So the system administrator gets a very obscure tunable
>> > >> > > > > > (it's not clear to a non-Kafka-developer what "controller
>> > >> > > > > > mutations" are or why they should care).
>> > >> > > > > > And even worse, they will have to significantly
>> > >> > > > > > "sandbag" the value that they set it to, so that even
>> > >> > > > > > under the heaviest load and the oldest deployed version
>> > >> > > > > > of the software, the controller can still function. Worse
>> > >> > > > > > still, this new quota adds a lot of complexity to the
>> > >> > > > > > controller.
>> > >> > > > > >
>> > >> > > > > > What we really want is backpressure when the controller
>> > >> > > > > > is overloaded. I believe this is the alternative you
>> > >> > > > > > discuss in "Rejected Alternatives" under "Throttle the
>> > >> > > > > > Execution instead of the Admission". Your reason for
>> > >> > > > > > rejecting it is that the client error handling does not
>> > >> > > > > > work well in this case. But actually, this is an artifact
>> > >> > > > > > of our current implementation, rather than a fundamental
>> > >> > > > > > issue with backpressure.
>> > >> > > > > >
>> > >> > > > > > Consider the example of a CreateTopicsRequest. The
>> > >> > > > > > controller could return a special error code if the load
>> > >> > > > > > was too high, and take the create topics event off the
>> > >> > > > > > controller queue. Let's call that error code BUSY.
>> > >> > > > > > Additionally, the controller could immediately refuse new
>> > >> > > > > > events if the queue had reached its maximum length, and
>> > >> > > > > > simply return BUSY for that case as well.
>> > >> > > > > >
>> > >> > > > > > Basically, the way we handle RPC timeouts in the
>> > >> > > > > > controller right now is not very good. As you know, we
>> > >> > > > > > time out the RPC, so the client gets TimeoutException,
>> > >> > > > > > but we then keep the event on the queue, so that it
>> > >> > > > > > eventually gets executed! There's no reason why we have
>> > >> > > > > > to do that. We could take the event off the queue if
>> > >> > > > > > there is a timeout. This would reduce load and mostly
>> > >> > > > > > avoid the paradoxical situations you describe (getting
>> > >> > > > > > TopicExistsException for a CreateTopicsRequest retry,
>> > >> > > > > > etc.).
>> > >> > > > > >
>> > >> > > > > > I say "mostly" because there are still cases where
>> > >> > > > > > retries could go astray (for example, if we execute the
>> > >> > > > > > topic creation but a network problem prevents the
>> > >> > > > > > response from being sent to the client). But this would
>> > >> > > > > > still be a very big improvement over what we have now.
>> > >> > > > > >
>> > >> > > > > > Sorry for commenting so late on this but I got distracted
>> > >> > > > > > by some other things. I hope we can figure this one out--
>> > >> > > > > > I feel like there is a chance to significantly simplify
>> > >> > > > > > this.
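>> > >> > > > > >
>> > >> > > > > > To sketch the bounded-queue part of this (hypothetical
>> > >> > > > > > names only, not an API proposal):
>> > >> > > > > >
>> > >> > > > > >     import java.util.concurrent.ArrayBlockingQueue;
>> > >> > > > > >     import java.util.concurrent.BlockingQueue;
>> > >> > > > > >
>> > >> > > > > >     // A bounded event queue that rejects new work, which
>> > >> > > > > >     // the caller maps to BUSY, instead of letting events
>> > >> > > > > >     // pile up behind an overloaded controller.
>> > >> > > > > >     final class BoundedControllerQueue {
>> > >> > > > > >         private final BlockingQueue<Runnable> events =
>> > >> > > > > >             new ArrayBlockingQueue<>(1000);
>> > >> > > > > >
>> > >> > > > > >         // Returns false (i.e. BUSY) instead of blocking
>> > >> > > > > >         // when the queue is full.
>> > >> > > > > >         boolean tryEnqueue(Runnable event) {
>> > >> > > > > >             return events.offer(event);
>> > >> > > > > >         }
>> > >> > > > > >     }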
>> > >> > > > > >
>> > >> > > > > > best,
>> > >> > > > > > Colin
>> > >> > > > > >
>> > >> > > > > > On Fri, May 29, 2020, at 07:57, David Jacot wrote:
>> > >> > > > > > > Hi folks,
>> > >> > > > > > >
>> > >> > > > > > > I'd like to start the vote for KIP-599 which proposes a
>> > >> > > > > > > new quota to throttle create topic, create partition,
>> > >> > > > > > > and delete topic operations to protect the Kafka
>> > >> > > > > > > controller:
>> > >> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations
>> > >> > > > > > >
>> > >> > > > > > > Please, let me know what you think.
>> > >> > > > > > >
>> > >> > > > > > > Cheers,
>> > >> > > > > > > David