Hi Anna and Jun,

Anna, thanks for your thoughtful feedback. Overall, I agree with what you
said. To summarize, you said that using time on server threads is not
easier to tune than a rate based approach, and that it does not really
capture all the load either, since the control requests are not taken
into account.

I have been thinking about using an approach similar to the request quota
and here are my thoughts:

1. With the current architecture of the controller, the ZK writes resulting
from a topic being created, expanded or deleted are spread across the API
handler threads and the controller thread, which makes it hard to measure
the real thread usage. I suppose that this will change with KIP-500 though.

2. I do agree with Anna that we don't only care about requests tying up the
controller thread but also care about the control requests. A rate based
approach would allow us to define a value that covers both dimensions.

3. Doing the accounting in the controller requires attaching a principal
and a client id to each topic, as the quotas are expressed per principal
and/or client id. I find this a little odd. This information would have to
be updated whenever a topic is expanded or deleted, and any subsequent
operations for that topic in the controller would be charged to the last
principal and client id. As an example, imagine a client that deletes
topics which have partitions on a dead broker. In this case, the deletion
would be retried until it succeeds. If that client has a low quota, that
may prevent it from doing any new operations until the delete succeeds.
This is strange behavior.

4. One important aspect of the proposal is that we want to be able to
reject requests when the quota is exhausted. With a usage based approach,
it is difficult to compute the time to wait before new topics can be
created. The only thing we could do is ask the client to wait until the
measured time drops below the quota to ensure that new operations can be
accepted. With a rate based approach, we can compute the time to wait
precisely (see the small sketch below).

5. I find the experience for users slightly better with a rate based
approach. By users here, I mean the users of the admin client or the CLI.
When throttled, they would get an error saying: "can't create because you
have reached X partition mutations/sec". With the other approach, we could
only say: "quota exceeded".

6. I do agree that the rate based approach is less generic in the long
term, especially if
other resource types are added in the controller.
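
To make point 4 concrete, here is a rough sketch of how the wait time
falls out of a token bucket with rate R and burst B. The class and method
names are hypothetical and only meant as an illustration, not code from
the KIP:

    final class MutationQuotaSketch {
        // Tokens accumulate at rate R (mutations/sec) up to the burst B.
        // An accepted request of N mutations drives the balance K down to
        // K - N. If K is negative, the exact time to wait before new
        // mutations can be accepted is simply -K / R.
        static long throttleTimeMs(double balanceK, double ratePerSec) {
            if (balanceK >= 0)
                return 0;
            return (long) Math.ceil(-balanceK / ratePerSec * 1000);
        }
    }

With a usage based approach, there is no such closed form because we do
not know in advance how much thread time the next operation will consume.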

Altogether, I am not convinced by a usage based approach and I would
rather lean towards keeping the current proposal.

Best,
David

On Thu, May 7, 2020 at 2:11 AM Anna Povzner <a...@confluent.io> wrote:

> Hi David and Jun,
>
> I wanted to add to the discussion about using requests/sec vs. time on
> server threads (similar to request quota) for expressing quota for topic
> ops.
>
> I think request quota does not protect the brokers from overload by itself
> -- it still requires tuning and sometimes re-tuning, because it depends on
> the workload behavior of all users (like a relative share of requests
> exempt from throttling). This makes it not that easy to set. Let me give
> you more details:
>
>    1. The amount of work that the user can get from the request quota
>       depends on the load from other users. We measure and enforce user's
>       clock time on threads---the time between 2 timestamps, one when the
>       operation starts and one when the operation ends. If the user is
>       the only load on the broker, it is less likely that their operation
>       will be interrupted by the kernel to switch to another thread, and
>       time away from the thread still counts.
>       1. Pros: this makes it more work-conserving, the user is less
>          limited when there are more resources available.
>       2. Cons: Harder to capacity plan for the user, and could be
>          confusing when the broker will suddenly stop supporting the load
>          which it was supporting before.
>    2. For the above reason, it makes most sense to maximize the user's
>       quota and set it as a percent of the maximum thread capacity (1100
>       with default broker config).
>    3. However, the actual maximum threads capacity is not really 1100:
>       1. Some of it will be taken by requests exempt from throttling, and
>          the amount depends on the workload. We have seen cases (somewhat
>          rare) where requests exempt from throttling take like ⅔ of the
>          time on threads.
>       2. We have also seen cases of an overloaded cluster (full queues,
>          timeouts, etc) due to high request rate while the time used on
>          threads was way below the max (1100), like 600 or 700 (total
>          exempt + non-exempt usage). Basically, when a broker is close to
>          100% CPU, it takes more and more time for the "unaccounted" work
>          like thread getting a chance to pick up a request from the queue
>          and get a timestamp.
>    4. As a result, there will be some tuning to decide on a safe value
>       for total thread capacity, from where users can carve out their
>       quotas. Some changes in users' workloads may require re-tuning, if,
>       for example, it dramatically changes the relative share of
>       non-exempt load.
>
>
> I think request quota works well for client request load in a sense that it
> ensures that different users get a fair/proportional share of resources
> during high broker load. If the user cannot get enough resources from their
> quota to support their request rate anymore, they can monitor their load
> and expand the cluster if needed (or rebalance).
>
> However, I think using time on threads for topic ops could be even more
> difficult than simple request rate (as proposed):
>
>    1. I understand that we don't only care about topic requests tying up
>       the controller thread, but we also care that it does not create a
>       large extra load on the cluster due to LeaderAndIsr and other
>       related requests (this is more important for small clusters).
>    2. For that reason, tuning quota in terms of time on threads can be
>       harder, because there is no easy way to say how this quota would
>       translate to a number of operations (because that would depend on
>       other broker load).
>
>
> Since tuning would be required anyway, I see the following workflow if we
> express controller quota in terms of partition mutations per second:
>
>    1. Run topic workload in isolation (the most expensive one, like
>       create topic vs. add partitions) and see how much load it adds
>       based on incoming rate. Choose quota depending on how much extra
>       load your cluster can sustain in addition to its normal load.
>    2. Could be useful to publish some experimental results to give some
>       ballpark numbers to make this sizing easier.
>
>
> I am interested to see if you agree with the listed assumptions here. I may
> have missed something, especially if there is an easier workflow for
> setting quota based on time on threads.
>
> Thanks,
>
> Anna
>
>
> On Thu, Apr 30, 2020 at 8:13 AM Tom Bentley <tbent...@redhat.com> wrote:
>
> > Hi David,
> >
> > Thanks for the KIP.
> >
> > If I understand the proposed throttling algorithm, an initial request
> > would be allowed (possibly making K negative) and only subsequent requests
> > (before K became positive) would receive the QUOTA_VIOLATED. That would
> > mean it was still possible to block the controller from handling other
> > events – you just need to do so via making one big request.
> >
> > While the reasons for rejecting execution throttling make sense given the
> > RPCs we have today that seems to be at the cost of still allowing harm to
> > the cluster, or did I misunderstand?
> >
> > Kind regards,
> >
> > Tom
> >
> >
> >
> > On Tue, Apr 28, 2020 at 1:49 AM Jun Rao <j...@confluent.io> wrote:
> >
> > > Hi, David,
> > >
> > > Thanks for the reply. A few more comments.
> > >
> > > 1. I am actually not sure if a quota based on request rate is easier
> > > for the users. For context, in KIP-124, we started with a request
> > > rate quota, but ended up not choosing it. The main issues are (a)
> > > requests are not equal; some are more expensive than others; (b) the
> > > users typically don't know how expensive each type of request is. For
> > > example, a big part of the controller cost is ZK writes. To create a
> > > new topic with 1 partition, the number of ZK writes is 4 (1 for each
> > > segment in /brokers/topics/[topic]/partitions/[partitionId]/state).
> > > The cost of adding one partition to an existing topic is 2 ZK writes.
> > > The cost of deleting a topic with 1 partition is 6 to 7 ZK writes.
> > > It's unlikely for a user to know the exact cost associated with those
> > > implementation details. If users don't know the cost, it's not clear
> > > if they can set the rate properly.
> > >
> > > 2. I think that depends on the goal. To me, the common problem is
> > > that you have many applications running on a shared Kafka cluster and
> > > one of the applications abuses the broker by issuing too many
> > > requests. In this case, a global quota will end up throttling every
> > > application. However, what we really want in this case is to only
> > > throttle the application that causes the problem. A user level quota
> > > solves this problem more effectively. We may still need some sort of
> > > global quota when the total usage from all applications exceeds the
> > > broker resources. But that seems to be secondary since it's uncommon
> > > for all applications' usage to go up at the same time. We can also
> > > solve this problem by reducing the per user quota for every
> > > application if there is a user level quota.
> > >
> > > 3. Not sure that I fully understand the difference in burst balance.
> > > The current throttling logic works as follows. Each quota is measured
> > > over a number of time windows. Suppose the quota is X/sec. If time
> > > passes and the quota is not being used, we are accumulating credit at
> > > the rate of X/sec. If a quota is being used, we are reducing that
> > > credit based on the usage. The credit expires when the corresponding
> > > window rolls out. The max credit that can be accumulated is X *
> > > number of windows * window size. So, in some sense, the current logic
> > > also supports burst and a way to cap the burst. Could you explain the
> > > difference with Token Bucket a bit more? Also, the current quota
> > > system always executes the current request even if it's being
> > > throttled. It just informs the client to back off a throttled amount
> > > of time before sending another request.
> > >
> > > Jun
> > >
> > >
> > >
> > > On Mon, Apr 27, 2020 at 5:15 AM David Jacot <dja...@confluent.io> wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > Thank you for the feedback.
> > > >
> > > > 1. You are right. At the end, we do care about the percentage of
> > > > time that an operation ties up the controller thread. I thought
> > > > about this but I was not entirely convinced by it for the following
> > > > reasons:
> > > >
> > > > 1.1. While I do agree that setting up a rate and a burst is a bit
> > > > harder than allocating a percentage for the administrator of the
> > > > cluster, I believe that a rate and a burst are way easier to
> > > > understand for the users of the cluster.
> > > >
> > > > 1.2. Measuring the time that a request ties up the controller
> > > > thread is not as straightforward as it sounds because the controller
> > > > reacts to ZK TopicChange and TopicDeletion events in lieu of
> > > > handling requests directly. These events do not carry the client id
> > > > nor the user information, so the best would be to refactor the
> > > > controller to accept requests instead of reacting to the events.
> > > > This will be possible with KIP-590. It obviously has other side
> > > > effects in the controller (e.g. batching).
> > > >
> > > > I leaned towards the current proposal mainly due to 1.1. as 1.2.
> > > > can be (or will be) fixed. Does 1.1. sound like a reasonable trade
> > > > off to you?
> > > >
> > > > 2. It is not in the current proposal. I thought that a global quota
> > > > would be enough to start with. We can definitely make it work like
> > > > the other quotas.
> > > >
> > > > 3. The main difference is that the Token Bucket algorithm defines
> > > > an explicit burst B while guaranteeing an average rate R, whereas
> > > > our existing quota guarantees an average rate R as well but starts
> > > > to throttle as soon as the rate goes above the defined quota.
> > > >
> > > > Creating and deleting topics is bursty by nature. Applications
> > > > create or delete topics occasionally, usually by sending one request
> > > > with multiple topics. The reasoning behind allowing a burst is to
> > > > allow such requests with a reasonable size to pass without being
> > > > throttled, whereas our current quota mechanism would reject any
> > > > topics as soon as the rate is above the quota, requiring the
> > > > applications to send subsequent requests to create or to delete all
> > > > the topics.
> > > >
> > > > Best,
> > > > David
> > > >
> > > >
> > > > On Fri, Apr 24, 2020 at 9:03 PM Jun Rao <j...@confluent.io> wrote:
> > > >
> > > > > Hi, David,
> > > > >
> > > > > Thanks for the KIP. A few quick comments.
> > > > >
> > > > > 1. About quota.partition.mutations.rate. I am not sure if it's
> > > > > very easy for the user to set the quota as a rate. For example,
> > > > > each partition mutation could take a different number of ZK
> > > > > operations (depending on things like retry). The time to process
> > > > > each ZK operation may also vary from cluster to cluster. An
> > > > > alternative way to model this is to do something similar to the
> > > > > request (CPU) quota, which exposes quota as a percentage of the
> > > > > server threads that can be used. The current request quota
> > > > > doesn't include the controller thread. We could add something
> > > > > that measures/exposes the percentage of time that a request ties
> > > > > up the controller thread, which seems to be what we really care
> > > > > about.
> > > > >
> > > > > 2. Is the new quota per user? Intuitively, we want to only
> > > > > penalize applications that overuse the broker resources, but not
> > > > > others. Also, in existing types of quotas (request, bandwidth),
> > > > > there is a hierarchy among clientId vs user and default vs
> > > > > customized (see
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-55%3A+Secure+Quotas+for+Authenticated+Users
> > > > > ). Does the new quota fit into the existing hierarchy?
> > > > >
> > > > > 3. It seems that you are proposing a new quota mechanism based on
> > > > > the Token Bucket algorithm. Could you describe its tradeoff with
> > > > > the existing quota mechanism? Ideally, it would be better if we
> > > > > have a single quota mechanism within Kafka.
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Apr 24, 2020 at 9:52 AM David Jacot <dja...@confluent.io> wrote:
> > > > >
> > > > > > Hi folks,
> > > > > >
> > > > > > I'd like to start the discussion for KIP-599:
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations
> > > > > >
> > > > > > It proposes to introduce quotas for the create topics, create
> > > > > > partitions and delete topics operations. Let me know what you
> > > > > > think, thanks.
> > > > > >
> > > > > > Best,
> > > > > > David
> > > > > >
> > > > >
> > > >
> > >
> >
>
