Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations

David Jacot Wed, 03 Jun 2020 01:18:46 -0700

Hi Rajini,

Thanks for your feedback. Please find my answers below:


1) Our main goal is to protect the controller from the extreme users
(DDoS). We want
to protect it from large requests or repetitive requests coming from a
single user.
That user could be used by multiple apps as you pointed out which makes it
even
worst. For instance, a buggy application could continuously create and
delete
topics in a tight loop.

The idea of the burst is to still allow creating or deleting topics in
batch because
this is how applications tend to do it. However, we want to keep the batch
size
under control with the burst. The burst does not allow requests of any
size. Topics
are accepted until the burst is passed. All the others are rejected.
Ideally, regular
and well behaving applications should never or rarely be throttled.

2) No, there is no explicit bound on the maximum throttle time. Having a
maximum
is straightforward here because the throttling time depends on the actual
size of
the request.

3) That's a very good question that I haven't thought of. I was thinking
about doing
it for the new quota only. I think that your suggestion of having a per
method argument
makes sense. I will update the KIP.

4) Indeed, it is outdated text. Let me update that.

Regards,
David

On Wed, Jun 3, 2020 at 12:01 AM Rajini Sivaram <rajinisiva...@gmail.com>
wrote:

> Hi David,
>
> Thanks for the KIP. A few questions below:
>
> 1) The KIP says: *`Typically, applications tend to send one request to
> create all the topics that they need`*. What would the point of throttling
> be in this case? If there was a user quota for the principal used by that
> application, wouldn't we just allow the request due to the burst value? Is
> the KIP specifically for the case where multiple apps with the same user
> principal are started all at once?
>
> 2) Will there be a bound on the maximum throttle time?
>
> 3) Will *retry.quota.violation.exception* have any impact on request quota
> throttling or is it limited to the three requests where the new quota is
> applied? If it is only for the new quota, perhaps it would be better to add
> an option to the relevant requests rather than use an admin client config.
>
> 4) Is this outdated text, wasn't sure what this refers to : *While the
> configuration could be set by broker, we envision to have it set only has a
> cluster wide default for two reasons.*
>
> Regards,
>
> Rajini
>
>
> On Tue, Jun 2, 2020 at 2:55 PM David Jacot <dja...@confluent.io> wrote:
>
> > Hi Jun,
> >
> > Thanks for your reply.
> >
> > 10. I think that both options are likely equivalent from an accuracy
> point
> > of
> > view. If we put the implementation aside, conceptually, I am not
> convinced
> > by the used based approach because resources don't have a clear owner
> > in AK at the moment. A topic can be created by (Principal A, no client
> id),
> > then can be extended by (no principal, Client B), and finally deleted by
> > (Principal C, Client C). This does not sound right to me and I fear that
> it
> > is not going to be easy to grasp for our users.
> >
> > Regarding the naming, I do agree that we can make it more future proof.
> > I propose `controller_mutations_rate`. I think that differentiating the
> > mutations
> > from the reads is still a good thing for the future.
> >
> > 11. I am not convinced by your proposal for the following reasons:
> >
> > First, in my toy example, I used 101 windows and 7 * 80 requests. We
> could
> > effectively allocate 5 * 100 requests to the previous windows assuming
> that
> > they are empty. What shall we do with the remaining 60 requests? Shall we
> > allocate them to the current window or shall we divide it among all the
> > windows?
> >
> > Second, I don't think that we can safely change the behavior of all the
> > existing
> > rates used because it actually changes the computation of the rate as
> > values
> > allocated to past windows would expire before they would today.
> >
> > Overall, while trying to fit in the current rate, we are going to build a
> > slightly
> > different version of the rate which will be even more confusing for
> users.
> >
> > Instead, I think that we should embrace the notion of burst as it could
> > also
> > be applied to other quotas in the future. Users don't have to know that
> we
> > use the Token Bucket or a special rate inside at the end of the day. It
> is
> > an
> > implementation detail.
> >
> > Users would be able to define:
> > - a rate R; and
> > - a maximum burst B.
> >
> > If we change the metrics to be as follow:
> > - the actual rate;
> > - the burst balance in %, 0 meaning that the user is throttled;
> > It remains disattach from the algorithm.
> >
> > I personally prefer this over having to define a rate and a number of
> > windows
> > while having to understand that the number of windows implicitly defines
> > the
> > allowed burst. I think that it is clearer and easier to grasp. Don't you?
> >
> > Best,
> > David
> >
> > On Fri, May 29, 2020 at 6:38 PM Jun Rao <j...@confluent.io> wrote:
> >
> > > Hi, David, Anna,
> > >
> > > Thanks for the response. Sorry for the late reply.
> > >
> > > 10. Regarding exposing rate or usage as quota. Your argument is that
> > usage
> > > is not very accurate anyway and is harder to implement. So, let's just
> > be a
> > > bit loose and expose rate. I am sort of neutral on that. (1) It seems
> to
> > me
> > > that overall usage will be relatively more accurate than rate. All the
> > > issues that Anna brought up seem to exist for rate too. Rate has the
> > > additional problem that the cost of each request may not be uniform.
> (2)
> > In
> > > terms of implementation, a usage based approach requires tracking the
> > user
> > > info through the life cycle of an operation. However, as you mentioned,
> > > things like topic creation can generate additional
> > > LeaderAndIsr/UpdateMetadata requests. Longer term, we probably want to
> > > associate those cost to the user who initiated the operation. If we do
> > > that, we sort of need to track the user for the full life cycle of the
> > > processing of an operation anyway. (3) If you prefer rate strongly, I
> > don't
> > > have strong objections. However, I do feel that the new quota name
> should
> > > be able to cover all controller related cost longer term. This KIP
> > > currently only covers topic creation/deletion. It would not be ideal if
> > in
> > > the future, we have to add yet another type of quota for some other
> > > controller related operations. The quota name in the KIP has partition
> > > mutation. In the future, if we allow things like topic renaming, it may
> > not
> > > be related to partition mutation directly and it would be trickier to
> fit
> > > those operations in the current quota. So, maybe sth more general like
> > > controller_operations_quota will be more future proof.
> > >
> > > 11. Regarding the difference between the token bucket algorithm and our
> > > current quota mechanism. That's a good observation. It seems that we
> can
> > > make a slight change to our current quota mechanism to achieve a
> similar
> > > thing. As you said, the issue was that we allocate all 7 * 80 requests
> in
> > > the last 1 sec window in our current mechanism. This is a bit
> unintuitive
> > > since each sec only has a quota capacity of 5. An alternative way is to
> > > allocate the 7 * 80 requests to all previous windows, each up to the
> > > remaining quota capacity left. So, you will fill the current 1 sec
> window
> > > and all previous ones, each with 5. Then, it seems this will give the
> > same
> > > behavior as token bucket? The reason that I keep asking this is that
> from
> > > an operational perspective, it's simpler if all types of quotas work in
> > the
> > > same way.
> > >
> > > Jun
> > >
> > > On Tue, May 26, 2020 at 9:52 AM David Jacot <dja...@confluent.io>
> wrote:
> > >
> > > > Hi folks,
> > > >
> > > > I have updated the KIP. As mentioned by Jun, I have made the
> > > > quota per principal/clientid similarly to the other quotas. I have
> > > > also explained how this will work in conjunction with the auto
> > > > topics creation.
> > > >
> > > > Please, take a look at it. I plan to call a vote for it in the next
> few
> > > > days if there are no comments in this thread.
> > > >
> > > > Best,
> > > > David
> > > >
> > > > On Wed, May 13, 2020 at 10:57 AM Tom Bentley <tbent...@redhat.com>
> > > wrote:
> > > >
> > > > > Hi David,
> > > > >
> > > > > Thanks for the explanation and confirmation that evolving the APIs
> is
> > > not
> > > > > off the table in the longer term.
> > > > >
> > > > > Kind regards,
> > > > >
> > > > > Tom
> > > > >
> > > >
> > >
> >
>

Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations

Reply via email to