Hi David,

Thanks for the updates, looks good. Just a couple of minor comments:
1) There is a typo in "*The channel will be mutated as well when
`throttle_time_ms > 0`." * Should be *muted*?
2) Since the three requests will need a new field for `
*retryQuotaViolatedException*`, we should perhaps add that change to the
KIP.

Regards,

Rajini

On Wed, Jun 3, 2020 at 1:02 PM David Jacot <dja...@confluent.io> wrote:

> Hi all,
>
> I have updated the KIP based on our recent discussions. I have mainly
> changed the
> following points:
> * I have renamed the quota as suggested by Jun.
> * I have changed the metrics to be "token bucket" agnostic. The idea is to
> report the
> burst and the rate per principal/clientid.
> * I have removed the `retry.quota.violation.exception` configuration and
> replaced it
> with options in the respective methods' options.
>
> Now, the public interfaces are not tight to the algorithm that we use
> internally to throttle
> the requests but keep the notion of burst. I hope that this will help to
> alleviate the
> tokens bucket vs rate discussions.
>
> Please, have a look and let me know your thoughts.
>
> Bests,
> David
>
>
> On Wed, Jun 3, 2020 at 10:17 AM David Jacot <dja...@confluent.io> wrote:
>
> > Hi Rajini,
> >
> > Thanks for your feedback. Please find my answers below:
> >
> > 1) Our main goal is to protect the controller from the extreme users
> > (DDoS). We want
> > to protect it from large requests or repetitive requests coming from a
> > single user.
> > That user could be used by multiple apps as you pointed out which makes
> it
> > even
> > worst. For instance, a buggy application could continuously create and
> > delete
> > topics in a tight loop.
> >
> > The idea of the burst is to still allow creating or deleting topics in
> > batch because
> > this is how applications tend to do it. However, we want to keep the
> batch
> > size
> > under control with the burst. The burst does not allow requests of any
> > size. Topics
> > are accepted until the burst is passed. All the others are rejected.
> > Ideally, regular
> > and well behaving applications should never or rarely be throttled.
> >
> > 2) No, there is no explicit bound on the maximum throttle time. Having a
> > maximum
> > is straightforward here because the throttling time depends on the actual
> > size of
> > the request.
> >
> > 3) That's a very good question that I haven't thought of. I was thinking
> > about doing
> > it for the new quota only. I think that your suggestion of having a per
> > method argument
> > makes sense. I will update the KIP.
> >
> > 4) Indeed, it is outdated text. Let me update that.
> >
> > Regards,
> > David
> >
> > On Wed, Jun 3, 2020 at 12:01 AM Rajini Sivaram <rajinisiva...@gmail.com>
> > wrote:
> >
> >> Hi David,
> >>
> >> Thanks for the KIP. A few questions below:
> >>
> >> 1) The KIP says: *`Typically, applications tend to send one request to
> >> create all the topics that they need`*. What would the point of
> throttling
> >> be in this case? If there was a user quota for the principal used by
> that
> >> application, wouldn't we just allow the request due to the burst value?
> Is
> >> the KIP specifically for the case where multiple apps with the same user
> >> principal are started all at once?
> >>
> >> 2) Will there be a bound on the maximum throttle time?
> >>
> >> 3) Will *retry.quota.violation.exception* have any impact on request
> quota
> >> throttling or is it limited to the three requests where the new quota is
> >> applied? If it is only for the new quota, perhaps it would be better to
> >> add
> >> an option to the relevant requests rather than use an admin client
> config.
> >>
> >> 4) Is this outdated text, wasn't sure what this refers to : *While the
> >> configuration could be set by broker, we envision to have it set only
> has
> >> a
> >> cluster wide default for two reasons.*
> >>
> >> Regards,
> >>
> >> Rajini
> >>
> >>
> >> On Tue, Jun 2, 2020 at 2:55 PM David Jacot <dja...@confluent.io> wrote:
> >>
> >> > Hi Jun,
> >> >
> >> > Thanks for your reply.
> >> >
> >> > 10. I think that both options are likely equivalent from an accuracy
> >> point
> >> > of
> >> > view. If we put the implementation aside, conceptually, I am not
> >> convinced
> >> > by the used based approach because resources don't have a clear owner
> >> > in AK at the moment. A topic can be created by (Principal A, no client
> >> id),
> >> > then can be extended by (no principal, Client B), and finally deleted
> by
> >> > (Principal C, Client C). This does not sound right to me and I fear
> >> that it
> >> > is not going to be easy to grasp for our users.
> >> >
> >> > Regarding the naming, I do agree that we can make it more future
> proof.
> >> > I propose `controller_mutations_rate`. I think that differentiating
> the
> >> > mutations
> >> > from the reads is still a good thing for the future.
> >> >
> >> > 11. I am not convinced by your proposal for the following reasons:
> >> >
> >> > First, in my toy example, I used 101 windows and 7 * 80 requests. We
> >> could
> >> > effectively allocate 5 * 100 requests to the previous windows assuming
> >> that
> >> > they are empty. What shall we do with the remaining 60 requests? Shall
> >> we
> >> > allocate them to the current window or shall we divide it among all
> the
> >> > windows?
> >> >
> >> > Second, I don't think that we can safely change the behavior of all
> the
> >> > existing
> >> > rates used because it actually changes the computation of the rate as
> >> > values
> >> > allocated to past windows would expire before they would today.
> >> >
> >> > Overall, while trying to fit in the current rate, we are going to
> build
> >> a
> >> > slightly
> >> > different version of the rate which will be even more confusing for
> >> users.
> >> >
> >> > Instead, I think that we should embrace the notion of burst as it
> could
> >> > also
> >> > be applied to other quotas in the future. Users don't have to know
> that
> >> we
> >> > use the Token Bucket or a special rate inside at the end of the day.
> It
> >> is
> >> > an
> >> > implementation detail.
> >> >
> >> > Users would be able to define:
> >> > - a rate R; and
> >> > - a maximum burst B.
> >> >
> >> > If we change the metrics to be as follow:
> >> > - the actual rate;
> >> > - the burst balance in %, 0 meaning that the user is throttled;
> >> > It remains disattach from the algorithm.
> >> >
> >> > I personally prefer this over having to define a rate and a number of
> >> > windows
> >> > while having to understand that the number of windows implicitly
> defines
> >> > the
> >> > allowed burst. I think that it is clearer and easier to grasp. Don't
> >> you?
> >> >
> >> > Best,
> >> > David
> >> >
> >> > On Fri, May 29, 2020 at 6:38 PM Jun Rao <j...@confluent.io> wrote:
> >> >
> >> > > Hi, David, Anna,
> >> > >
> >> > > Thanks for the response. Sorry for the late reply.
> >> > >
> >> > > 10. Regarding exposing rate or usage as quota. Your argument is that
> >> > usage
> >> > > is not very accurate anyway and is harder to implement. So, let's
> just
> >> > be a
> >> > > bit loose and expose rate. I am sort of neutral on that. (1) It
> seems
> >> to
> >> > me
> >> > > that overall usage will be relatively more accurate than rate. All
> the
> >> > > issues that Anna brought up seem to exist for rate too. Rate has the
> >> > > additional problem that the cost of each request may not be uniform.
> >> (2)
> >> > In
> >> > > terms of implementation, a usage based approach requires tracking
> the
> >> > user
> >> > > info through the life cycle of an operation. However, as you
> >> mentioned,
> >> > > things like topic creation can generate additional
> >> > > LeaderAndIsr/UpdateMetadata requests. Longer term, we probably want
> to
> >> > > associate those cost to the user who initiated the operation. If we
> do
> >> > > that, we sort of need to track the user for the full life cycle of
> the
> >> > > processing of an operation anyway. (3) If you prefer rate strongly,
> I
> >> > don't
> >> > > have strong objections. However, I do feel that the new quota name
> >> should
> >> > > be able to cover all controller related cost longer term. This KIP
> >> > > currently only covers topic creation/deletion. It would not be ideal
> >> if
> >> > in
> >> > > the future, we have to add yet another type of quota for some other
> >> > > controller related operations. The quota name in the KIP has
> partition
> >> > > mutation. In the future, if we allow things like topic renaming, it
> >> may
> >> > not
> >> > > be related to partition mutation directly and it would be trickier
> to
> >> fit
> >> > > those operations in the current quota. So, maybe sth more general
> like
> >> > > controller_operations_quota will be more future proof.
> >> > >
> >> > > 11. Regarding the difference between the token bucket algorithm and
> >> our
> >> > > current quota mechanism. That's a good observation. It seems that we
> >> can
> >> > > make a slight change to our current quota mechanism to achieve a
> >> similar
> >> > > thing. As you said, the issue was that we allocate all 7 * 80
> >> requests in
> >> > > the last 1 sec window in our current mechanism. This is a bit
> >> unintuitive
> >> > > since each sec only has a quota capacity of 5. An alternative way is
> >> to
> >> > > allocate the 7 * 80 requests to all previous windows, each up to the
> >> > > remaining quota capacity left. So, you will fill the current 1 sec
> >> window
> >> > > and all previous ones, each with 5. Then, it seems this will give
> the
> >> > same
> >> > > behavior as token bucket? The reason that I keep asking this is that
> >> from
> >> > > an operational perspective, it's simpler if all types of quotas work
> >> in
> >> > the
> >> > > same way.
> >> > >
> >> > > Jun
> >> > >
> >> > > On Tue, May 26, 2020 at 9:52 AM David Jacot <dja...@confluent.io>
> >> wrote:
> >> > >
> >> > > > Hi folks,
> >> > > >
> >> > > > I have updated the KIP. As mentioned by Jun, I have made the
> >> > > > quota per principal/clientid similarly to the other quotas. I have
> >> > > > also explained how this will work in conjunction with the auto
> >> > > > topics creation.
> >> > > >
> >> > > > Please, take a look at it. I plan to call a vote for it in the
> next
> >> few
> >> > > > days if there are no comments in this thread.
> >> > > >
> >> > > > Best,
> >> > > > David
> >> > > >
> >> > > > On Wed, May 13, 2020 at 10:57 AM Tom Bentley <tbent...@redhat.com
> >
> >> > > wrote:
> >> > > >
> >> > > > > Hi David,
> >> > > > >
> >> > > > > Thanks for the explanation and confirmation that evolving the
> >> APIs is
> >> > > not
> >> > > > > off the table in the longer term.
> >> > > > >
> >> > > > > Kind regards,
> >> > > > >
> >> > > > > Tom
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>

Reply via email to