Hi David,

As a user, I might expect the validateOnly option to do everything except
actually make the changes. That interpretation would imply the quota should
be checked, though the check should obviously be side-effect-free. I think
this interpretation could be useful because it either gives the caller some
confidence that they're not going to hit the quota, or tells them, via the
exception, when they can expect the call to work. But for this to be
useful, it would require the retry logic not to retry the request when
validateOnly is set.
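
For concreteness, here is roughly how I would expect a caller to use it,
assuming the quota were included in the validation and surfaced via the
ThrottlingQuotaExceededException proposed in the KIP. This is just a sketch
of the caller's perspective, not a definitive API:

    import java.util.Collections;
    import java.util.Properties;
    import java.util.concurrent.ExecutionException;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.CreateTopicsOptions;
    import org.apache.kafka.clients.admin.NewTopic;
    import org.apache.kafka.common.errors.ThrottlingQuotaExceededException;

    Admin admin = Admin.create(new Properties());
    NewTopic topic = new NewTopic("my-topic", 6, (short) 3);
    CreateTopicsOptions options = new CreateTopicsOptions().validateOnly(true);
    try {
        // Dry run: validates the request without making any changes.
        admin.createTopics(Collections.singleton(topic), options)
             .all().get();
        // Validation passed: an immediate real create is likely to work.
    } catch (InterruptedException | ExecutionException e) {
        if (e.getCause() instanceof ThrottlingQuotaExceededException) {
            // The caller learns when the call can be expected to succeed
            // (throttleTimeMs is the accessor proposed in the KIP).
            ThrottlingQuotaExceededException tqe =
                (ThrottlingQuotaExceededException) e.getCause();
            int waitMs = tqe.throttleTimeMs();
        }
    }

(The exception type and its accessor are, of course, whatever the KIP
finally settles on.)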

On the other hand, if validateOnly is really about validating only some
aspects of the request (which is perhaps what the name implies), then we
should clarify in the Javadoc that the quota is not included in the
validation.

On balance, I agree with what you're proposing, since the extra utility of
including the quota in the validation seems not worth the extra
complication in the retry logic.

Thanks,

Tom



On Thu, Jun 4, 2020 at 3:32 PM David Jacot <dja...@confluent.io> wrote:

> Hi Tom,
>
> That's a good question. As the validation does not create any load on the
> controller, I was planning to do it without checking the quota at all.
> Does that sound reasonable?
>
> Best,
> David
>
> On Thu, Jun 4, 2020 at 4:23 PM David Jacot <dja...@confluent.io> wrote:
>
> > Hi Jun and Anna,
> >
> > Thank you both for your replies.
> >
> > Based on our recent discussion, I agree that using a rate is better to
> > remain consistent with the other quotas. As you both suggested, changing
> > the way we compute the rate to better handle spiky workloads, and to
> > behave a bit more like the token bucket algorithm, makes sense for all
> > quotas as well.
> >
> > I will update the KIP to reflect this.
> >
> > Anna, I think that we can explain this in this KIP. However, we can't
> > just say that the Rate will be updated; I think that we need to give a
> > bit more info on how its calculation changes.
> >
> > Best,
> > David
> >
> > On Thu, Jun 4, 2020 at 6:31 AM Anna Povzner <a...@confluent.io> wrote:
> >
> >> Hi Jun and David,
> >>
> >> Regarding token bucket vs. Rate behavior: we recently observed a couple
> >> of cases where a bursty workload would result in long-ish pauses in
> >> between bursts, resulting in lower overall bandwidth than the quota. I
> >> will need to debug this a bit more to be 100% sure, but it does look
> >> like the case described by David earlier in this thread. So, I agree
> >> with Jun -- I think we should make all quota rate behavior consistent,
> >> and probably similar to the token bucket one.
> >>
> >> Looking at KIP-13, it doesn't describe the rate calculation in enough
> >> detail, but it does mention window size. So, we could keep the "window
> >> size" and "number of samples" configs and change the Rate implementation
> >> to be more similar to a token bucket:
> >> * The number of samples defines our burst size.
> >> * Change the behavior so that if a burst happens after an idle period,
> >> the burst effectively spreads evenly over the period from (now - window)
> >> to now, where window is (<number of samples> - 1) * <window size>.
> >> This basically describes a token bucket, while keeping the current quota
> >> configs. I think we can even implement this by changing the way we
> >> record the last sample or lastWindowMs.
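> >>
> >> In code, the gist of the change could look something like this (just a
> >> sketch with illustrative variable names, not the actual Rate
> >> implementation):
> >>
> >>   // Measure over at least the full window span, so that a burst
> >>   // recorded after an idle period is spread over
> >>   // (numSamples - 1) * windowSizeMs rather than over the short time
> >>   // elapsed since the burst began.
> >>   long windowSpanMs = (numSamples - 1) * windowSizeMs;
> >>   long elapsedMs = Math.max(nowMs - oldestSampleStartMs, windowSpanMs);
> >>   double ratePerSec = totalRecorded / (elapsedMs / 1000.0);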
> >>
> >> Jun, if we change the Rate calculation behavior in the bandwidth and
> >> request quotas, would we need a separate KIP? We shouldn't need one if
> >> we keep the window size and number of samples configs, right?
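> >>
> >> For reference, the existing broker configs I mean are, if I remember
> >> the names and defaults correctly (worth double-checking):
> >>
> >>   # number of samples retained for computing the quota rate
> >>   quota.window.num=11
> >>   # time span of each sample, in seconds
> >>   quota.window.size.seconds=1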
> >>
> >> Thanks,
> >> Anna
> >>
> >> On Wed, Jun 3, 2020 at 3:24 PM Jun Rao <j...@confluent.io> wrote:
> >>
> >> > Hi, David,
> >> >
> >> > Thanks for the reply.
> >> >
> >> > 11. To match the behavior of the token bucket approach, I was thinking
> >> > that requests that don't fit in the previous time windows would be
> >> > accumulated in the current time window. So, the 60 extra requests
> >> > would be accumulated in the latest window, and the client would also
> >> > have to wait 12 more secs (60 requests at a quota of 5 per sec) before
> >> > throttling is removed. I agree that this is probably a better
> >> > behavior, and it's reasonable to change the existing behavior to this
> >> > one.
> >> >
> >> > To me, it seems that sample_size * num_windows is the same as the max
> >> > burst balance. The latter seems a bit better to configure. The thing
> >> > is that the existing quota system is already used in quite a few
> >> > places, and if we want to change the configuration in the future,
> >> > there is a migration cost. Given that, do you feel it's better to
> >> > adopt the new token bucket terminology or to fold the behavior somehow
> >> > into our existing system? If it's the former, it would be useful to
> >> > document this in the rejected alternatives section and add a future
> >> > plan for migrating existing quota configurations.
> >> >
> >> > Jun
> >> >
> >> >
> >> > On Tue, Jun 2, 2020 at 6:55 AM David Jacot <dja...@confluent.io>
> >> > wrote:
> >> >
> >> > > Hi Jun,
> >> > >
> >> > > Thanks for your reply.
> >> > >
> >> > > 10. I think that both options are likely equivalent from an accuracy
> >> > > point of view. Putting the implementation aside, conceptually, I am
> >> > > not convinced by the usage-based approach because resources don't
> >> > > have a clear owner in AK at the moment. A topic can be created by
> >> > > (Principal A, no client id), then extended by (no principal, Client
> >> > > B), and finally deleted by (Principal C, Client C). This does not
> >> > > sound right to me, and I fear that it is not going to be easy for
> >> > > our users to grasp.
> >> > >
> >> > > Regarding the naming, I do agree that we can make it more
> >> > > future-proof. I propose `controller_mutations_rate`. I think that
> >> > > differentiating the mutations from the reads is still a good thing
> >> > > for the future.
> >> > >
> >> > > 11. I am not convinced by your proposal for the following reasons:
> >> > >
> >> > > First, in my toy example, I used 101 windows and 7 * 80 requests. We
> >> > > could effectively allocate 5 * 100 requests to the previous windows,
> >> > > assuming that they are empty. What shall we do with the remaining 60
> >> > > requests? Shall we allocate them to the current window or divide
> >> > > them among all the windows?
> >> > >
> >> > > Second, I don't think that we can safely change the behavior of all
> >> > > the existing rates in use, because it actually changes the
> >> > > computation of the rate: values allocated to past windows would
> >> > > expire sooner than they do today.
> >> > >
> >> > > Overall, in trying to fit into the current Rate, we would end up
> >> > > building a slightly different version of it, which would be even
> >> > > more confusing for users.
> >> > >
> >> > > Instead, I think that we should embrace the notion of burst, as it
> >> > > could also be applied to other quotas in the future. At the end of
> >> > > the day, users don't have to know whether we use a token bucket or a
> >> > > special rate internally. It is an implementation detail.
> >> > >
> >> > > Users would be able to define:
> >> > > - a rate R; and
> >> > > - a maximum burst B.
> >> > >
> >> > > If we change the metrics to be as follows:
> >> > > - the actual rate;
> >> > > - the burst balance in %, 0 meaning that the user is throttled;
> >> > > they remain detached from the algorithm.
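> >> > >
> >> > > To illustrate the model users would reason about, here is a toy
> >> > > token bucket with rate R and maximum burst B (illustrative Java
> >> > > only, not the proposed implementation):
> >> > >
> >> > >   // Toy token bucket: credits refill at R per second, capped at B;
> >> > >   // a request costing `cost` is admitted only if enough remain.
> >> > >   class TokenBucket {
> >> > >       final double R, B;
> >> > >       double credits;
> >> > >       long lastMs;
> >> > >
> >> > >       TokenBucket(double r, double b, long nowMs) {
> >> > >           R = r; B = b; credits = b; lastMs = nowMs;
> >> > >       }
> >> > >
> >> > >       boolean tryAcquire(double cost, long nowMs) {
> >> > >           credits = Math.min(B, credits + R * (nowMs - lastMs) / 1000.0);
> >> > >           lastMs = nowMs;
> >> > >           if (credits < cost)
> >> > >               return false; // not enough balance: user is throttled
> >> > >           credits -= cost;
> >> > >           return true;
> >> > >       }
> >> > >   }
> >> > >
> >> > > The burst balance metric would then simply be credits / B * 100.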
> >> > >
> >> > > I personally prefer this over having to define a rate and a number
> >> > > of windows while having to understand that the number of windows
> >> > > implicitly defines the allowed burst. I think that it is clearer and
> >> > > easier to grasp. Don't you?
> >> > >
> >> > > Best,
> >> > > David
> >> > >
> >> > > On Fri, May 29, 2020 at 6:38 PM Jun Rao <j...@confluent.io> wrote:
> >> > >
> >> > > > Hi, David, Anna,
> >> > > >
> >> > > > Thanks for the response. Sorry for the late reply.
> >> > > >
> >> > > > 10. Regarding exposing rate or usage as the quota: your argument
> >> > > > is that usage is not very accurate anyway and is harder to
> >> > > > implement, so let's just be a bit loose and expose rate. I am sort
> >> > > > of neutral on that. (1) It seems to me that overall usage will be
> >> > > > relatively more accurate than rate. All the issues that Anna
> >> > > > brought up seem to exist for rate too, and rate has the additional
> >> > > > problem that the cost of each request may not be uniform. (2) In
> >> > > > terms of implementation, a usage-based approach requires tracking
> >> > > > the user info through the life cycle of an operation. However, as
> >> > > > you mentioned, things like topic creation can generate additional
> >> > > > LeaderAndIsr/UpdateMetadata requests. Longer term, we probably
> >> > > > want to associate those costs with the user who initiated the
> >> > > > operation. If we do that, we sort of need to track the user for
> >> > > > the full life cycle of the processing of an operation anyway.
> >> > > > (3) If you strongly prefer rate, I don't have strong objections.
> >> > > > However, I do feel that the new quota name should be able to cover
> >> > > > all controller-related costs longer term. This KIP currently only
> >> > > > covers topic creation/deletion. It would not be ideal if, in the
> >> > > > future, we had to add yet another type of quota for some other
> >> > > > controller-related operations. The quota name in the KIP mentions
> >> > > > partition mutation. In the future, if we allow things like topic
> >> > > > renaming, it may not be related to partition mutation directly,
> >> > > > and it would be trickier to fit those operations into the current
> >> > > > quota. So, maybe something more general like
> >> > > > controller_operations_quota would be more future-proof.
> >> > > >
> >> > > > 11. Regarding the difference between the token bucket algorithm
> >> > > > and our current quota mechanism: that's a good observation. It
> >> > > > seems that we can make a slight change to our current quota
> >> > > > mechanism to achieve a similar thing. As you said, the issue is
> >> > > > that our current mechanism allocates all 7 * 80 requests to the
> >> > > > last 1 sec window. This is a bit unintuitive since each sec only
> >> > > > has a quota capacity of 5. An alternative is to allocate the
> >> > > > 7 * 80 requests across the current and all previous windows, each
> >> > > > up to its remaining quota capacity. So, you would fill the current
> >> > > > 1 sec window and all previous ones, each with 5. Then, it seems
> >> > > > this would give the same behavior as the token bucket? The reason
> >> > > > that I keep asking this is that, from an operational perspective,
> >> > > > it's simpler if all types of quotas work in the same way.
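> >> > > >
> >> > > > In pseudo-Java, the allocation I have in mind would look something
> >> > > > like this (a sketch with illustrative names):
> >> > > >
> >> > > >   // Fill the current sample first, then backfill older samples,
> >> > > >   // each only up to the per-window quota capacity.
> >> > > >   double remaining = recordedBurst;
> >> > > >   for (Sample s : samplesNewestToOldest) {   // List<Sample>
> >> > > >       double free = Math.max(0, perWindowCapacity - s.value);
> >> > > >       double fill = Math.min(free, remaining);
> >> > > >       s.value += fill;
> >> > > >       remaining -= fill;
> >> > > >   }
> >> > > >   // Whatever fits nowhere accumulates in the latest window,
> >> > > >   // mirroring the token bucket's debt that drains over time.
> >> > > >   samplesNewestToOldest.get(0).value += remaining;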
> >> > > >
> >> > > > Jun
> >> > > >
> >> > > > On Tue, May 26, 2020 at 9:52 AM David Jacot <dja...@confluent.io>
> >> > > > wrote:
> >> > > >
> >> > > > > Hi folks,
> >> > > > >
> >> > > > > I have updated the KIP. As mentioned by Jun, I have made the
> >> > > > > quota per principal/client id, similar to the other quotas. I
> >> > > > > have also explained how this will work in conjunction with auto
> >> > > > > topic creation.
> >> > > > >
> >> > > > > Please take a look at it. I plan to call a vote for it in the
> >> > > > > next few days if there are no comments in this thread.
> >> > > > >
> >> > > > > Best,
> >> > > > > David
> >> > > > >
> >> > > > > On Wed, May 13, 2020 at 10:57 AM Tom Bentley <tbent...@redhat.com>
> >> > > > > wrote:
> >> > > > >
> >> > > > > > Hi David,
> >> > > > > >
> >> > > > > > Thanks for the explanation and confirmation that evolving the
> >> > > > > > APIs is not off the table in the longer term.
> >> > > > > >
> >> > > > > > Kind regards,
> >> > > > > >
> >> > > > > > Tom
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>
