RE: [KIP-DISCUSSION] KIP-13 Quotas

Aditya Auradkar Fri, 03 Apr 2015 10:13:55 -0700

Thanks Jun.

Some thoughts:


10) I think it is better we throttle regardless of the produce/fetch version. 
This is a nice feature where clients can tell if they are being throttled or 
not. If we only throttle newer clients, then we have inconsistent behavior 
across clients in a multi-tenant cluster. Having quota metrics on the client 
side is also a nice incentive to upgrade client versions.

11) I think we can call metric.record(fetchSize) before adding the delayedFetch 
request into the purgatory. This will give us the estimated delay of the 
request up-front. The timeout on the DelayedFetch is the Max(maxWait, 
quotaDelay). The DelayedFetch completion criteria can change a little to 
accomodate quotas.

- I agree the quota code should return the estimated delay time in 
QuotaViolationException.

Thanks,
Aditya

________________________________________
From: Jun Rao [[email protected]]
Sent: Friday, April 03, 2015 9:16 AM
To: [email protected]
Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas

Thanks for the update.

10. About whether to return a new field in the response to indicate
throttling. Earlier, the plan was to not change the response format and
just have a metric on the broker to indicate whether a clientId is
throttled or not. The issue is that we don't know whether a particular
clientId instance is throttled or not (since there could be multiple
clients with the same clientId). Your proposal of adding an isThrottled
field in the response addresses and seems better. Then, do we just throttle
the new version of produce/fetch request or both the old and the new
versions? Also, we probably still need a separate metric on the broker side
to indicate whether a clientId is throttled or not.

11. Just to clarify. For fetch requests, when will metric.record(fetchSize)
be called? Is it when we are ready to send the fetch response (after
minBytes and maxWait are satisfied)?

As an implementation detail, it may be useful for the quota code to return
an estimated delay time (to bring the measurement within the limit) in
QuotaViolationException.

Thanks,

Jun

On Wed, Apr 1, 2015 at 3:27 PM, Aditya Auradkar <
[email protected]> wrote:

> Hey everyone,
>
> I've made changes to the KIP to capture our discussions over the last
> couple of weeks.
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-13+-+Quotas
>
> I'll start a voting thread after people have had a chance to read/comment.
>
> Thanks,
> Aditya
>
> ________________________________________
> From: Steven Wu [[email protected]]
> Sent: Friday, March 20, 2015 9:14 AM
> To: [email protected]
> Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas
>
> +1 on Jun's suggestion of maintaining one set/style of metrics at broker.
> In Netflix, we have to convert the yammer metrics to servo metrics at
> broker. it will be painful to know some metrics are in a different style
> and get to be handled differently.
>
> On Fri, Mar 20, 2015 at 8:17 AM, Jun Rao <[email protected]> wrote:
>
> > Not so sure. People who use quota will definitely want to monitor the new
> > metrics at the client id level. Then they will need to deal with those
> > metrics differently from the rest of the metrics. It would be better if
> we
> > can hide this complexity from the users.
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Mar 19, 2015 at 10:45 PM, Joel Koshy <[email protected]>
> wrote:
> >
> > > Actually thinking again - since these will be a few new metrics at the
> > > client id level (bytes in and bytes out to start with) maybe it is fine
> > to
> > > have the two type of metrics coexist and we can migrate the existing
> > > metrics in parallel.
> > >
> > > On Thursday, March 19, 2015, Joel Koshy <[email protected]> wrote:
> > >
> > > > That is a valid concern but in that case I think it would be better
> to
> > > > just migrate completely to the new metrics package first.
> > > >
> > > > On Thursday, March 19, 2015, Jun Rao <[email protected]
> > > > <javascript:_e(%7B%7D,'cvml','[email protected]');>> wrote:
> > > >
> > > >> Hmm, I was thinking a bit differently on the metrics stuff. I think
> it
> > > >> would be confusing to have some metrics defined in the new metrics
> > > package
> > > >> while some others defined in Coda Hale. Those metrics will look
> > > different
> > > >> (e.g., rates in Coda Hale will have special attributes such as
> > > >> 1-min-average). People may need different ways to export the metrics
> > to
> > > >> external systems such as Graphite. So, instead of using the new
> > metrics
> > > >> package on the broker, I was thinking that we can just implement a
> > > >> QuotaMetrics that wraps the Coda Hale metrics. The implementation
> can
> > be
> > > >> the same as what's in the new metrics package.
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Jun
> > > >>
> > > >> On Thu, Mar 19, 2015 at 8:09 PM, Jay Kreps <[email protected]>
> > wrote:
> > > >>
> > > >> > Yeah I was saying was that we are blocked on picking an approach
> for
> > > >> > metrics but not necessarily the full conversion. Clearly if we
> pick
> > > the
> > > >> new
> > > >> > metrics package we would need to implement the two metrics we want
> > to
> > > >> quota
> > > >> > on. But the conversion of the remaining metrics can be done
> > > >> asynchronously.
> > > >> >
> > > >> > -Jay
> > > >> >
> > > >> > On Thu, Mar 19, 2015 at 5:56 PM, Joel Koshy <[email protected]>
> > > >> wrote:
> > > >> >
> > > >> > > > in KAFKA-1930). I agree that this KIP doesn't need to block on
> > the
> > > >> > > > migration of the metrics package.
> > > >> > >
> > > >> > > Can you clarify the above? i.e., if we are going to quota on
> > > something
> > > >> > > then we would want to have migrated that metric over right? Or
> do
> > > you
> > > >> > > mean we don't need to complete the migration of all metrics to
> the
> > > >> > > metrics package right?
> > > >> > >
> > > >> > > I think most of us now feel that the delay + no error is a good
> > > >> > > approach, but it would be good to make sure everyone is on the
> > same
> > > >> > > page.
> > > >> > >
> > > >> > > As Aditya requested a couple of days ago I think we should go
> over
> > > >> > > this at the next KIP hangout.
> > > >> > >
> > > >> > > Joel
> > > >> > >
> > > >> > > On Thu, Mar 19, 2015 at 09:24:09AM -0700, Jun Rao wrote:
> > > >> > > > 1. Delay + no error seems reasonable to me. However, I do feel
> > > that
> > > >> we
> > > >> > > need
> > > >> > > > to give the client an indicator that it's being throttled,
> > instead
> > > >> of
> > > >> > > doing
> > > >> > > > this silently. For that, we probably need to evolve the
> > > >> produce/fetch
> > > >> > > > protocol to include an extra status field in the response. We
> > > >> probably
> > > >> > > need
> > > >> > > > to think more about whether we just want to return a simple
> > status
> > > >> code
> > > >> > > > (e.g., 1 = throttled) or a value that indicates how much is
> > being
> > > >> > > throttled.
> > > >> > > >
> > > >> > > > 2. We probably need to improve the histogram support in the
> new
> > > >> metrics
> > > >> > > > package before we can use it more widely on the server side
> > (left
> > > a
> > > >> > > comment
> > > >> > > > in KAFKA-1930). I agree that this KIP doesn't need to block on
> > the
> > > >> > > > migration of the metrics package.
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > >
> > > >> > > > Jun
> > > >> > > >
> > > >> > > > On Wed, Mar 18, 2015 at 4:02 PM, Aditya Auradkar <
> > > >> > > > [email protected]> wrote:
> > > >> > > >
> > > >> > > > > Hey everyone,
> > > >> > > > >
> > > >> > > > > Thanks for the great discussion. There are currently a few
> > > points
> > > >> on
> > > >> > > this
> > > >> > > > > KIP that need addressing and I want to make sure we are on
> the
> > > >> same
> > > >> > > page
> > > >> > > > > about those.
> > > >> > > > >
> > > >> > > > > 1. Append and delay response vs delay and return error
> > > >> > > > > - I think we've discussed the pros and cons of each approach
> > but
> > > >> > > haven't
> > > >> > > > > chosen an approach yet. Where does everyone stand on this
> > issue?
> > > >> > > > >
> > > >> > > > > 2. Metrics Migration and usage in quotas
> > > >> > > > > - The metrics library in clients has a notion of quotas that
> > we
> > > >> > should
> > > >> > > > > reuse. For that to happen, we need to migrate the server to
> > the
> > > >> new
> > > >> > > metrics
> > > >> > > > > package.
> > > >> > > > > - Need more clarification on how to compute throttling time
> > and
> > > >> > > windowing
> > > >> > > > > for quotas.
> > > >> > > > >
> > > >> > > > > I'm going to start a new KIP to discuss metrics migration
> > > >> separately.
> > > >> > > That
> > > >> > > > > will also contain a section on quotas.
> > > >> > > > >
> > > >> > > > > 3. Dynamic Configuration management - Being discussed in
> > KIP-5.
> > > >> > > Basically
> > > >> > > > > we need something that will model default quotas and allow
> > > >> per-client
> > > >> > > > > overrides.
> > > >> > > > >
> > > >> > > > > Is there something else that I'm missing?
> > > >> > > > >
> > > >> > > > > Thanks,
> > > >> > > > > Aditya
> > > >> > > > > ________________________________________
> > > >> > > > > From: Jay Kreps [[email protected]]
> > > >> > > > > Sent: Wednesday, March 18, 2015 2:10 PM
> > > >> > > > > To: [email protected]
> > > >> > > > > Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas
> > > >> > > > >
> > > >> > > > > Hey Steven,
> > > >> > > > >
> > > >> > > > > The current proposal is actually to enforce quotas at the
> > > >> > > > > client/application level, NOT the topic level. So if you
> have
> > a
> > > >> > service
> > > >> > > > > with a few dozen instances the quota is against all of those
> > > >> > instances
> > > >> > > > > added up across all their topics. So actually the effect
> would
> > > be
> > > >> the
> > > >> > > same
> > > >> > > > > either way but throttling gives the producer the choice of
> > > either
> > > >> > > blocking
> > > >> > > > > or dropping.
> > > >> > > > >
> > > >> > > > > -Jay
> > > >> > > > >
> > > >> > > > > On Tue, Mar 17, 2015 at 10:08 AM, Steven Wu <
> > > [email protected]
> > > >> >
> > > >> > > wrote:
> > > >> > > > >
> > > >> > > > > > Jay,
> > > >> > > > > >
> > > >> > > > > > let's say an app produces to 10 different topics. one of
> the
> > > >> topic
> > > >> > is
> > > >> > > > > sent
> > > >> > > > > > from a library. due to whatever condition/bug, this lib
> > starts
> > > >> to
> > > >> > > send
> > > >> > > > > > messages over the quota. if we go with the delayed
> response
> > > >> > > approach, it
> > > >> > > > > > will cause the whole shared RecordAccumulator buffer to be
> > > >> filled
> > > >> > up.
> > > >> > > > > that
> > > >> > > > > > will penalize other 9 topics who are within the quota.
> that
> > is
> > > >> the
> > > >> > > > > > unfairness point that Ewen and I were trying to make.
> > > >> > > > > >
> > > >> > > > > > if broker just drop the msg and return an error/status
> code
> > > >> > > indicates the
> > > >> > > > > > drop and why. then producer can just move on and accept
> the
> > > >> drop.
> > > >> > > shared
> > > >> > > > > > buffer won't be saturated and other 9 topics won't be
> > > penalized.
> > > >> > > > > >
> > > >> > > > > > Thanks,
> > > >> > > > > > Steven
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > On Tue, Mar 17, 2015 at 9:44 AM, Jay Kreps <
> > > [email protected]
> > > >> >
> > > >> > > wrote:
> > > >> > > > > >
> > > >> > > > > > > Hey Steven,
> > > >> > > > > > >
> > > >> > > > > > > It is true that hitting the quota will cause
> back-pressure
> > > on
> > > >> the
> > > >> > > > > > producer.
> > > >> > > > > > > But the solution is simple, a producer that wants to
> avoid
> > > >> this
> > > >> > > should
> > > >> > > > > > stay
> > > >> > > > > > > under its quota. In other words this is a contract
> between
> > > the
> > > >> > > cluster
> > > >> > > > > > and
> > > >> > > > > > > the client, with each side having something to uphold.
> > Quite
> > > >> > > possibly
> > > >> > > > > the
> > > >> > > > > > > same thing will happen in the absence of a quota, a
> client
> > > >> that
> > > >> > > > > produces
> > > >> > > > > > an
> > > >> > > > > > > unexpected amount of load will hit the limits of the
> > server
> > > >> and
> > > >> > > > > > experience
> > > >> > > > > > > backpressure. Quotas just allow you to set that same
> limit
> > > at
> > > >> > > something
> > > >> > > > > > > lower than 100% of all resources on the server, which is
> > > >> useful
> > > >> > > for a
> > > >> > > > > > > shared cluster.
> > > >> > > > > > >
> > > >> > > > > > > -Jay
> > > >> > > > > > >
> > > >> > > > > > > On Mon, Mar 16, 2015 at 11:34 PM, Steven Wu <
> > > >> > [email protected]>
> > > >> > > > > > wrote:
> > > >> > > > > > >
> > > >> > > > > > > > wait. we create one kafka producer for each cluster.
> > each
> > > >> > > cluster can
> > > >> > > > > > > have
> > > >> > > > > > > > many topics. if producer buffer got filled up due to
> > > delayed
> > > >> > > response
> > > >> > > > > > for
> > > >> > > > > > > > one throttled topic, won't that penalize other topics
> > > >> unfairly?
> > > >> > > it
> > > >> > > > > > seems
> > > >> > > > > > > to
> > > >> > > > > > > > me that broker should just return error without delay.
> > > >> > > > > > > >
> > > >> > > > > > > > sorry that I am chatting to myself :)
> > > >> > > > > > > >
> > > >> > > > > > > > On Mon, Mar 16, 2015 at 11:29 PM, Steven Wu <
> > > >> > > [email protected]>
> > > >> > > > > > > wrote:
> > > >> > > > > > > >
> > > >> > > > > > > > > I think I can answer my own question. delayed
> response
> > > >> will
> > > >> > > cause
> > > >> > > > > the
> > > >> > > > > > > > > producer buffer to be full, which then result in
> > either
> > > >> > thread
> > > >> > > > > > blocking
> > > >> > > > > > > > or
> > > >> > > > > > > > > message drop.
> > > >> > > > > > > > >
> > > >> > > > > > > > > On Mon, Mar 16, 2015 at 11:24 PM, Steven Wu <
> > > >> > > [email protected]>
> > > >> > > > > > > > wrote:
> > > >> > > > > > > > >
> > > >> > > > > > > > >> please correct me if I am missing sth here. I am
> not
> > > >> > > understanding
> > > >> > > > > > how
> > > >> > > > > > > > >> would throttle work without cooperation/back-off
> from
> > > >> > > producer.
> > > >> > > > > new
> > > >> > > > > > > Java
> > > >> > > > > > > > >> producer supports non-blocking API. why would
> delayed
> > > >> > > response be
> > > >> > > > > > able
> > > >> > > > > > > > to
> > > >> > > > > > > > >> slow down producer? producer will continue to fire
> > > async
> > > >> > > sends.
> > > >> > > > > > > > >>
> > > >> > > > > > > > >> On Mon, Mar 16, 2015 at 10:58 PM, Guozhang Wang <
> > > >> > > > > [email protected]
> > > >> > > > > > >
> > > >> > > > > > > > >> wrote:
> > > >> > > > > > > > >>
> > > >> > > > > > > > >>> I think we are really discussing two separate
> issues
> > > >> here:
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> 1. Whether we should a)
> > > >> > > > > append-then-block-then-returnOKButThrottled
> > > >> > > > > > > or
> > > >> > > > > > > > b)
> > > >> > > > > > > > >>> block-then-returnFailDuetoThrottled for quota
> > actions
> > > on
> > > >> > > produce
> > > >> > > > > > > > >>> requests.
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> Both these approaches assume some kind of
> > > >> well-behaveness
> > > >> > of
> > > >> > > the
> > > >> > > > > > > > clients:
> > > >> > > > > > > > >>> option a) assumes the client sets an proper
> timeout
> > > >> value
> > > >> > > while
> > > >> > > > > can
> > > >> > > > > > > > just
> > > >> > > > > > > > >>> ignore "OKButThrottled" response, while option b)
> > > >> assumes
> > > >> > the
> > > >> > > > > > client
> > > >> > > > > > > > >>> handles the "FailDuetoThrottled" appropriately.
> For
> > > any
> > > >> > > malicious
> > > >> > > > > > > > clients
> > > >> > > > > > > > >>> that, for example, just keep retrying either
> > > >> intentionally
> > > >> > or
> > > >> > > > > not,
> > > >> > > > > > > > >>> neither
> > > >> > > > > > > > >>> of these approaches are actually effective.
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> 2. For "OKButThrottled" and "FailDuetoThrottled"
> > > >> responses,
> > > >> > > shall
> > > >> > > > > > we
> > > >> > > > > > > > >>> encode
> > > >> > > > > > > > >>> them as error codes or augment the protocol to
> use a
> > > >> > separate
> > > >> > > > > field
> > > >> > > > > > > > >>> indicating "status codes".
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> Today we have already incorporated some status
> code
> > as
> > > >> > error
> > > >> > > > > codes
> > > >> > > > > > in
> > > >> > > > > > > > the
> > > >> > > > > > > > >>> responses, e.g. ReplicaNotAvailable in
> > > MetadataResponse,
> > > >> > the
> > > >> > > pros
> > > >> > > > > > of
> > > >> > > > > > > > this
> > > >> > > > > > > > >>> is of course using a single field for response
> > status
> > > >> like
> > > >> > > the
> > > >> > > > > HTTP
> > > >> > > > > > > > >>> status
> > > >> > > > > > > > >>> codes, while the cons is that it requires clients
> to
> > > >> handle
> > > >> > > the
> > > >> > > > > > error
> > > >> > > > > > > > >>> codes
> > > >> > > > > > > > >>> carefully.
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> I think maybe we can actually extend the
> single-code
> > > >> > > approach to
> > > >> > > > > > > > overcome
> > > >> > > > > > > > >>> its drawbacks, that is, wrap the error codes
> > semantics
> > > >> to
> > > >> > the
> > > >> > > > > users
> > > >> > > > > > > so
> > > >> > > > > > > > >>> that
> > > >> > > > > > > > >>> users do not need to handle the codes one-by-one.
> > More
> > > >> > > > > concretely,
> > > >> > > > > > > > >>> following Jay's example the client could write
> sth.
> > > like
> > > >> > > this:
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> -----------------
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>>   if(error.isOK())
> > > >> > > > > > > > >>>      // status code is good or the code can be
> > simply
> > > >> > > ignored for
> > > >> > > > > > > this
> > > >> > > > > > > > >>> request type, process the request
> > > >> > > > > > > > >>>   else if(error.needsRetry())
> > > >> > > > > > > > >>>      // throttled, transient error, etc: retry
> > > >> > > > > > > > >>>   else if(error.isFatal())
> > > >> > > > > > > > >>>      // non-retriable errors, etc: notify /
> > terminate
> > > /
> > > >> > other
> > > >> > > > > > > handling
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> -----------------
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> Only when the clients really want to handle, for
> > > example
> > > >> > > > > > > > >>> FailDuetoThrottled
> > > >> > > > > > > > >>> status code specifically, it needs to:
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>>   if(error.isOK())
> > > >> > > > > > > > >>>      // status code is good or the code can be
> > simply
> > > >> > > ignored for
> > > >> > > > > > > this
> > > >> > > > > > > > >>> request type, process the request
> > > >> > > > > > > > >>>   else if(error == FailDuetoThrottled )
> > > >> > > > > > > > >>>      // throttled: log it
> > > >> > > > > > > > >>>   else if(error.needsRetry())
> > > >> > > > > > > > >>>      // transient error, etc: retry
> > > >> > > > > > > > >>>   else if(error.isFatal())
> > > >> > > > > > > > >>>      // non-retriable errors, etc: notify /
> > terminate
> > > /
> > > >> > other
> > > >> > > > > > > handling
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> -----------------
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> And for implementation we can probably group the
> > codes
> > > >> > > > > accordingly
> > > >> > > > > > > like
> > > >> > > > > > > > >>> HTTP status code such that we can do:
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> boolean Error.isOK() {
> > > >> > > > > > > > >>>   return code < 300 && code >= 200;
> > > >> > > > > > > > >>> }
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> Guozhang
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> On Mon, Mar 16, 2015 at 10:24 PM, Ewen
> > > Cheslack-Postava
> > > >> <
> > > >> > > > > > > > >>> [email protected]>
> > > >> > > > > > > > >>> wrote:
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> > Agreed that trying to shoehorn non-error codes
> > into
> > > >> the
> > > >> > > error
> > > >> > > > > > field
> > > >> > > > > > > > is
> > > >> > > > > > > > >>> a
> > > >> > > > > > > > >>> > bad idea. It makes it *way* too easy to write
> code
> > > >> that
> > > >> > > looks
> > > >> > > > > > (and
> > > >> > > > > > > > >>> should
> > > >> > > > > > > > >>> > be) correct but is actually incorrect. If
> > > necessary, I
> > > >> > > think
> > > >> > > > > it's
> > > >> > > > > > > > much
> > > >> > > > > > > > >>> > better to to spend a couple of extra bytes to
> > encode
> > > >> that
> > > >> > > > > > > information
> > > >> > > > > > > > >>> > separately (a "status" or "warning" section of
> the
> > > >> > > response).
> > > >> > > > > An
> > > >> > > > > > > > >>> indication
> > > >> > > > > > > > >>> > that throttling is occurring is something I'd
> > expect
> > > >> to
> > > >> > be
> > > >> > > > > > > indicated
> > > >> > > > > > > > >>> by a
> > > >> > > > > > > > >>> > bit flag in the response rather than as an error
> > > code.
> > > >> > > > > > > > >>> >
> > > >> > > > > > > > >>> > Gwen - I think an error code makes sense when
> the
> > > >> request
> > > >> > > > > > actually
> > > >> > > > > > > > >>> failed.
> > > >> > > > > > > > >>> > Option B, which Jun was advocating, would have
> > > >> appended
> > > >> > the
> > > >> > > > > > > messages
> > > >> > > > > > > > >>> > successfully. If the rate-limiting case you're
> > > talking
> > > >> > > about
> > > >> > > > > had
> > > >> > > > > > > > >>> > successfully committed the messages, I would say
> > > >> that's
> > > >> > > also a
> > > >> > > > > > bad
> > > >> > > > > > > > use
> > > >> > > > > > > > >>> of
> > > >> > > > > > > > >>> > error codes.
> > > >> > > > > > > > >>> >
> > > >> > > > > > > > >>> >
> > > >> > > > > > > > >>> > On Mon, Mar 16, 2015 at 10:16 PM, Gwen Shapira <
> > > >> > > > > > > > [email protected]>
> > > >> > > > > > > > >>> > wrote:
> > > >> > > > > > > > >>> >
> > > >> > > > > > > > >>> > > We discussed an error code for rate-limiting
> > > (which
> > > >> I
> > > >> > > think
> > > >> > > > > > made
> > > >> > > > > > > > >>> > > sense), isn't it a similar case?
> > > >> > > > > > > > >>> > >
> > > >> > > > > > > > >>> > > On Mon, Mar 16, 2015 at 10:10 PM, Jay Kreps <
> > > >> > > > > > [email protected]
> > > >> > > > > > > >
> > > >> > > > > > > > >>> wrote:
> > > >> > > > > > > > >>> > > > My concern is that as soon as you start
> > encoding
> > > >> > > non-error
> > > >> > > > > > > > response
> > > >> > > > > > > > >>> > > > information into error codes the next
> question
> > > is
> > > >> > what
> > > >> > > to
> > > >> > > > > do
> > > >> > > > > > if
> > > >> > > > > > > > two
> > > >> > > > > > > > >>> > such
> > > >> > > > > > > > >>> > > > codes apply (i.e. you have a replica down
> and
> > > the
> > > >> > > response
> > > >> > > > > is
> > > >> > > > > > > > >>> > quota'd). I
> > > >> > > > > > > > >>> > > > think I am trying to argue that error should
> > > mean
> > > >> > "why
> > > >> > > we
> > > >> > > > > > > failed
> > > >> > > > > > > > >>> your
> > > >> > > > > > > > >>> > > > request", for which there will really only
> be
> > > one
> > > >> > > reason,
> > > >> > > > > and
> > > >> > > > > > > any
> > > >> > > > > > > > >>> other
> > > >> > > > > > > > >>> > > > useful information we want to send back is
> > just
> > > >> > another
> > > >> > > > > field
> > > >> > > > > > > in
> > > >> > > > > > > > >>> the
> > > >> > > > > > > > >>> > > > response.
> > > >> > > > > > > > >>> > > >
> > > >> > > > > > > > >>> > > > -Jay
> > > >> > > > > > > > >>> > > >
> > > >> > > > > > > > >>> > > > On Mon, Mar 16, 2015 at 9:51 PM, Gwen
> Shapira
> > <
> > > >> > > > > > > > >>> [email protected]>
> > > >> > > > > > > > >>> > > wrote:
> > > >> > > > > > > > >>> > > >
> > > >> > > > > > > > >>> > > >> I think its not too late to reserve a set
> of
> > > >> error
> > > >> > > codes
> > > >> > > > > > > > >>> (200-299?)
> > > >> > > > > > > > >>> > > >> for "non-error" codes.
> > > >> > > > > > > > >>> > > >>
> > > >> > > > > > > > >>> > > >> It won't be backward compatible (i.e.
> clients
> > > >> that
> > > >> > > > > currently
> > > >> > > > > > > do
> > > >> > > > > > > > >>> "else
> > > >> > > > > > > > >>> > > >> throw" will throw on non-errors), but
> perhaps
> > > its
> > > >> > > > > > worthwhile.
> > > >> > > > > > > > >>> > > >>
> > > >> > > > > > > > >>> > > >> On Mon, Mar 16, 2015 at 9:42 PM, Jay Kreps
> <
> > > >> > > > > > > [email protected]
> > > >> > > > > > > > >
> > > >> > > > > > > > >>> > wrote:
> > > >> > > > > > > > >>> > > >> > Hey Jun,
> > > >> > > > > > > > >>> > > >> >
> > > >> > > > > > > > >>> > > >> > I'd really really really like to avoid
> > that.
> > > >> > Having
> > > >> > > just
> > > >> > > > > > > > spent a
> > > >> > > > > > > > >>> > > bunch of
> > > >> > > > > > > > >>> > > >> > time on the clients, using the error
> codes
> > to
> > > >> > encode
> > > >> > > > > other
> > > >> > > > > > > > >>> > information
> > > >> > > > > > > > >>> > > >> > about the response is super dangerous.
> The
> > > >> error
> > > >> > > > > handling
> > > >> > > > > > is
> > > >> > > > > > > > >>> one of
> > > >> > > > > > > > >>> > > the
> > > >> > > > > > > > >>> > > >> > hardest parts of the client (Guozhang
> chime
> > > in
> > > >> > > here).
> > > >> > > > > > > > >>> > > >> >
> > > >> > > > > > > > >>> > > >> > Generally the error handling looks like
> > > >> > > > > > > > >>> > > >> >   if(error == none)
> > > >> > > > > > > > >>> > > >> >      // good, process the request
> > > >> > > > > > > > >>> > > >> >   else if(error == KNOWN_ERROR_1)
> > > >> > > > > > > > >>> > > >> >      // handle known error 1
> > > >> > > > > > > > >>> > > >> >   else if(error == KNOWN_ERROR_2)
> > > >> > > > > > > > >>> > > >> >      // handle known error 2
> > > >> > > > > > > > >>> > > >> >   else
> > > >> > > > > > > > >>> > > >> >      throw
> > Errors.forCode(error).exception();
> > > >> //
> > > >> > or
> > > >> > > some
> > > >> > > > > > > other
> > > >> > > > > > > > >>> > default
> > > >> > > > > > > > >>> > > >> > behavior
> > > >> > > > > > > > >>> > > >> >
> > > >> > > > > > > > >>> > > >> > This works because we have a convention
> > that
> > > >> and
> > > >> > > error
> > > >> > > > > is
> > > >> > > > > > > > >>> something
> > > >> > > > > > > > >>> > > that
> > > >> > > > > > > > >>> > > >> > prevented your getting the response so
> the
> > > >> default
> > > >> > > > > > handling
> > > >> > > > > > > > >>> case is
> > > >> > > > > > > > >>> > > sane
> > > >> > > > > > > > >>> > > >> > and forward compatible. It is tempting to
> > use
> > > >> the
> > > >> > > error
> > > >> > > > > > code
> > > >> > > > > > > > to
> > > >> > > > > > > > >>> > convey
> > > >> > > > > > > > >>> > > >> > information in the success case. For
> > example
> > > we
> > > >> > > could
> > > >> > > > > use
> > > >> > > > > > > > error
> > > >> > > > > > > > >>> > codes
> > > >> > > > > > > > >>> > > to
> > > >> > > > > > > > >>> > > >> > encode whether quotas were enforced,
> > whether
> > > >> the
> > > >> > > request
> > > >> > > > > > was
> > > >> > > > > > > > >>> served
> > > >> > > > > > > > >>> > > out
> > > >> > > > > > > > >>> > > >> of
> > > >> > > > > > > > >>> > > >> > cache, whether the stock market is up
> > today,
> > > or
> > > >> > > > > whatever.
> > > >> > > > > > > The
> > > >> > > > > > > > >>> > problem
> > > >> > > > > > > > >>> > > is
> > > >> > > > > > > > >>> > > >> > that since these are not errors as far as
> > the
> > > >> > > client is
> > > >> > > > > > > > >>> concerned it
> > > >> > > > > > > > >>> > > >> should
> > > >> > > > > > > > >>> > > >> > not throw an exception but process the
> > > >> response,
> > > >> > > but now
> > > >> > > > > > we
> > > >> > > > > > > > >>> created
> > > >> > > > > > > > >>> > an
> > > >> > > > > > > > >>> > > >> > explicit requirement that that error be
> > > handled
> > > >> > > > > explicitly
> > > >> > > > > > > > >>> since it
> > > >> > > > > > > > >>> > is
> > > >> > > > > > > > >>> > > >> > different. I really think that this kind
> of
> > > >> > > information
> > > >> > > > > is
> > > >> > > > > > > not
> > > >> > > > > > > > >>> an
> > > >> > > > > > > > >>> > > error,
> > > >> > > > > > > > >>> > > >> it
> > > >> > > > > > > > >>> > > >> > is just information, and if we want it in
> > the
> > > >> > > response
> > > >> > > > > we
> > > >> > > > > > > > >>> should do
> > > >> > > > > > > > >>> > > the
> > > >> > > > > > > > >>> > > >> > right thing and add a new field to the
> > > >> response.
> > > >> > > > > > > > >>> > > >> >
> > > >> > > > > > > > >>> > > >> > I think you saw the Samza bug that was
> > > >> literally
> > > >> > an
> > > >> > > > > > example
> > > >> > > > > > > of
> > > >> > > > > > > > >>> this
> > > >> > > > > > > > >>> > > >> > happening and leading to an infinite
> retry
> > > >> loop.
> > > >> > > > > > > > >>> > > >> >
> > > >> > > > > > > > >>> > > >> > Further more I really want to emphasize
> > that
> > > >> > hitting
> > > >> > > > > your
> > > >> > > > > > > > quota
> > > >> > > > > > > > >>> in
> > > >> > > > > > > > >>> > the
> > > >> > > > > > > > >>> > > >> > design that Adi has proposed is actually
> > not
> > > an
> > > >> > > error
> > > >> > > > > > > > condition
> > > >> > > > > > > > >>> at
> > > >> > > > > > > > >>> > > all.
> > > >> > > > > > > > >>> > > >> It
> > > >> > > > > > > > >>> > > >> > is totally reasonable in any bootstrap
> > > >> situation
> > > >> > to
> > > >> > > > > > > > >>> intentionally
> > > >> > > > > > > > >>> > > want to
> > > >> > > > > > > > >>> > > >> > run at the limit the system imposes on
> you.
> > > >> > > > > > > > >>> > > >> >
> > > >> > > > > > > > >>> > > >> > -Jay
> > > >> > > > > > > > >>> > > >> >
> > > >> > > > > > > > >>> > > >> >
> > > >> > > > > > > > >>> > > >> >
> > > >> > > > > > > > >>> > > >> > On Mon, Mar 16, 2015 at 4:27 PM, Jun Rao
> <
> > > >> > > > > > [email protected]>
> > > >> > > > > > > > >>> wrote:
> > > >> > > > > > > > >>> > > >> >
> > > >> > > > > > > > >>> > > >> >> It's probably useful for a client to
> know
> > > >> whether
> > > >> > > its
> > > >> > > > > > > > requests
> > > >> > > > > > > > >>> are
> > > >> > > > > > > > >>> > > >> >> throttled or not (e.g., for monitoring
> and
> > > >> > > alerting).
> > > >> > > > > > From
> > > >> > > > > > > > that
> > > >> > > > > > > > >>> > > >> >> perspective, option B (delay the
> requests
> > > and
> > > >> > > return an
> > > >> > > > > > > > error)
> > > >> > > > > > > > >>> > seems
> > > >> > > > > > > > >>> > > >> >> better.
> > > >> > > > > > > > >>> > > >> >>
> > > >> > > > > > > > >>> > > >> >> Thanks,
> > > >> > > > > > > > >>> > > >> >>
> > > >> > > > > > > > >>> > > >> >> Jun
> > > >> > > > > > > > >>> > > >> >>
> > > >> > > > > > > > >>> > > >> >> On Wed, Mar 4, 2015 at 3:51 PM, Aditya
> > > >> Auradkar <
> > > >> > > > > > > > >>> > > >> >> [email protected]> wrote:
> > > >> > > > > > > > >>> > > >> >>
> > > >> > > > > > > > >>> > > >> >> > Posted a KIP for quotas in kafka.
> > > >> > > > > > > > >>> > > >> >> >
> > > >> > > > > > > > >>> >
> > > >> > > > > >
> > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-13+-+Quotas
> > > >> > > > > > > > >>> > > >> >> >
> > > >> > > > > > > > >>> > > >> >> > Appreciate any feedback.
> > > >> > > > > > > > >>> > > >> >> >
> > > >> > > > > > > > >>> > > >> >> > Aditya
> > > >> > > > > > > > >>> > > >> >> >
> > > >> > > > > > > > >>> > > >> >>
> > > >> > > > > > > > >>> > > >>
> > > >> > > > > > > > >>> > >
> > > >> > > > > > > > >>> >
> > > >> > > > > > > > >>> >
> > > >> > > > > > > > >>> >
> > > >> > > > > > > > >>> > --
> > > >> > > > > > > > >>> > Thanks,
> > > >> > > > > > > > >>> > Ewen
> > > >> > > > > > > > >>> >
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> --
> > > >> > > > > > > > >>> -- Guozhang
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>
> > > >> > > > > > > > >>
> > > >> > > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > > > --
> > > > Sent from Gmail Mobile
> > > >
> > >
> > >
> > > --
> > > Sent from Gmail Mobile
> > >
> >
>

RE: [KIP-DISCUSSION] KIP-13 Quotas

Reply via email to