Re: [DISCUSS] KIP-124: Request rate quotas

Jay Kreps Thu, 23 Feb 2017 08:43:53 -0800

A few minor comments:

   1. Isn't it the case that the throttling time response field should have
   the total time your request was throttled, irrespective of the quotas that
   caused it? Limiting it to the byte rate quota doesn't make sense, but I also
   don't think we want to end up adding new fields in the response for every
   single thing we quota, right?
   2. I don't think we should make this quota specifically about io
   threads. Once we introduce these quotas people set them and expect them to
   be enforced (and if they aren't it may cause an outage). As a result they
   are a bit more sensitive than normal configs, I think. The current thread
   pools seem like something of an implementation detail and not the level the
   user-facing quotas should be involved with. I think it might be better to
   make this a general request-time throttle with no mention in the naming
   about I/O threads and simply acknowledge the current limitation (which we
   may someday fix) in the docs that this covers only the time after the
   request is read off the network.
   3. As such I think the right interface to the user would be something
   like percent_request_time and be in {0,...,100} or request_time_ratio and be
   in {0.0,...,1.0} (I think "ratio" is the terminology we used if the scale
   is between 0 and 1 in the other metrics, right?)
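
To make points 1 and 3 concrete, here is a minimal sketch, not the KIP's
implementation. All names (RequestTimeQuota, from_percent, throttle_ms,
total_throttle_ms), the one-second window, and the delay formula are
assumptions for illustration. It treats the percent and ratio forms as the
same quota on different scales, and reports one throttle time regardless of
which quota caused it, assuming the two delays overlap rather than add up.

```python
from dataclasses import dataclass


@dataclass
class RequestTimeQuota:
    """Hypothetical per-client quota, stored as a ratio in (0.0, 1.0]."""

    request_time_ratio: float
    window_ms: float = 1000.0  # assumed measurement window

    def __post_init__(self):
        # A ratio of 0 is excluded here only to avoid dividing by zero below.
        if not 0.0 < self.request_time_ratio <= 1.0:
            raise ValueError("request_time_ratio must be in (0.0, 1.0]")

    @classmethod
    def from_percent(cls, percent_request_time, window_ms=1000.0):
        # The percent form is the same quota scaled by 100.
        return cls(percent_request_time / 100.0, window_ms)

    def throttle_ms(self, busy_ms):
        """Delay that brings observed usage back down to the quota.

        busy_ms is the request-handling time charged to the client during
        the window. The formula is an assumption for illustration.
        """
        observed_ratio = busy_ms / self.window_ms
        if observed_ratio <= self.request_time_ratio:
            return 0.0
        excess = observed_ratio - self.request_time_ratio
        return excess / self.request_time_ratio * self.window_ms


def total_throttle_ms(byte_rate_throttle_ms, request_time_throttle_ms):
    # Assumes the two delays overlap, so the total is the longer of the two.
    return max(byte_rate_throttle_ms, request_time_throttle_ms)
```

Under these assumptions, a client with a 5% quota that used 100 ms of
request-handling time in a one-second window would be delayed by about
1000 ms, and a single response field could carry total_throttle_ms(...)
instead of one field per quota type.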


-Jay

On Thu, Feb 23, 2017 at 3:45 AM, Rajini Sivaram <rajinisiva...@gmail.com>
wrote:

> Guozhang/Dong,
>
> Thank you for the feedback.
>
> Guozhang : I have updated the section on co-existence of byte rate and
> request time quotas.
>
> Dong: I hadn't added much detail to the metrics and sensors since they are
> going to be very similar to the existing metrics and sensors. To avoid
> confusion, I have now added more detail. All metrics are in the group
> "quotaType" and all sensors have names starting with "quotaType" (where
> quotaType is Produce/Fetch/LeaderReplication/FollowerReplication/*IOThread*).
> So there will be no reuse of existing metrics/sensors. The new ones for
> request processing time based throttling will be completely independent of
> existing metrics/sensors, but will be consistent in format.
>
> The existing throttle_time_ms field in produce/fetch responses will not be
> impacted by this KIP. That will continue to return byte-rate based
> throttling times. In addition, a new field request_throttle_time_ms will be
> added to return request quota based throttling times. These will be exposed
> as new metrics on the client-side.
>
> Since all metrics and sensors are different for each type of quota, I
> believe there are already sufficient metrics to monitor throttling on both
> client and broker side for each type of throttling.
>
> Regards,
>
> Rajini
>
>
> On Thu, Feb 23, 2017 at 4:32 AM, Dong Lin <lindon...@gmail.com> wrote:
>
> > Hey Rajini,
> >
> > I think it makes a lot of sense to use io_thread_units as the metric to
> > quota users' traffic here. LGTM overall. I have some questions regarding
> > sensors.
> >
> > - Can you be more specific in the KIP about what sensors will be added? For
> > example, it would be useful to specify the name and attributes of these new
> > sensors.
> >
> > - We currently have throttle-time and queue-size for byte-rate based quota.
> > Are you going to have separate throttle-time and queue-size for requests
> > throttled by io_thread_unit-based quota, or will they share the same
> > sensor?
> >
> > - Does the throttle-time in the ProduceResponse and FetchResponse contain
> > time due to io_thread_unit-based quota?
> >
> > - Currently the Kafka server doesn't provide any log or metric that tells
> > whether a given clientId (or user) is throttled. This is not too bad
> > because we can still check the client-side byte-rate metric to validate
> > whether a given client is throttled. But with this io_thread_unit, there
> > will be no way to validate whether a given client is slow because it has
> > exceeded its io_thread_unit limit. Users need this information to figure
> > out whether they have reached their quota limit. How about we add a log4j
> > log on the server side that periodically prints the (client_id,
> > byte-rate-throttle-time, io-thread-unit-throttle-time) so that the Kafka
> > administrator can identify the users that have reached their limit and
> > act accordingly?
> >
> > Thanks,
> > Dong
> >
> >
> >
> >
> >
> > On Wed, Feb 22, 2017 at 4:46 PM, Guozhang Wang <wangg...@gmail.com>
> > wrote:
> >
> > > Made a pass over the doc, overall LGTM except a minor comment on the
> > > throttling implementation:
> > >
> > > Stated as "Request processing time throttling will be applied on top if
> > > necessary." I thought that it meant the request processing time throttling
> > > is applied first, but on reading further I found it actually meant to apply
> > > produce / fetch byte rate throttling first.
> > >
> > > Also the last sentence "The remaining delay if any is applied to the
> > > response." is a bit confusing to me. Maybe rewording it a bit?
> > >
> > >
> > > Guozhang
> > >
> > >
> > > On Wed, Feb 22, 2017 at 3:24 PM, Jun Rao <j...@confluent.io> wrote:
> > >
> > > > Hi, Rajini,
> > > >
> > > > Thanks for the updated KIP. The latest proposal looks good to me.
> > > >
> > > > Jun
> > > >
> > > > On Wed, Feb 22, 2017 at 2:19 PM, Rajini Sivaram <rajinisiva...@gmail.com>
> > > > wrote:
> > > >
> > > > > Jun/Roger,
> > > > >
> > > > > Thank you for the feedback.
> > > > >
> > > > > 1. I have updated the KIP to use absolute units instead of percentage.
> > > > > The property is called *io_thread_units* to align with the thread count
> > > > > property *num.io.threads*. When we implement network thread utilization
> > > > > quotas, we can add another property, *network_thread_units*.
> > > > >
> > > > > 2. ControlledShutdown is already listed under the exempt requests. Jun,
> > > > > did you mean a different request that needs to be added? The four requests
> > > > > currently exempt in the KIP are StopReplica, ControlledShutdown,
> > > > > LeaderAndIsr and UpdateMetadata. These are controlled using the
> > > > > ClusterAction ACL, so it is easy to exclude them and only throttle if
> > > > > unauthorized. I wasn't sure if there are other requests used only for
> > > > > inter-broker communication that needed to be excluded.
> > > > >
> > > > > 3. I was thinking the smallest change would be to replace all references
> > > > > to *requestChannel.sendResponse()* with a local method
> > > > > *sendResponseMaybeThrottle()* that does the throttling, if any, and then
> > > > > sends the response. If we throttle first in *KafkaApis.handle()*, the time
> > > > > spent within the method handling the request will not be recorded or used
> > > > > in throttling. We can look into this again when the PR is ready for review.
> > > > >
> > > > > Regards,
> > > > >
> > > > > Rajini
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Feb 22, 2017 at 5:55 PM, Roger Hoover <roger.hoo...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Great to see this KIP and the excellent discussion.
> > > > > >
> > > > > > To me, Jun's suggestion makes sense.  If my application is allocated 1
> > > > > > request handler unit, then it's as if I have a Kafka broker with a single
> > > > > > request handler thread dedicated to me.  That's the most I can use, at
> > > > > > least.  That allocation doesn't change even if an admin later increases
> > > > > > the size of the request thread pool on the broker.  It's similar to the
> > > > > > CPU abstraction that VMs and containers get from hypervisors or OS
> > > > > > schedulers. While different client access patterns can use wildly
> > > > > > different amounts of request thread resources per request, a given
> > > > > > application will generally have a stable access pattern and can figure
> > > > > > out empirically how many "request thread units" it needs to meet its
> > > > > > throughput/latency goals.
> > > > > >
> > > > > > Cheers,
> > > > > >
> > > > > > Roger
> > > > > >
> > > > > > On Wed, Feb 22, 2017 at 8:53 AM, Jun Rao <j...@confluent.io> wrote:
> > > > > >
> > > > > > > Hi, Rajini,
> > > > > > >
> > > > > > > Thanks for the updated KIP. A few more comments.
> > > > > > >
> > > > > > > 1. A concern of request_time_percent is that it's not an absolute
> > > > > > > value. Let's say you give a user a 10% limit. If the admin doubles the
> > > > > > > number of request handler threads, that user now actually has twice the
> > > > > > > absolute capacity. This may confuse people a bit. So, perhaps setting
> > > > > > > the quota based on an absolute request thread unit is better.
> > > > > > >
> > > > > > > 2. ControlledShutdownRequest is also an inter-broker request and needs
> > > > > > > to be excluded from throttling.
> > > > > > >
> > > > > > > 3. Implementation wise, I am wondering if it's simpler to apply the
> > > > > > > request time throttling first in KafkaApis.handle(). Otherwise, we will
> > > > > > > need to add the throttling logic in each type of request.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > > On Wed, Feb 22, 2017 at 5:58 AM, Rajini Sivaram <rajinisiva...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Jun,
> > > > > > > >
> > > > > > > > Thank you for the review.
> > > > > > > >
> > > > > > > > I have reverted to the original KIP that throttles based on request
> > > > > > > > handler utilization. At the moment, it uses percentage, but I am
> > > > > > > > happy to change to a fraction (out of 1 instead of 100) if required.
> > > > > > > > I have added the examples from this discussion to the KIP. Also added
> > > > > > > > a "Future Work" section to address network thread utilization. The
> > > > > > > > configuration is named "request_time_percent" with the expectation
> > > > > > > > that it can also be used as the limit for network thread utilization
> > > > > > > > when that is implemented, so that users have to set only one config
> > > > > > > > for the two and not have to worry about the internal distribution of
> > > > > > > > the work between the two thread pools in Kafka.
> > > > > > > >
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > >
> > > > > > > > Rajini
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, Feb 22, 2017 at 12:23 AM, Jun Rao <j...@confluent.io> wrote:
> > > > > > > >
> > > > > > > > > Hi, Rajini,
> > > > > > > > >
> > > > > > > > > Thanks for the proposal.
> > > > > > > > >
> > > > > > > > > The benefit of using the request processing time over the
> > > > > > > > > request rate is exactly what people have said. I will just
> > > > > > > > > expand that a bit. Consider the following case. The producer
> > > > > > > > > sends a produce request with a 10MB message but compressed
> > > > > > > > > to 100KB with gzip. The decompression of the message on the
> > > > > > > > > broker could take 10-15 seconds, during which time a request
> > > > > > > > > handler thread is completely blocked. In this case, neither
> > > > > > > > > the byte-in quota nor the request rate quota may be
> > > > > > > > > effective in protecting the broker. Consider another case. A
> > > > > > > > > consumer group starts with 10 instances and later on
> > > > > > > > > switches to 20 instances. The request rate will likely
> > > > > > > > > double, but the actual load on the broker may not double
> > > > > > > > > since each fetch request only contains half of the
> > > > > > > > > partitions. Request rate quota may not be easy to configure
> > > > > > > > > in this case.
> > > > > > > > >
> > > > > > > > > What we really want is to be able to prevent a client from
> > > > > > > > > using too much of the server side resources. In this
> > > > > > > > > particular KIP, this resource is the capacity of the request
> > > > > > > > > handler threads. I agree that it may not be intuitive for
> > > > > > > > > the users to determine how to set the right limit. However,
> > > > > > > > > this is not completely new and has been done in the
> > > > > > > > > container world already. For example, Linux cgroup (
> > > > > > > > > https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html
> > > > > > > > > ) has the concept of cpu.cfs_quota_us, which specifies the
> > > > > > > > > total amount of time in microseconds for which all tasks in
> > > > > > > > > a cgroup can run during one period (as defined by
> > > > > > > > > cpu.cfs_period_us). We can potentially model the request
> > > > > > > > > handler threads in a similar way. For example, each request
> > > > > > > > > handler thread can be 1 request handler unit and the admin
> > > > > > > > > can configure a limit on how many units (say 0.01) a client
> > > > > > > > > can have.
> > > > > > > > >
> > > > > > > > > Regarding not throttling the internal broker to broker
> > > > > > > > > requests. We could do that. Alternatively, we could just let
> > > > > > > > > the admin configure a high limit for the kafka user (it may
> > > > > > > > > not be able to do that easily based on clientId though).
> > > > > > > > >
> > > > > > > > > Ideally we want to be able to protect the utilization of the
> > > > > > > > > network thread pool too. The difficulty is mostly what
> > > > > > > > > Rajini said: (1) The mechanism for throttling the requests
> > > > > > > > > is through Purgatory and we will have to think through how
> > > > > > > > > to integrate that into the network layer.  (2) In the
> > > > > > > > > network layer, currently we know the user, but not the
> > > > > > > > > clientId of the request. So, it's a bit tricky to throttle
> > > > > > > > > based on clientId there. Plus, the byteOut quota can already
> > > > > > > > > protect the network thread utilization for fetch requests.
> > > > > > > > > So, if we can't figure out this part right now, just
> > > > > > > > > focusing on the request handling threads for this KIP is
> > > > > > > > > still a useful feature.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Jun
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Feb 21, 2017 at 4:27 AM, Rajini Sivaram <rajinisiva...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thank you all for the feedback.
> > > > > > > > > >
> > > > > > > > > > Jay: I have removed exemption for consumer heartbeat etc.
> > > > > > > > > > Agree that protecting the cluster is more important than
> > > > > > > > > > protecting individual apps. Have retained the exemption for
> > > > > > > > > > StopReplica/LeaderAndIsr etc, these are throttled only if
> > > > > > > > > > authorization fails (so can't be used for DoS attacks in a
> > > > > > > > > > secure cluster, but allows inter-broker requests to
> > > > > > > > > > complete without delays).
> > > > > > > > > >
> > > > > > > > > > I will wait another day to see if there is any objection to
> > > > > > > > > > quotas based on request processing time (as opposed to
> > > > > > > > > > request rate) and if there are no objections, I will revert
> > > > > > > > > > to the original proposal with some changes.
> > > > > > > > > >
> > > > > > > > > > The original proposal was only including the time used by
> > > > > > > > > > the request handler threads (that made calculation easy). I
> > > > > > > > > > think the suggestion is to include the time spent in the
> > > > > > > > > > network threads as well since that may be significant. As
> > > > > > > > > > Jay pointed out, it is more complicated to calculate the
> > > > > > > > > > total available CPU time and convert to a ratio when there
> > > > > > > > > > are *m* I/O threads and *n* network threads.
> > > > > > > > > > ThreadMXBean#getThreadCpuTime() may give us what we want,
> > > > > > > > > > but it can be very expensive on some platforms. As Becket
> > > > > > > > > > and Guozhang have pointed out, we do have several time
> > > > > > > > > > measurements already for generating metrics that we could
> > > > > > > > > > use, though we might want to switch to nanoTime() instead
> > > > > > > > > > of currentTimeMillis() since some of the values for small
> > > > > > > > > > requests may be < 1ms. But rather than add up the time
> > > > > > > > > > spent in I/O thread and network thread, wouldn't it be
> > > > > > > > > > better to convert the time spent on each thread into a
> > > > > > > > > > separate ratio? UserA has a request quota of 5%. Can we
> > > > > > > > > > take that to mean that UserA can use 5% of the time on
> > > > > > > > > > network threads and 5% of the time on I/O threads? If
> > > > > > > > > > either is exceeded, the response is throttled - it would
> > > > > > > > > > mean maintaining two sets of metrics for the two
> > > > > > > > > > durations, but would result in more meaningful ratios. We
> > > > > > > > > > could define two quota limits (UserA has 5% of request
> > > > > > > > > > threads and 10% of network threads), but that seems
> > > > > > > > > > unnecessary and harder to explain to users.
> > > > > > > > > >
> > > > > > > > > > Back to why and how quotas are applied to network thread
> > > > > > > > > > utilization:
> > > > > > > > > > a) In the case of fetch, the time spent in the network
> > > > > > > > > > thread may be significant and I can see the need to include
> > > > > > > > > > this. Are there other requests where the network thread
> > > > > > > > > > utilization is significant? In the case of fetch, request
> > > > > > > > > > handler thread utilization would throttle clients with high
> > > > > > > > > > request rate, low data volume and fetch byte rate quota
> > > > > > > > > > will throttle clients with high data volume. Network thread
> > > > > > > > > > utilization is perhaps proportional to the data volume. I
> > > > > > > > > > am wondering if we even need to throttle based on network
> > > > > > > > > > thread utilization or whether the data volume quota covers
> > > > > > > > > > this case.
> > > > > > > > > >
> > > > > > > > > > b) At the moment, we record and check for quota violation
> > > > > > > > > > at the same time. If a quota is violated, the response is
> > > > > > > > > > delayed. Using Jay's example of disk reads for fetches
> > > > > > > > > > happening in the network thread, we can't record and delay
> > > > > > > > > > a response after the disk reads. We could record the time
> > > > > > > > > > spent on the network thread when the response is complete
> > > > > > > > > > and introduce a delay for handling a subsequent request
> > > > > > > > > > (separate out recording and quota violation handling in the
> > > > > > > > > > case of network thread overload). Does that make sense?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > >
> > > > > > > > > > Rajini
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Tue, Feb 21, 2017 at 2:58 AM, Becket Qin <becket....@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hey Jay,
> > > > > > > > > > >
> > > > > > > > > > > Yeah, I agree that enforcing the CPU time is a little
> > > > > > > > > > > tricky. I am thinking that maybe we can use the existing
> > > > > > > > > > > request statistics. They are already very detailed so we
> > > > > > > > > > > can probably see the approximate CPU time from them, e.g.
> > > > > > > > > > > something like (total_time - request/response_queue_time
> > > > > > > > > > > - remote_time).
> > > > > > > > > > >
> > > > > > > > > > > I agree with Guozhang that when a user is throttled it is
> > > > > > > > > > > likely that we need to see if anything has gone wrong
> > > > > > > > > > > first, and if the users are well behaved and just need
> > > > > > > > > > > more resources, we will have to bump up the quota for
> > > > > > > > > > > them. It is true that pre-allocating CPU time quota
> > > > > > > > > > > precisely for the users is difficult. So in practice it
> > > > > > > > > > > would probably be more like first setting a relatively
> > > > > > > > > > > high protective CPU time quota for everyone and
> > > > > > > > > > > increasing that for some individual clients on demand.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > >
> > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Feb 20, 2017 at 5:48 PM, Guozhang Wang <wangg...@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > This is a great proposal, glad to see it happening.
> > > > > > > > > > > >
> > > > > > > > > > > > I am inclined to the CPU throttling, or more
> > > > > > > > > > > > specifically processing time ratio instead of the
> > > > > > > > > > > > request rate throttling as well. Becket has very well
> > > > > > > > > > > > summed up my rationale above, and one thing to add here
> > > > > > > > > > > > is that the former has good support for both
> > > > > > > > > > > > "protecting against rogue clients" as well as
> > > > > > > > > > > > "utilizing a cluster for multi-tenancy usage": when
> > > > > > > > > > > > thinking about how to explain this to the end users, I
> > > > > > > > > > > > find it actually more natural than the request rate
> > > > > > > > > > > > since, as mentioned above, different requests will have
> > > > > > > > > > > > quite different "cost", and Kafka today already has
> > > > > > > > > > > > various request types (produce, fetch, admin, metadata,
> > > > > > > > > > > > etc); because of that, the request rate throttling may
> > > > > > > > > > > > not be as effective unless it is set very
> > > > > > > > > > > > conservatively.
> > > > > > > > > > > >
> > > > > > > > > > > > Regarding user reactions when they are throttled, I
> > > > > > > > > > > > think it may differ case-by-case, and needs to be
> > > > > > > > > > > > discovered / guided by looking at relevant metrics. So
> > > > > > > > > > > > in other words users would not expect to get additional
> > > > > > > > > > > > information by simply being told "hey, you are
> > > > > > > > > > > > throttled", which is all that throttling does; they
> > > > > > > > > > > > need to take a follow-up step and see "hmm, I'm
> > > > > > > > > > > > throttled probably because of ..", which is by looking
> > > > > > > > > > > > at other metric values: e.g. whether I'm bombarding the
> > > > > > > > > > > > brokers with metadata requests, which are usually cheap
> > > > > > > > > > > > to handle but I'm sending thousands per second; or is
> > > > > > > > > > > > it because I'm catching up and hence sending very heavy
> > > > > > > > > > > > fetch requests with large min.bytes, etc.
> > > > > > > > > > > >
> > > > > > > > > > > > Regarding the implementation, as once discussed with
> > > > > > > > > > > > Jun, this seems not very difficult since today we are
> > > > > > > > > > > > already collecting the "thread pool utilization"
> > > > > > > > > > > > metrics, which is a single percentage
> > > > > > > > > > > > "aggregateIdleMeter" value; but we are already
> > > > > > > > > > > > effectively aggregating it for each request in
> > > > > > > > > > > > KafkaRequestHandler, and we can just extend it by
> > > > > > > > > > > > recording the source client id when handling them and
> > > > > > > > > > > > aggregating by clientId as well as the total aggregate.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Guozhang
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Feb 20, 2017 at 4:27 PM, Jay Kreps <j...@confluent.io>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hey Becket/Rajini,
> > > > > > > > > > > > >
> > > > > > > > > > > > > When I thought about it more deeply I came around to
> > > > > > > > > > > > > the "percent of processing time" metric too. It seems
> > > > > > > > > > > > > a lot closer to the thing we actually care about and
> > > > > > > > > > > > > need to protect. I also think this would be a very
> > > > > > > > > > > > > useful metric even in the absence of throttling just
> > > > > > > > > > > > > to debug who's using capacity.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Two problems to consider:
> > > > > > > > > > > > >
> > > > > > > > > > > > >    1. I agree that for the user it is understandable
> > > > > > > > > > > > >    what led to their being throttled, but it is a bit
> > > > > > > > > > > > >    hard to figure out the safe range for them. i.e. if
> > > > > > > > > > > > >    I have a new app that will send 200 messages/sec I
> > > > > > > > > > > > >    can probably reason that I'll be under the
> > > > > > > > > > > > >    throttling limit of 300 req/sec. However if I need
> > > > > > > > > > > > >    to be under a 10% CPU resources limit it may be a
> > > > > > > > > > > > >    bit harder for me to know a priori if I will or
> > > > > > > > > > > > >    won't.
> > > > > > > > > > > > >    2. Calculating the available CPU time is a bit
> > > > > > > > > > > > >    difficult since there are actually two thread
> > > > > > > > > > > > >    pools--the I/O threads and the network threads. I
> > > > > > > > > > > > >    think it might be workable to count just the I/O
> > > > > > > > > > > > >    thread time as in the proposal, but the network
> > > > > > > > > > > > >    thread work is actually non-trivial (e.g. all the
> > > > > > > > > > > > >    disk reads for fetches happen in that thread). If
> > > > > > > > > > > > >    you count both the network and I/O threads it can
> > > > > > > > > > > > >    skew things a bit. E.g. say you have 50 network
> > > > > > > > > > > > >    threads, 10 I/O threads, and 8 cores, what is the
> > > > > > > > > > > > >    available cpu time in a second? I suppose this is a
> > > > > > > > > > > > >    problem whenever you have a bottleneck between I/O
> > > > > > > > > > > > >    and network threads or if you end up significantly
> > > > > > > > > > > > >    over-provisioning one pool (both of which are hard
> > > > > > > > > > > > >    to avoid).
> > > > > > > > > > > > >
> > > > > > > > > > > > > An alternative for CPU throttling would be to use this
> > > > > > > > > > > > > api:
> > > > > > > > > > > > > http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/ThreadMXBean.html#getThreadCpuTime(long)
> > > > > > > > > > > > >
> > > > > > > > > > > > > That would let you track actual CPU usage across the
> > > > > > > > > > > > > network, I/O threads, and purgatory threads and look
> > > > > > > > > > > > > at it as a percentage of total cores. I think this
> > > > > > > > > > > > > fixes many problems in the reliability of the metric.
> > > > > > > > > > > > > Its meaning is slightly different as it is just CPU
> > > > > > > > > > > > > (you don't get charged for time blocking on I/O) but
> > > > > > > > > > > > > that may be okay because we already have a throttle on
> > > > > > > > > > > > > I/O. The downside is I think it is possible this api
> > > > > > > > > > > > > can be disabled or isn't always available and it may
> > > > > > > > > > > > > also be expensive (also I've never used it so not sure
> > > > > > > > > > > > > if it really works the way I think).
> > > > > > > > > > > > >
> > > > > > > > > > > > > -Jay
> > > > > > > > > > > > >
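For reference, the ThreadMXBean idea Jay describes above could be measured
roughly as in the following Java sketch. It is an illustration only: the
names ThreadCpuSketch, cpuNanosFor and fractionOfCores are made up for this
example and are not part of Kafka or of the KIP. It relies on the standard
java.lang.management calls isThreadCpuTimeSupported(),
isThreadCpuTimeEnabled() and getThreadCpuTime(long), and returns -1 when
thread CPU timing is unsupported or disabled, which is the availability
concern raised above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not Kafka code: measures CPU time of the calling thread.
final class ThreadCpuSketch {

    // Runs the task on the current thread and returns the CPU nanoseconds it
    // consumed, or -1 if the JVM cannot report per-thread CPU time.
    static long cpuNanosFor(Runnable task) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isThreadCpuTimeSupported() || !bean.isThreadCpuTimeEnabled()) {
            task.run();
            return -1L;
        }
        long id = Thread.currentThread().getId();
        long before = bean.getThreadCpuTime(id);
        task.run();
        long after = bean.getThreadCpuTime(id);
        // getThreadCpuTime returns -1 when measurement is not available.
        if (before < 0 || after < 0) {
            return -1L;
        }
        return after - before;
    }

    // Expresses CPU time as a fraction of total core capacity in a window.
    static double fractionOfCores(long cpuNanos, long windowNanos, int cores) {
        return (double) cpuNanos / ((double) windowNanos * cores);
    }

    public static void main(String[] args) {
        long cpuNanos = cpuNanosFor(() -> {
            long sum = 0;
            for (int i = 0; i < 50_000_000; i++) {
                sum += i;
            }
            if (sum == 42) {
                System.out.println(); // keeps the loop from being optimised away
            }
        });
        int cores = Runtime.getRuntime().availableProcessors();
        if (cpuNanos < 0) {
            System.out.println("thread CPU time is not available on this JVM");
        } else {
            System.out.printf("cpu=%d ns, share of %d cores over 1 s = %.6f%n",
                    cpuNanos, cores, fractionOfCores(cpuNanos, 1_000_000_000L, cores));
        }
    }
}
```

Because only CPU time is counted, a thread blocked on I/O accrues nothing
here, which matches the "just CPU" caveat in the message above.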
> > > > > > > > > > > > > > On Mon, Feb 20, 2017 at 3:17 PM, Becket Qin <becket....@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > If the purpose of the KIP is only to protect the cluster from being
> > > > > > > > > > > > > > > overwhelmed by crazy clients and is not intended to address the
> > > > > > > > > > > > > > > resource allocation problem among the clients, I am wondering if
> > > > > > > > > > > > > > > using a request handling time quota (CPU time quota) is a better
> > > > > > > > > > > > > > > option. Here are the reasons:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 1. A request handling time quota gives better protection. Say we
> > > > > > > > > > > > > > > have a request rate quota and set it to some value like 100
> > > > > > > > > > > > > > > requests/sec; it is possible that some of the requests are very
> > > > > > > > > > > > > > > expensive and actually take a lot of time to handle. In that case a
> > > > > > > > > > > > > > > few clients may still occupy a lot of CPU time even if the request
> > > > > > > > > > > > > > > rate is low. Arguably we can carefully set a request rate quota for
> > > > > > > > > > > > > > > each request and client id combination, but it could still be tricky
> > > > > > > > > > > > > > > to get it right for everyone.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > If we use the request handling time quota, we can simply say no
> > > > > > > > > > > > > > > client can take up more than 30% of the total request handling
> > > > > > > > > > > > > > > capacity (measured by time), regardless of the differences among
> > > > > > > > > > > > > > > requests or what the client is doing. In this case maybe we can
> > > > > > > > > > > > > > > quota all the requests if we want to.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 2. The main benefit of using a request rate limit is that it seems
> > > > > > > > > > > > > > > more intuitive. It is true that it is probably easier to explain to
> > > > > > > > > > > > > > > the user what that means. However, in practice it looks like the
> > > > > > > > > > > > > > > impact of a request rate quota is not more quantifiable than that of
> > > > > > > > > > > > > > > the request handling time quota. Unlike the byte rate quota, it is
> > > > > > > > > > > > > > > still difficult to give a number for the impact on throughput or
> > > > > > > > > > > > > > > latency when a request rate quota is hit. So it is not better than
> > > > > > > > > > > > > > > the request handling time quota. In fact I feel it is clearer to
> > > > > > > > > > > > > > > tell the user that "you are limited because you have taken 30% of
> > > > > > > > > > > > > > > the CPU time on the broker" than something like "your request rate
> > > > > > > > > > > > > > > quota on metadata requests has been reached".
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
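To make the "30% of the total request handling capacity (measured by time)"
idea above concrete, here is a minimal Java sketch of the arithmetic. It is
not how Kafka implements quotas and is not taken from the KIP: the class
RequestTimeQuotaSketch and its methods are hypothetical, it covers a single
fixed window, and it is not thread-safe. Capacity is assumed to be
(number of handler threads) x (window length), and a client's share is its
accumulated handling time divided by that capacity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a request-handling-time quota check.
final class RequestTimeQuotaSketch {
    private final int handlerThreads;   // assumed size of the handler pool
    private final long windowNanos;     // length of the measurement window
    private final double maxShare;      // e.g. 0.30 for a 30% quota
    private final Map<String, Long> usedNanos = new HashMap<>();

    RequestTimeQuotaSketch(int handlerThreads, long windowNanos, double maxShare) {
        this.handlerThreads = handlerThreads;
        this.windowNanos = windowNanos;
        this.maxShare = maxShare;
    }

    // Adds the time one request spent being handled to the client's total.
    void record(String clientId, long handlingNanos) {
        usedNanos.merge(clientId, handlingNanos, Long::sum);
    }

    // Fraction of total handler capacity this client used in the window.
    double share(String clientId) {
        long used = usedNanos.getOrDefault(clientId, 0L);
        return (double) used / ((double) handlerThreads * windowNanos);
    }

    boolean overQuota(String clientId) {
        return share(clientId) > maxShare;
    }

    public static void main(String[] args) {
        // 10 handler threads, 1 second window, 30% quota.
        RequestTimeQuotaSketch quota =
                new RequestTimeQuotaSketch(10, 1_000_000_000L, 0.30);
        // Two expensive requests: 2 s + 1.5 s of handler time in the window.
        quota.record("slow-client", 2_000_000_000L);
        quota.record("slow-client", 1_500_000_000L);
        // 100 cheap requests of 1 ms each.
        for (int i = 0; i < 100; i++) {
            quota.record("chatty-client", 1_000_000L);
        }
        System.out.println("slow-client over quota: " + quota.overQuota("slow-client"));
        System.out.println("chatty-client over quota: " + quota.overQuota("chatty-client"));
    }
}
```

In this example the client sending only two requests exceeds the quota while
the client sending 100 requests does not, which is the case a pure
requests/sec limit would miss.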
> > > > > > > > > > > > > > > On Mon, Feb 20, 2017 at 2:23 PM, Jay Kreps <j...@confluent.io> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I think this proposal makes a lot of sense (especially now that it
> > > > > > > > > > > > > > > > is oriented around request rate) and fills the biggest remaining
> > > > > > > > > > > > > > > > gap in the multi-tenancy story.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I think for intra-cluster communication (StopReplica, etc) we could
> > > > > > > > > > > > > > > > avoid throttling entirely. You can secure or otherwise lock down
> > > > > > > > > > > > > > > > the cluster communication to prevent any unauthorized external
> > > > > > > > > > > > > > > > party from trying to initiate these requests. As a result we are as
> > > > > > > > > > > > > > > > likely to cause problems as solve them by throttling these, right?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I'm not so sure that we should exempt the consumer requests such as
> > > > > > > > > > > > > > > > heartbeat. It's true that if we throttle an app's heartbeat
> > > > > > > > > > > > > > > > requests it may cause it to fall out of its consumer group. However
> > > > > > > > > > > > > > > > if we don't throttle it, it may DDoS the cluster if the heartbeat
> > > > > > > > > > > > > > > > interval is set incorrectly or if some client in some language has
> > > > > > > > > > > > > > > > a bug. I think the policy with this kind of throttling is to
> > > > > > > > > > > > > > > > protect the cluster above any individual app, right? I think in
> > > > > > > > > > > > > > > > general this should be okay since for most deployments this setting
> > > > > > > > > > > > > > > > is meant as more of a safety valve: that is, rather than set
> > > > > > > > > > > > > > > > something very close to what you expect to need (say 2 req/sec or
> > > > > > > > > > > > > > > > whatever) you would have something quite high (like 100 req/sec)
> > > > > > > > > > > > > > > > with this meant to prevent a client gone crazy. I think when used
> > > > > > > > > > > > > > > > this way allowing those to be throttled would actually provide
> > > > > > > > > > > > > > > > meaningful protection.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > -Jay
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Fri, Feb 17, 2017 at 9:05 AM, Rajini Sivaram <rajinisiva...@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I have just created KIP-124 to introduce request rate quotas to
> > > > > > > > > > > > > > > > > Kafka:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-124+-+Request+rate+quotas
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The proposal is for a simple percentage request handling time quota
> > > > > > > > > > > > > > > > > that can be allocated to *<client-id>*, *<user>* or *<user,
> > > > > > > > > > > > > > > > > client-id>*. There are a few other suggestions also under "Rejected
> > > > > > > > > > > > > > > > > alternatives". Feedback and suggestions are welcome.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thank you...
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Rajini
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > -- Guozhang
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>
