Hi Jay,

Regarding 1, I definitely like the simplicity of keeping a single throttle time field in the response. The downside is that the client metrics will be more coarse-grained.
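To make the trade-off concrete, here is a rough sketch (hypothetical names, not actual Kafka client code) of the client-side metric recording in the two cases, i.e. a single total field versus the separate throttle_time_ms and request_throttle_time_ms fields discussed earlier in the thread:

// Illustrative only: hypothetical names, not actual Kafka client code.
import java.util.HashMap;
import java.util.Map;

public class ThrottleMetricsSketch {

    // Option A: a single total throttle time field in the response.
    // The client can only record one aggregate metric per request type.
    static void recordSingleField(Map<String, Long> metrics, long throttleTimeMs) {
        metrics.merge("produce-throttle-time-total", throttleTimeMs, Long::sum);
    }

    // Option B: separate fields for byte-rate and request-time throttling.
    // The client can attribute the delay to the quota that caused it.
    static void recordSeparateFields(Map<String, Long> metrics, long byteRateMs, long requestTimeMs) {
        metrics.merge("produce-byte-rate-throttle-time-total", byteRateMs, Long::sum);
        metrics.merge("produce-request-throttle-time-total", requestTimeMs, Long::sum);
    }

    public static void main(String[] args) {
        Map<String, Long> metrics = new HashMap<>();
        recordSingleField(metrics, 15L);          // 15 ms of combined throttling
        recordSeparateFields(metrics, 10L, 5L);   // 10 ms byte-rate plus 5 ms request-time
        metrics.forEach((name, ms) -> System.out.println(name + " = " + ms + " ms"));
    }
}

With option A the second breakdown is simply not available on the client, which is the coarser granularity mentioned above.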
Regarding 3, we have `leader.imbalance.per.broker.percentage` and `log.cleaner.min.cleanable.ratio`. Ismael On Thu, Feb 23, 2017 at 4:43 PM, Jay Kreps <j...@confluent.io> wrote: > A few minor comments: > > 1. Isn't it the case that the throttling time response field should have > the total time your request was throttled irrespective of the quotas > that > caused that. Limiting it to byte rate quota doesn't make sense, but I > also > I don't think we want to end up adding new fields in the response for > every > single thing we quota, right? > 2. I don't think we should make this quota specifically about io > threads. Once we introduce these quotas people set them and expect them > to > be enforced (and if they aren't it may cause an outage). As a result > they > are a bit more sensitive than normal configs, I think. The current > thread > pools seem like something of an implementation detail and not the level > the > user-facing quotas should be involved with. I think it might be better > to > make this a general request-time throttle with no mention in the naming > about I/O threads and simply acknowledge the current limitation (which > we > may someday fix) in the docs that this covers only the time after the > thread is read off the network. > 3. As such I think the right interface to the user would be something > like percent_request_time and be in {0,...100} or request_time_ratio > and be > in {0.0,...,1.0} (I think "ratio" is the terminology we used if the > scale > is between 0 and 1 in the other metrics, right?) > > -Jay > > On Thu, Feb 23, 2017 at 3:45 AM, Rajini Sivaram <rajinisiva...@gmail.com> > wrote: > > > Guozhang/Dong, > > > > Thank you for the feedback. > > > > Guozhang : I have updated the section on co-existence of byte rate and > > request time quotas. > > > > Dong: I hadn't added much detail to the metrics and sensors since they > are > > going to be very similar to the existing metrics and sensors. To avoid > > confusion, I have now added more detail. All metrics are in the group > > "quotaType" and all sensors have names starting with "quotaType" (where > > quotaType is Produce/Fetch/LeaderReplication/ > > FollowerReplication/*IOThread*). > > So there will be no reuse of existing metrics/sensors. The new ones for > > request processing time based throttling will be completely independent > of > > existing metrics/sensors, but will be consistent in format. > > > > The existing throttle_time_ms field in produce/fetch responses will not > be > > impacted by this KIP. That will continue to return byte-rate based > > throttling times. In addition, a new field request_throttle_time_ms will > be > > added to return request quota based throttling times. These will be > exposed > > as new metrics on the client-side. > > > > Since all metrics and sensors are different for each type of quota, I > > believe there is already sufficient metrics to monitor throttling on both > > client and broker side for each type of throttling. > > > > Regards, > > > > Rajini > > > > > > On Thu, Feb 23, 2017 at 4:32 AM, Dong Lin <lindon...@gmail.com> wrote: > > > > > Hey Rajini, > > > > > > I think it makes a lot of sense to use io_thread_units as metric to > quota > > > user's traffic here. LGTM overall. I have some questions regarding > > sensors. > > > > > > - Can you be more specific in the KIP what sensors will be added? For > > > example, it will be useful to specify the name and attributes of these > > new > > > sensors. 
> > > > > > - We currently have throttle-time and queue-size for byte-rate based > > quota. > > > Are you going to have separate throttle-time and queue-size for > requests > > > throttled by io_thread_unit-based quota, or will they share the same > > > sensor? > > > > > > - Does the throttle-time in the ProduceResponse and FetchResponse > > contains > > > time due to io_thread_unit-based quota? > > > > > > - Currently kafka server doesn't not provide any log or metrics that > > tells > > > whether any given clientId (or user) is throttled. This is not too bad > > > because we can still check the client-side byte-rate metric to validate > > > whether a given client is throttled. But with this io_thread_unit, > there > > > will be no way to validate whether a given client is slow because it > has > > > exceeded its io_thread_unit limit. It is necessary for user to be able > to > > > know this information to figure how whether they have reached there > quota > > > limit. How about we add log4j log on the server side to periodically > > print > > > the (client_id, byte-rate-throttle-time, io-thread-unit-throttle-time) > so > > > that kafka administrator can figure those users that have reached their > > > limit and act accordingly? > > > > > > Thanks, > > > Dong > > > > > > > > > > > > > > > > > > On Wed, Feb 22, 2017 at 4:46 PM, Guozhang Wang <wangg...@gmail.com> > > wrote: > > > > > > > Made a pass over the doc, overall LGTM except a minor comment on the > > > > throttling implementation: > > > > > > > > Stated as "Request processing time throttling will be applied on top > if > > > > necessary." I thought that it meant the request processing time > > > throttling > > > > is applied first, but continue reading I found it actually meant to > > apply > > > > produce / fetch byte rate throttling first. > > > > > > > > Also the last sentence "The remaining delay if any is applied to the > > > > response." is a bit confusing to me. Maybe rewording it a bit? > > > > > > > > > > > > Guozhang > > > > > > > > > > > > On Wed, Feb 22, 2017 at 3:24 PM, Jun Rao <j...@confluent.io> wrote: > > > > > > > > > Hi, Rajini, > > > > > > > > > > Thanks for the updated KIP. The latest proposal looks good to me. > > > > > > > > > > Jun > > > > > > > > > > On Wed, Feb 22, 2017 at 2:19 PM, Rajini Sivaram < > > > rajinisiva...@gmail.com > > > > > > > > > > wrote: > > > > > > > > > > > Jun/Roger, > > > > > > > > > > > > Thank you for the feedback. > > > > > > > > > > > > 1. I have updated the KIP to use absolute units instead of > > > percentage. > > > > > The > > > > > > property is called* io_thread_units* to align with the thread > count > > > > > > property *num.io.threads*. When we implement network thread > > > utilization > > > > > > quotas, we can add another property *network_thread_units.* > > > > > > > > > > > > 2. ControlledShutdown is already listed under the exempt > requests. > > > Jun, > > > > > did > > > > > > you mean a different request that needs to be added? The four > > > requests > > > > > > currently exempt in the KIP are StopReplica, ControlledShutdown, > > > > > > LeaderAndIsr and UpdateMetadata. These are controlled using > > > > ClusterAction > > > > > > ACL, so it is easy to exclude and only throttle if unauthorized. > I > > > > wasn't > > > > > > sure if there are other requests used only for inter-broker that > > > needed > > > > > to > > > > > > be excluded. > > > > > > > > > > > > 3. 
I was thinking the smallest change would be to replace all > > > > references > > > > > to > > > > > > *requestChannel.sendResponse()* with a local method > > > > > > *sendResponseMaybeThrottle()* that does the throttling if any > plus > > > send > > > > > > response. If we throttle first in *KafkaApis.handle()*, the time > > > spent > > > > > > within the method handling the request will not be recorded or > used > > > in > > > > > > throttling. We can look into this again when the PR is ready for > > > > review. > > > > > > > > > > > > Regards, > > > > > > > > > > > > Rajini > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Feb 22, 2017 at 5:55 PM, Roger Hoover < > > > roger.hoo...@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > Great to see this KIP and the excellent discussion. > > > > > > > > > > > > > > To me, Jun's suggestion makes sense. If my application is > > > allocated > > > > 1 > > > > > > > request handler unit, then it's as if I have a Kafka broker > with > > a > > > > > single > > > > > > > request handler thread dedicated to me. That's the most I can > > use, > > > > at > > > > > > > least. That allocation doesn't change even if an admin later > > > > increases > > > > > > the > > > > > > > size of the request thread pool on the broker. It's similar to > > the > > > > CPU > > > > > > > abstraction that VMs and containers get from hypervisors or OS > > > > > > schedulers. > > > > > > > While different client access patterns can use wildly different > > > > amounts > > > > > > of > > > > > > > request thread resources per request, a given application will > > > > > generally > > > > > > > have a stable access pattern and can figure out empirically how > > > many > > > > > > > "request thread units" it needs to meet it's throughput/latency > > > > goals. > > > > > > > > > > > > > > Cheers, > > > > > > > > > > > > > > Roger > > > > > > > > > > > > > > On Wed, Feb 22, 2017 at 8:53 AM, Jun Rao <j...@confluent.io> > > wrote: > > > > > > > > > > > > > > > Hi, Rajini, > > > > > > > > > > > > > > > > Thanks for the updated KIP. A few more comments. > > > > > > > > > > > > > > > > 1. A concern of request_time_percent is that it's not an > > absolute > > > > > > value. > > > > > > > > Let's say you give a user a 10% limit. If the admin doubles > the > > > > > number > > > > > > of > > > > > > > > request handler threads, that user now actually has twice the > > > > > absolute > > > > > > > > capacity. This may confuse people a bit. So, perhaps setting > > the > > > > > quota > > > > > > > > based on an absolute request thread unit is better. > > > > > > > > > > > > > > > > 2. ControlledShutdownRequest is also an inter-broker request > > and > > > > > needs > > > > > > to > > > > > > > > be excluded from throttling. > > > > > > > > > > > > > > > > 3. Implementation wise, I am wondering if it's simpler to > apply > > > the > > > > > > > request > > > > > > > > time throttling first in KafkaApis.handle(). Otherwise, we > will > > > > need > > > > > to > > > > > > > add > > > > > > > > the throttling logic in each type of request. > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > On Wed, Feb 22, 2017 at 5:58 AM, Rajini Sivaram < > > > > > > rajinisiva...@gmail.com > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Jun, > > > > > > > > > > > > > > > > > > Thank you for the review. 
> > > > > > > > > > > > > > > > > > I have reverted to the original KIP that throttles based on > > > > request > > > > > > > > handler > > > > > > > > > utilization. At the moment, it uses percentage, but I am > > happy > > > to > > > > > > > change > > > > > > > > to > > > > > > > > > a fraction (out of 1 instead of 100) if required. I have > > added > > > > the > > > > > > > > examples > > > > > > > > > from this discussion to the KIP. Also added a "Future Work" > > > > section > > > > > > to > > > > > > > > > address network thread utilization. The configuration is > > named > > > > > > > > > "request_time_percent" with the expectation that it can > also > > be > > > > > used > > > > > > as > > > > > > > > the > > > > > > > > > limit for network thread utilization when that is > > implemented, > > > so > > > > > > that > > > > > > > > > users have to set only one config for the two and not have > to > > > > worry > > > > > > > about > > > > > > > > > the internal distribution of the work between the two > thread > > > > pools > > > > > in > > > > > > > > > Kafka. > > > > > > > > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > > > > > > > > Rajini > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Feb 22, 2017 at 12:23 AM, Jun Rao < > j...@confluent.io> > > > > > wrote: > > > > > > > > > > > > > > > > > > > Hi, Rajini, > > > > > > > > > > > > > > > > > > > > Thanks for the proposal. > > > > > > > > > > > > > > > > > > > > The benefit of using the request processing time over the > > > > request > > > > > > > rate > > > > > > > > is > > > > > > > > > > exactly what people have said. I will just expand that a > > bit. > > > > > > > Consider > > > > > > > > > the > > > > > > > > > > following case. The producer sends a produce request > with a > > > > 10MB > > > > > > > > message > > > > > > > > > > but compressed to 100KB with gzip. The decompression of > the > > > > > message > > > > > > > on > > > > > > > > > the > > > > > > > > > > broker could take 10-15 seconds, during which time, a > > request > > > > > > handler > > > > > > > > > > thread is completely blocked. In this case, neither the > > > byte-in > > > > > > quota > > > > > > > > nor > > > > > > > > > > the request rate quota may be effective in protecting the > > > > broker. > > > > > > > > > Consider > > > > > > > > > > another case. A consumer group starts with 10 instances > and > > > > later > > > > > > on > > > > > > > > > > switches to 20 instances. The request rate will likely > > > double, > > > > > but > > > > > > > the > > > > > > > > > > actually load on the broker may not double since each > fetch > > > > > request > > > > > > > > only > > > > > > > > > > contains half of the partitions. Request rate quota may > not > > > be > > > > > easy > > > > > > > to > > > > > > > > > > configure in this case. > > > > > > > > > > > > > > > > > > > > What we really want is to be able to prevent a client > from > > > > using > > > > > > too > > > > > > > > much > > > > > > > > > > of the server side resources. In this particular KIP, > this > > > > > resource > > > > > > > is > > > > > > > > > the > > > > > > > > > > capacity of the request handler threads. I agree that it > > may > > > > not > > > > > be > > > > > > > > > > intuitive for the users to determine how to set the right > > > > limit. > > > > > > > > However, > > > > > > > > > > this is not completely new and has been done in the > > container > > > > > world > > > > > > > > > > already. 
For example, Linux cgroup ( > > > https://access.redhat.com/ > > > > > > > > > > documentation/en-US/Red_Hat_Enterprise_Linux/6/html/ > > > > > > > > > > Resource_Management_Guide/sec-cpu.html) has the concept > of > > > > > > > > > > cpu.cfs_quota_us, > > > > > > > > > > which specifies the total amount of time in microseconds > > for > > > > > which > > > > > > > all > > > > > > > > > > tasks in a cgroup can run during a one second period. We > > can > > > > > > > > potentially > > > > > > > > > > model the request handler threads in a similar way. For > > > > example, > > > > > > each > > > > > > > > > > request handler thread can be 1 request handler unit and > > the > > > > > admin > > > > > > > can > > > > > > > > > > configure a limit on how many units (say 0.01) a client > can > > > > have. > > > > > > > > > > > > > > > > > > > > Regarding not throttling the internal broker to broker > > > > requests. > > > > > We > > > > > > > > could > > > > > > > > > > do that. Alternatively, we could just let the admin > > > configure a > > > > > > high > > > > > > > > > limit > > > > > > > > > > for the kafka user (it may not be able to do that easily > > > based > > > > on > > > > > > > > > clientId > > > > > > > > > > though). > > > > > > > > > > > > > > > > > > > > Ideally we want to be able to protect the utilization of > > the > > > > > > network > > > > > > > > > thread > > > > > > > > > > pool too. The difficult is mostly what Rajini said: (1) > The > > > > > > mechanism > > > > > > > > for > > > > > > > > > > throttling the requests is through Purgatory and we will > > have > > > > to > > > > > > > think > > > > > > > > > > through how to integrate that into the network layer. > (2) > > In > > > > the > > > > > > > > network > > > > > > > > > > layer, currently we know the user, but not the clientId > of > > > the > > > > > > > request. > > > > > > > > > So, > > > > > > > > > > it's a bit tricky to throttle based on clientId there. > > Plus, > > > > the > > > > > > > > byteOut > > > > > > > > > > quota can already protect the network thread utilization > > for > > > > > fetch > > > > > > > > > > requests. So, if we can't figure out this part right now, > > > just > > > > > > > focusing > > > > > > > > > on > > > > > > > > > > the request handling threads for this KIP is still a > useful > > > > > > feature. > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Feb 21, 2017 at 4:27 AM, Rajini Sivaram < > > > > > > > > rajinisiva...@gmail.com > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > Thank you all for the feedback. > > > > > > > > > > > > > > > > > > > > > > Jay: I have removed exemption for consumer heartbeat > etc. > > > > Agree > > > > > > > that > > > > > > > > > > > protecting the cluster is more important than > protecting > > > > > > individual > > > > > > > > > apps. > > > > > > > > > > > Have retained the exemption for > StopReplicat/LeaderAndIsr > > > > etc, > > > > > > > these > > > > > > > > > are > > > > > > > > > > > throttled only if authorization fails (so can't be used > > for > > > > DoS > > > > > > > > attacks > > > > > > > > > > in > > > > > > > > > > > a secure cluster, but allows inter-broker requests to > > > > complete > > > > > > > > without > > > > > > > > > > > delays). 
> > > > > > > > > > > > > > > > > > > > > > I will wait another day to see if these is any > objection > > to > > > > > > quotas > > > > > > > > > based > > > > > > > > > > on > > > > > > > > > > > request processing time (as opposed to request rate) > and > > if > > > > > there > > > > > > > are > > > > > > > > > no > > > > > > > > > > > objections, I will revert to the original proposal with > > > some > > > > > > > changes. > > > > > > > > > > > > > > > > > > > > > > The original proposal was only including the time used > by > > > the > > > > > > > request > > > > > > > > > > > handler threads (that made calculation easy). I think > the > > > > > > > suggestion > > > > > > > > is > > > > > > > > > > to > > > > > > > > > > > include the time spent in the network threads as well > > since > > > > > that > > > > > > > may > > > > > > > > be > > > > > > > > > > > significant. As Jay pointed out, it is more complicated > > to > > > > > > > calculate > > > > > > > > > the > > > > > > > > > > > total available CPU time and convert to a ratio when > > there > > > > *m* > > > > > > I/O > > > > > > > > > > threads > > > > > > > > > > > and *n* network threads. ThreadMXBean#getThreadCPUTime( > ) > > > may > > > > > > give > > > > > > > us > > > > > > > > > > what > > > > > > > > > > > we want, but it can be very expensive on some > platforms. > > As > > > > > > Becket > > > > > > > > and > > > > > > > > > > > Guozhang have pointed out, we do have several time > > > > measurements > > > > > > > > already > > > > > > > > > > for > > > > > > > > > > > generating metrics that we could use, though we might > > want > > > to > > > > > > > switch > > > > > > > > to > > > > > > > > > > > nanoTime() instead of currentTimeMillis() since some of > > the > > > > > > values > > > > > > > > for > > > > > > > > > > > small requests may be < 1ms. But rather than add up the > > > time > > > > > > spent > > > > > > > in > > > > > > > > > I/O > > > > > > > > > > > thread and network thread, wouldn't it be better to > > convert > > > > the > > > > > > > time > > > > > > > > > > spent > > > > > > > > > > > on each thread into a separate ratio? UserA has a > request > > > > quota > > > > > > of > > > > > > > > 5%. > > > > > > > > > > Can > > > > > > > > > > > we take that to mean that UserA can use 5% of the time > on > > > > > network > > > > > > > > > threads > > > > > > > > > > > and 5% of the time on I/O threads? If either is > exceeded, > > > the > > > > > > > > response > > > > > > > > > is > > > > > > > > > > > throttled - it would mean maintaining two sets of > metrics > > > for > > > > > the > > > > > > > two > > > > > > > > > > > durations, but would result in more meaningful ratios. > We > > > > could > > > > > > > > define > > > > > > > > > > two > > > > > > > > > > > quota limits (UserA has 5% of request threads and 10% > of > > > > > network > > > > > > > > > > threads), > > > > > > > > > > > but that seems unnecessary and harder to explain to > > users. > > > > > > > > > > > > > > > > > > > > > > Back to why and how quotas are applied to network > thread > > > > > > > utilization: > > > > > > > > > > > a) In the case of fetch, the time spent in the network > > > > thread > > > > > > may > > > > > > > be > > > > > > > > > > > significant and I can see the need to include this. Are > > > there > > > > > > other > > > > > > > > > > > requests where the network thread utilization is > > > significant? 
> > > > > In > > > > > > > the > > > > > > > > > case > > > > > > > > > > > of fetch, request handler thread utilization would > > throttle > > > > > > clients > > > > > > > > > with > > > > > > > > > > > high request rate, low data volume and fetch byte rate > > > quota > > > > > will > > > > > > > > > > throttle > > > > > > > > > > > clients with high data volume. Network thread > utilization > > > is > > > > > > > perhaps > > > > > > > > > > > proportional to the data volume. I am wondering if we > > even > > > > need > > > > > > to > > > > > > > > > > throttle > > > > > > > > > > > based on network thread utilization or whether the data > > > > volume > > > > > > > quota > > > > > > > > > > covers > > > > > > > > > > > this case. > > > > > > > > > > > > > > > > > > > > > > b) At the moment, we record and check for quota > violation > > > at > > > > > the > > > > > > > same > > > > > > > > > > time. > > > > > > > > > > > If a quota is violated, the response is delayed. Using > > > Jay'e > > > > > > > example > > > > > > > > of > > > > > > > > > > > disk reads for fetches happening in the network thread, > > We > > > > > can't > > > > > > > > record > > > > > > > > > > and > > > > > > > > > > > delay a response after the disk reads. We could record > > the > > > > time > > > > > > > spent > > > > > > > > > on > > > > > > > > > > > the network thread when the response is complete and > > > > introduce > > > > > a > > > > > > > > delay > > > > > > > > > > for > > > > > > > > > > > handling a subsequent request (separate out recording > and > > > > quota > > > > > > > > > violation > > > > > > > > > > > handling in the case of network thread overload). Does > > that > > > > > make > > > > > > > > sense? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > > > > > > > > > > > > Rajini > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Feb 21, 2017 at 2:58 AM, Becket Qin < > > > > > > becket....@gmail.com> > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > Hey Jay, > > > > > > > > > > > > > > > > > > > > > > > > Yeah, I agree that enforcing the CPU time is a little > > > > > tricky. I > > > > > > > am > > > > > > > > > > > thinking > > > > > > > > > > > > that maybe we can use the existing request > statistics. > > > They > > > > > are > > > > > > > > > already > > > > > > > > > > > > very detailed so we can probably see the approximate > > CPU > > > > time > > > > > > > from > > > > > > > > > it, > > > > > > > > > > > e.g. > > > > > > > > > > > > something like (total_time - > > request/response_queue_time > > > - > > > > > > > > > > remote_time). > > > > > > > > > > > > > > > > > > > > > > > > I agree with Guozhang that when a user is throttled > it > > is > > > > > > likely > > > > > > > > that > > > > > > > > > > we > > > > > > > > > > > > need to see if anything has went wrong first, and if > > the > > > > > users > > > > > > > are > > > > > > > > > well > > > > > > > > > > > > behaving and just need more resources, we will have > to > > > bump > > > > > up > > > > > > > the > > > > > > > > > > quota > > > > > > > > > > > > for them. It is true that pre-allocating CPU time > quota > > > > > > precisely > > > > > > > > for > > > > > > > > > > the > > > > > > > > > > > > users is difficult. 
So in practice it would probably > be > > > > more > > > > > > like > > > > > > > > > first > > > > > > > > > > > set > > > > > > > > > > > > a relative high protective CPU time quota for > everyone > > > and > > > > > > > increase > > > > > > > > > > that > > > > > > > > > > > > for some individual clients on demand. > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Feb 20, 2017 at 5:48 PM, Guozhang Wang < > > > > > > > wangg...@gmail.com > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > This is a great proposal, glad to see it happening. > > > > > > > > > > > > > > > > > > > > > > > > > > I am inclined to the CPU throttling, or more > > > specifically > > > > > > > > > processing > > > > > > > > > > > time > > > > > > > > > > > > > ratio instead of the request rate throttling as > well. > > > > > Becket > > > > > > > has > > > > > > > > > very > > > > > > > > > > > > well > > > > > > > > > > > > > summed my rationales above, and one thing to add > here > > > is > > > > > that > > > > > > > the > > > > > > > > > > > former > > > > > > > > > > > > > has a good support for both "protecting against > rogue > > > > > > clients" > > > > > > > as > > > > > > > > > > well > > > > > > > > > > > as > > > > > > > > > > > > > "utilizing a cluster for multi-tenancy usage": when > > > > > thinking > > > > > > > > about > > > > > > > > > > how > > > > > > > > > > > to > > > > > > > > > > > > > explain this to the end users, I find it actually > > more > > > > > > natural > > > > > > > > than > > > > > > > > > > the > > > > > > > > > > > > > request rate since as mentioned above, different > > > requests > > > > > > will > > > > > > > > have > > > > > > > > > > > quite > > > > > > > > > > > > > different "cost", and Kafka today already have > > various > > > > > > request > > > > > > > > > types > > > > > > > > > > > > > (produce, fetch, admin, metadata, etc), because of > > that > > > > the > > > > > > > > request > > > > > > > > > > > rate > > > > > > > > > > > > > throttling may not be as effective unless it is set > > > very > > > > > > > > > > > conservatively. > > > > > > > > > > > > > > > > > > > > > > > > > > Regarding to user reactions when they are > throttled, > > I > > > > > think > > > > > > it > > > > > > > > may > > > > > > > > > > > > differ > > > > > > > > > > > > > case-by-case, and need to be discovered / guided by > > > > looking > > > > > > at > > > > > > > > > > relative > > > > > > > > > > > > > metrics. So in other words users would not expect > to > > > get > > > > > > > > additional > > > > > > > > > > > > > information by simply being told "hey, you are > > > > throttled", > > > > > > > which > > > > > > > > is > > > > > > > > > > all > > > > > > > > > > > > > what throttling does; they need to take a follow-up > > > step > > > > > and > > > > > > > see > > > > > > > > > > "hmm, > > > > > > > > > > > > I'm > > > > > > > > > > > > > throttled probably because of ..", which is by > > looking > > > at > > > > > > other > > > > > > > > > > metric > > > > > > > > > > > > > values: e.g. 
whether I'm bombarding the brokers > with > > > > > metadata > > > > > > > > > > request, > > > > > > > > > > > > > which are usually cheap to handle but I'm sending > > > > thousands > > > > > > per > > > > > > > > > > second; > > > > > > > > > > > > or > > > > > > > > > > > > > is it because I'm catching up and hence sending > very > > > > heavy > > > > > > > > fetching > > > > > > > > > > > > request > > > > > > > > > > > > > with large min.bytes, etc. > > > > > > > > > > > > > > > > > > > > > > > > > > Regarding to the implementation, as once discussed > > with > > > > > Jun, > > > > > > > this > > > > > > > > > > seems > > > > > > > > > > > > not > > > > > > > > > > > > > very difficult since today we are already > collecting > > > the > > > > > > > "thread > > > > > > > > > pool > > > > > > > > > > > > > utilization" metrics, which is a single percentage > > > > > > > > > > "aggregateIdleMeter" > > > > > > > > > > > > > value; but we are already effectively aggregating > it > > > for > > > > > each > > > > > > > > > > requests > > > > > > > > > > > in > > > > > > > > > > > > > KafkaRequestHandler, and we can just extend it by > > > > recording > > > > > > the > > > > > > > > > > source > > > > > > > > > > > > > client id when handling them and aggregating by > > > clientId > > > > as > > > > > > > well > > > > > > > > as > > > > > > > > > > the > > > > > > > > > > > > > total aggregate. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Guozhang > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Feb 20, 2017 at 4:27 PM, Jay Kreps < > > > > > j...@confluent.io > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > Hey Becket/Rajini, > > > > > > > > > > > > > > > > > > > > > > > > > > > > When I thought about it more deeply I came around > > to > > > > the > > > > > > > > "percent > > > > > > > > > > of > > > > > > > > > > > > > > processing time" metric too. It seems a lot > closer > > to > > > > the > > > > > > > thing > > > > > > > > > we > > > > > > > > > > > > > actually > > > > > > > > > > > > > > care about and need to protect. I also think this > > > would > > > > > be > > > > > > a > > > > > > > > very > > > > > > > > > > > > useful > > > > > > > > > > > > > > metric even in the absence of throttling just to > > > debug > > > > > > whose > > > > > > > > > using > > > > > > > > > > > > > > capacity. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Two problems to consider: > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. I agree that for the user it is > > understandable > > > > what > > > > > > > lead > > > > > > > > to > > > > > > > > > > > their > > > > > > > > > > > > > > being throttled, but it is a bit hard to > figure > > > out > > > > > the > > > > > > > safe > > > > > > > > > > range > > > > > > > > > > > > for > > > > > > > > > > > > > > them. i.e. if I have a new app that will send > > 200 > > > > > > > > > messages/sec I > > > > > > > > > > > can > > > > > > > > > > > > > > probably reason that I'll be under the > > throttling > > > > > limit > > > > > > of > > > > > > > > 300 > > > > > > > > > > > > > req/sec. > > > > > > > > > > > > > > However if I need to be under a 10% CPU > > resources > > > > > limit > > > > > > it > > > > > > > > may > > > > > > > > > > be > > > > > > > > > > > a > > > > > > > > > > > > > bit > > > > > > > > > > > > > > harder for me to know a priori if i will or > > won't. 
> > > > > > > > > > > > > > 2. Calculating the available CPU time is a bit > > > > > difficult > > > > > > > > since > > > > > > > > > > > there > > > > > > > > > > > > > are > > > > > > > > > > > > > > actually two thread pools--the I/O threads and > > the > > > > > > network > > > > > > > > > > > threads. > > > > > > > > > > > > I > > > > > > > > > > > > > > think > > > > > > > > > > > > > > it might be workable to count just the I/O > > thread > > > > time > > > > > > as > > > > > > > in > > > > > > > > > the > > > > > > > > > > > > > > proposal, > > > > > > > > > > > > > > but the network thread work is actually > > > non-trivial > > > > > > (e.g. > > > > > > > > all > > > > > > > > > > the > > > > > > > > > > > > disk > > > > > > > > > > > > > > reads for fetches happen in that thread). If > you > > > > count > > > > > > > both > > > > > > > > > the > > > > > > > > > > > > > network > > > > > > > > > > > > > > and > > > > > > > > > > > > > > I/O threads it can skew things a bit. E.g. say > > you > > > > > have > > > > > > 50 > > > > > > > > > > network > > > > > > > > > > > > > > threads, > > > > > > > > > > > > > > 10 I/O threads, and 8 cores, what is the > > available > > > > cpu > > > > > > > time > > > > > > > > > > > > available > > > > > > > > > > > > > > in a > > > > > > > > > > > > > > second? I suppose this is a problem whenever > you > > > > have > > > > > a > > > > > > > > > > bottleneck > > > > > > > > > > > > > > between > > > > > > > > > > > > > > I/O and network threads or if you end up > > > > significantly > > > > > > > > > > > > > over-provisioning > > > > > > > > > > > > > > one pool (both of which are hard to avoid). > > > > > > > > > > > > > > > > > > > > > > > > > > > > An alternative for CPU throttling would be to use > > > this > > > > > api: > > > > > > > > > > > > > > http://docs.oracle.com/javase/ > > > > 1.5.0/docs/api/java/lang/ > > > > > > > > > > > > > > management/ThreadMXBean.html# > > getThreadCpuTime(long) > > > > > > > > > > > > > > > > > > > > > > > > > > > > That would let you track actual CPU usage across > > the > > > > > > network, > > > > > > > > I/O > > > > > > > > > > > > > threads, > > > > > > > > > > > > > > and purgatory threads and look at it as a > > percentage > > > of > > > > > > total > > > > > > > > > > cores. > > > > > > > > > > > I > > > > > > > > > > > > > > think this fixes many problems in the reliability > > of > > > > the > > > > > > > > metric. > > > > > > > > > > It's > > > > > > > > > > > > > > meaning is slightly different as it is just CPU > > (you > > > > > don't > > > > > > > get > > > > > > > > > > > charged > > > > > > > > > > > > > for > > > > > > > > > > > > > > time blocking on I/O) but that may be okay > because > > we > > > > > > already > > > > > > > > > have > > > > > > > > > > a > > > > > > > > > > > > > > throttle on I/O. The downside is I think it is > > > possible > > > > > > this > > > > > > > > api > > > > > > > > > > can > > > > > > > > > > > be > > > > > > > > > > > > > > disabled or isn't always available and it may > also > > be > > > > > > > expensive > > > > > > > > > > (also > > > > > > > > > > > > > I've > > > > > > > > > > > > > > never used it so not sure if it really works the > > way > > > i > > > > > > > think). 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > -Jay > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Feb 20, 2017 at 3:17 PM, Becket Qin < > > > > > > > > > becket....@gmail.com> > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If the purpose of the KIP is only to protect > the > > > > > cluster > > > > > > > from > > > > > > > > > > being > > > > > > > > > > > > > > > overwhelmed by crazy clients and is not > intended > > to > > > > > > address > > > > > > > > > > > resource > > > > > > > > > > > > > > > allocation problem among the clients, I am > > > wondering > > > > if > > > > > > > using > > > > > > > > > > > request > > > > > > > > > > > > > > > handling time quota (CPU time quota) is a > better > > > > > option. > > > > > > > Here > > > > > > > > > are > > > > > > > > > > > the > > > > > > > > > > > > > > > reasons: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. request handling time quota has better > > > protection. > > > > > Say > > > > > > > we > > > > > > > > > have > > > > > > > > > > > > > request > > > > > > > > > > > > > > > rate quota and set that to some value like 100 > > > > > > > requests/sec, > > > > > > > > it > > > > > > > > > > is > > > > > > > > > > > > > > possible > > > > > > > > > > > > > > > that some of the requests are very expensive > > > actually > > > > > > take > > > > > > > a > > > > > > > > > lot > > > > > > > > > > of > > > > > > > > > > > > > time > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > handle. In that case a few clients may still > > > occupy a > > > > > lot > > > > > > > of > > > > > > > > > CPU > > > > > > > > > > > time > > > > > > > > > > > > > > even > > > > > > > > > > > > > > > the request rate is low. Arguably we can > > carefully > > > > set > > > > > > > > request > > > > > > > > > > rate > > > > > > > > > > > > > quota > > > > > > > > > > > > > > > for each request and client id combination, but > > it > > > > > could > > > > > > > > still > > > > > > > > > be > > > > > > > > > > > > > tricky > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > get it right for everyone. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If we use the request time handling quota, we > can > > > > > simply > > > > > > > say > > > > > > > > no > > > > > > > > > > > > clients > > > > > > > > > > > > > > can > > > > > > > > > > > > > > > take up to more than 30% of the total request > > > > handling > > > > > > > > capacity > > > > > > > > > > > > > (measured > > > > > > > > > > > > > > > by time), regardless of the difference among > > > > different > > > > > > > > requests > > > > > > > > > > or > > > > > > > > > > > > what > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > the client doing. In this case maybe we can > quota > > > all > > > > > the > > > > > > > > > > requests > > > > > > > > > > > if > > > > > > > > > > > > > we > > > > > > > > > > > > > > > want to. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. The main benefit of using request rate limit > > is > > > > that > > > > > > it > > > > > > > > > seems > > > > > > > > > > > more > > > > > > > > > > > > > > > intuitive. It is true that it is probably > easier > > to > > > > > > explain > > > > > > > > to > > > > > > > > > > the > > > > > > > > > > > > user > > > > > > > > > > > > > > > what does that mean. 
However, in practice it > > looks > > > > the > > > > > > > impact > > > > > > > > > of > > > > > > > > > > > > > request > > > > > > > > > > > > > > > rate quota is not more quantifiable than the > > > request > > > > > > > handling > > > > > > > > > > time > > > > > > > > > > > > > quota. > > > > > > > > > > > > > > > Unlike the byte rate quota, it is still > difficult > > > to > > > > > > give a > > > > > > > > > > number > > > > > > > > > > > > > about > > > > > > > > > > > > > > > impact of throughput or latency when a request > > rate > > > > > quota > > > > > > > is > > > > > > > > > hit. > > > > > > > > > > > So > > > > > > > > > > > > it > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > not better than the request handling time > quota. > > In > > > > > fact > > > > > > I > > > > > > > > feel > > > > > > > > > > it > > > > > > > > > > > is > > > > > > > > > > > > > > > clearer to tell user that "you are limited > > because > > > > you > > > > > > have > > > > > > > > > taken > > > > > > > > > > > 30% > > > > > > > > > > > > > of > > > > > > > > > > > > > > > the CPU time on the broker" than otherwise > > > something > > > > > like > > > > > > > > "your > > > > > > > > > > > > request > > > > > > > > > > > > > > > rate quota on metadata request has reached". > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Feb 20, 2017 at 2:23 PM, Jay Kreps < > > > > > > > j...@confluent.io > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think this proposal makes a lot of sense > > > > > (especially > > > > > > > now > > > > > > > > > that > > > > > > > > > > > it > > > > > > > > > > > > is > > > > > > > > > > > > > > > > oriented around request rate) and fills the > > > biggest > > > > > > > > remaining > > > > > > > > > > gap > > > > > > > > > > > > in > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > multi-tenancy story. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think for intra-cluster communication > > > > (StopReplica, > > > > > > > etc) > > > > > > > > we > > > > > > > > > > > could > > > > > > > > > > > > > > avoid > > > > > > > > > > > > > > > > throttling entirely. You can secure or > > otherwise > > > > > > > lock-down > > > > > > > > > the > > > > > > > > > > > > > cluster > > > > > > > > > > > > > > > > communication to avoid any unauthorized > > external > > > > > party > > > > > > > from > > > > > > > > > > > trying > > > > > > > > > > > > to > > > > > > > > > > > > > > > > initiate these requests. As a result we are > as > > > > likely > > > > > > to > > > > > > > > > cause > > > > > > > > > > > > > problems > > > > > > > > > > > > > > > as > > > > > > > > > > > > > > > > solve them by throttling these, right? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'm not so sure that we should exempt the > > > consumer > > > > > > > requests > > > > > > > > > > such > > > > > > > > > > > as > > > > > > > > > > > > > > > > heartbeat. It's true that if we throttle an > > app's > > > > > > > heartbeat > > > > > > > > > > > > requests > > > > > > > > > > > > > it > > > > > > > > > > > > > > > may > > > > > > > > > > > > > > > > cause it to fall out of its consumer group. 
> > > However > > > > > if > > > > > > we > > > > > > > > > don't > > > > > > > > > > > > > > throttle > > > > > > > > > > > > > > > it > > > > > > > > > > > > > > > > it may DDOS the cluster if the heartbeat > > interval > > > > is > > > > > > set > > > > > > > > > > > > incorrectly > > > > > > > > > > > > > or > > > > > > > > > > > > > > > if > > > > > > > > > > > > > > > > some client in some language has a bug. I > think > > > the > > > > > > > policy > > > > > > > > > with > > > > > > > > > > > > this > > > > > > > > > > > > > > kind > > > > > > > > > > > > > > > > of throttling is to protect the cluster above > > any > > > > > > > > individual > > > > > > > > > > app, > > > > > > > > > > > > > > right? > > > > > > > > > > > > > > > I > > > > > > > > > > > > > > > > think in general this should be okay since > for > > > most > > > > > > > > > deployments > > > > > > > > > > > > this > > > > > > > > > > > > > > > > setting is meant as more of a safety > > valve---that > > > > is > > > > > > > rather > > > > > > > > > > than > > > > > > > > > > > > set > > > > > > > > > > > > > > > > something very close to what you expect to > need > > > > (say > > > > > 2 > > > > > > > > > req/sec > > > > > > > > > > or > > > > > > > > > > > > > > > whatever) > > > > > > > > > > > > > > > > you would have something quite high (like 100 > > > > > req/sec) > > > > > > > with > > > > > > > > > > this > > > > > > > > > > > > > meant > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > prevent a client gone crazy. I think when > used > > > this > > > > > way > > > > > > > > > > allowing > > > > > > > > > > > > > those > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > be throttled would actually provide > meaningful > > > > > > > protection. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -Jay > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Feb 17, 2017 at 9:05 AM, Rajini > > Sivaram < > > > > > > > > > > > > > > rajinisiva...@gmail.com > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I have just created KIP-124 to introduce > > > request > > > > > rate > > > > > > > > > quotas > > > > > > > > > > to > > > > > > > > > > > > > > Kafka: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/ > > > > > > confluence/display/KAFKA/KIP- > > > > > > > > > > > > > > > > > 124+-+Request+rate+quotas > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The proposal is for a simple percentage > > request > > > > > > > handling > > > > > > > > > time > > > > > > > > > > > > quota > > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > > can be allocated to *<client-id>*, *<user>* > > or > > > > > > *<user, > > > > > > > > > > > > client-id>*. > > > > > > > > > > > > > > > There > > > > > > > > > > > > > > > > > are a few other suggestions also under > > > "Rejected > > > > > > > > > > alternatives". > > > > > > > > > > > > > > > Feedback > > > > > > > > > > > > > > > > > and suggestions are welcome. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you... 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Rajini > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > -- Guozhang > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > -- Guozhang > > > > > > > > > >
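P.S. For anyone trying to picture how a request handling time quota would translate into a throttle delay, below is a rough, purely illustrative sketch (names and the formula are assumptions, not the KIP's implementation) that accumulates per-client request handler time over a window and compares it against a percentage quota:

// Hypothetical sketch of request-time quota enforcement; names and formula
// are illustrative, not from the KIP or the Kafka code base.
import java.util.HashMap;
import java.util.Map;

public class RequestTimeQuotaSketch {
    private final long windowMs;            // e.g. a 1000 ms quota window
    private final int numHandlerThreads;    // num.io.threads
    private final Map<String, Long> usedNanosByClient = new HashMap<>();

    RequestTimeQuotaSketch(long windowMs, int numHandlerThreads) {
        this.windowMs = windowMs;
        this.numHandlerThreads = numHandlerThreads;
    }

    // Called by a request handler thread after finishing a request.
    void record(String clientId, long handlerTimeNanos) {
        usedNanosByClient.merge(clientId, handlerTimeNanos, Long::sum);
    }

    // quotaPercent is the share of total request handler capacity, e.g. 5.0 for 5%.
    // Returns how long to delay the response if the client exceeded its quota.
    long throttleTimeMs(String clientId, double quotaPercent) {
        long capacityNanos = windowMs * 1_000_000L * numHandlerThreads;
        long allowedNanos = (long) (capacityNanos * quotaPercent / 100.0);
        long usedNanos = usedNanosByClient.getOrDefault(clientId, 0L);
        long excessNanos = usedNanos - allowedNanos;
        return excessNanos > 0 ? excessNanos / 1_000_000L : 0L;
    }

    public static void main(String[] args) {
        RequestTimeQuotaSketch quota = new RequestTimeQuotaSketch(1000, 8);
        quota.record("clientA", 600_000_000L);   // 600 ms of handler time used
        System.out.println(quota.throttleTimeMs("clientA", 5.0) + " ms throttle");
    }
}

An actual implementation would presumably plug into the existing quota sensors and windows rather than a standalone class; the arithmetic is only meant to show how a percentage maps to capacity across the num.io.threads pool.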