Hi Jay,

Regarding 1, I definitely like the simplicity of keeping a single throttle time field in the response. The downside is that the client metrics will be more coarse-grained.
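To make the trade-off concrete, here is a rough sketch (hypothetical names, not actual Kafka client code) of the client-side metric recording in the two cases, i.e. a single total field versus the separate throttle_time_ms and request_throttle_time_ms fields discussed earlier in the thread:

// Illustrative only: hypothetical names, not actual Kafka client code.
import java.util.HashMap;
import java.util.Map;

public class ThrottleMetricsSketch {

    // Option A: a single total throttle time field in the response.
    // The client can only record one aggregate metric per request type.
    static void recordSingleField(Map<String, Long> metrics, long throttleTimeMs) {
        metrics.merge("produce-throttle-time-total", throttleTimeMs, Long::sum);
    }

    // Option B: separate fields for byte-rate and request-time throttling.
    // The client can attribute the delay to the quota that caused it.
    static void recordSeparateFields(Map<String, Long> metrics, long byteRateMs, long requestTimeMs) {
        metrics.merge("produce-byte-rate-throttle-time-total", byteRateMs, Long::sum);
        metrics.merge("produce-request-throttle-time-total", requestTimeMs, Long::sum);
    }

    public static void main(String[] args) {
        Map<String, Long> metrics = new HashMap<>();
        recordSingleField(metrics, 15L);          // 15 ms of combined throttling
        recordSeparateFields(metrics, 10L, 5L);   // 10 ms byte-rate plus 5 ms request-time
        metrics.forEach((name, ms) -> System.out.println(name + " = " + ms + " ms"));
    }
}

With option A the second breakdown is simply not available on the client, which is the coarser granularity mentioned above.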
Regarding 3, we have `leader.imbalance.per.broker.percentage` and `log.cleaner.min.cleanable.ratio`. Ismael On Thu, Feb 23, 2017 at 4:43 PM, Jay Kreps <j...@confluent.io> wrote: > A few minor comments: > > 1. Isn't it the case that the throttling time response field should have > the total time your request was throttled irrespective of the quotas > that > caused that. Limiting it to byte rate quota doesn't make sense, but I > also > I don't think we want to end up adding new fields in the response for > every > single thing we quota, right? > 2. I don't think we should make this quota specifically about io > threads. Once we introduce these quotas people set them and expect them > to > be enforced (and if they aren't it may cause an outage). As a result > they > are a bit more sensitive than normal configs, I think. The current > thread > pools seem like something of an implementation detail and not the level > the > user-facing quotas should be involved with. I think it might be better > to > make this a general request-time throttle with no mention in the naming > about I/O threads and simply acknowledge the current limitation (which > we > may someday fix) in the docs that this covers only the time after the > thread is read off the network. > 3. As such I think the right interface to the user would be something > like percent_request_time and be in {0,...100} or request_time_ratio > and be > in {0.0,...,1.0} (I think "ratio" is the terminology we used if the > scale > is between 0 and 1 in the other metrics, right?) > > -Jay > > On Thu, Feb 23, 2017 at 3:45 AM, Rajini Sivaram <rajinisiva...@gmail.com> > wrote: > > > Guozhang/Dong, > > > > Thank you for the feedback. > > > > Guozhang : I have updated the section on co-existence of byte rate and > > request time quotas. > > > > Dong: I hadn't added much detail to the metrics and sensors since they > are > > going to be very similar to the existing metrics and sensors. To avoid > > confusion, I have now added more detail. All metrics are in the group > > "quotaType" and all sensors have names starting with "quotaType" (where > > quotaType is Produce/Fetch/LeaderReplication/ > > FollowerReplication/*IOThread*). > > So there will be no reuse of existing metrics/sensors. The new ones for > > request processing time based throttling will be completely independent > of > > existing metrics/sensors, but will be consistent in format. > > > > The existing throttle_time_ms field in produce/fetch responses will not > be > > impacted by this KIP. That will continue to return byte-rate based > > throttling times. In addition, a new field request_throttle_time_ms will > be > > added to return request quota based throttling times. These will be > exposed > > as new metrics on the client-side. > > > > Since all metrics and sensors are different for each type of quota, I > > believe there is already sufficient metrics to monitor throttling on both > > client and broker side for each type of throttling. > > > > Regards, > > > > Rajini > > > > > > On Thu, Feb 23, 2017 at 4:32 AM, Dong Lin <lindon...@gmail.com> wrote: > > > > > Hey Rajini, > > > > > > I think it makes a lot of sense to use io_thread_units as metric to > quota > > > user's traffic here. LGTM overall. I have some questions regarding > > sensors. > > > > > > - Can you be more specific in the KIP what sensors will be added? For > > > example, it will be useful to specify the name and attributes of these > > new > > > sensors. 
> > > > > > - We currently have throttle-time and queue-size for byte-rate based > > quota. > > > Are you going to have separate throttle-time and queue-size for > requests > > > throttled by io_thread_unit-based quota, or will they share the same > > > sensor? > > > > > > - Does the throttle-time in the ProduceResponse and FetchResponse > > contains > > > time due to io_thread_unit-based quota? > > > > > > - Currently kafka server doesn't not provide any log or metrics that > > tells > > > whether any given clientId (or user) is throttled. This is not too bad > > > because we can still check the client-side byte-rate metric to validate > > > whether a given client is throttled. But with this io_thread_unit, > there > > > will be no way to validate whether a given client is slow because it > has > > > exceeded its io_thread_unit limit. It is necessary for user to be able > to > > > know this information to figure how whether they have reached there > quota > > > limit. How about we add log4j log on the server side to periodically > > print > > > the (client_id, byte-rate-throttle-time, io-thread-unit-throttle-time) > so > > > that kafka administrator can figure those users that have reached their > > > limit and act accordingly? > > > > > > Thanks, > > > Dong > > > > > > > > > > > > > > > > > > On Wed, Feb 22, 2017 at 4:46 PM, Guozhang Wang <wangg...@gmail.com> > > wrote: > > > > > > > Made a pass over the doc, overall LGTM except a minor comment on the > > > > throttling implementation: > > > > > > > > Stated as "Request processing time throttling will be applied on top > if > > > > necessary." I thought that it meant the request processing time > > > throttling > > > > is applied first, but continue reading I found it actually meant to > > apply > > > > produce / fetch byte rate throttling first. > > > > > > > > Also the last sentence "The remaining delay if any is applied to the > > > > response." is a bit confusing to me. Maybe rewording it a bit? > > > > > > > > > > > > Guozhang > > > > > > > > > > > > On Wed, Feb 22, 2017 at 3:24 PM, Jun Rao <j...@confluent.io> wrote: > > > > > > > > > Hi, Rajini, > > > > > > > > > > Thanks for the updated KIP. The latest proposal looks good to me. > > > > > > > > > > Jun > > > > > > > > > > On Wed, Feb 22, 2017 at 2:19 PM, Rajini Sivaram < > > > rajinisiva...@gmail.com > > > > > > > > > > wrote: > > > > > > > > > > > Jun/Roger, > > > > > > > > > > > > Thank you for the feedback. > > > > > > > > > > > > 1. I have updated the KIP to use absolute units instead of > > > percentage. > > > > > The > > > > > > property is called* io_thread_units* to align with the thread > count > > > > > > property *num.io.threads*. When we implement network thread > > > utilization > > > > > > quotas, we can add another property *network_thread_units.* > > > > > > > > > > > > 2. ControlledShutdown is already listed under the exempt > requests. > > > Jun, > > > > > did > > > > > > you mean a different request that needs to be added? The four > > > requests > > > > > > currently exempt in the KIP are StopReplica, ControlledShutdown, > > > > > > LeaderAndIsr and UpdateMetadata. These are controlled using > > > > ClusterAction > > > > > > ACL, so it is easy to exclude and only throttle if unauthorized. > I > > > > wasn't > > > > > > sure if there are other requests used only for inter-broker that > > > needed > > > > > to > > > > > > be excluded. > > > > > > > > > > > > 3. 
I was thinking the smallest change would be to replace all > > > > references > > > > > to > > > > > > *requestChannel.sendResponse()* with a local method > > > > > > *sendResponseMaybeThrottle()* that does the throttling if any > plus > > > send > > > > > > response. If we throttle first in *KafkaApis.handle()*, the time > > > spent > > > > > > within the method handling the request will not be recorded or > used > > > in > > > > > > throttling. We can look into this again when the PR is ready for > > > > review. > > > > > > > > > > > > Regards, > > > > > > > > > > > > Rajini > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Feb 22, 2017 at 5:55 PM, Roger Hoover < > > > roger.hoo...@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > Great to see this KIP and the excellent discussion. > > > > > > > > > > > > > > To me, Jun's suggestion makes sense. If my application is > > > allocated > > > > 1 > > > > > > > request handler unit, then it's as if I have a Kafka broker > with > > a > > > > > single > > > > > > > request handler thread dedicated to me. That's the most I can > > use, > > > > at > > > > > > > least. That allocation doesn't change even if an admin later > > > > increases > > > > > > the > > > > > > > size of the request thread pool on the broker. It's similar to > > the > > > > CPU > > > > > > > abstraction that VMs and containers get from hypervisors or OS > > > > > > schedulers. > > > > > > > While different client access patterns can use wildly different > > > > amounts > > > > > > of > > > > > > > request thread resources per request, a given application will > > > > > generally > > > > > > > have a stable access pattern and can figure out empirically how > > > many > > > > > > > "request thread units" it needs to meet it's throughput/latency > > > > goals. > > > > > > > > > > > > > > Cheers, > > > > > > > > > > > > > > Roger > > > > > > > > > > > > > > On Wed, Feb 22, 2017 at 8:53 AM, Jun Rao <j...@confluent.io> > > wrote: > > > > > > > > > > > > > > > Hi, Rajini, > > > > > > > > > > > > > > > > Thanks for the updated KIP. A few more comments. > > > > > > > > > > > > > > > > 1. A concern of request_time_percent is that it's not an > > absolute > > > > > > value. > > > > > > > > Let's say you give a user a 10% limit. If the admin doubles > the > > > > > number > > > > > > of > > > > > > > > request handler threads, that user now actually has twice the > > > > > absolute > > > > > > > > capacity. This may confuse people a bit. So, perhaps setting > > the > > > > > quota > > > > > > > > based on an absolute request thread unit is better. > > > > > > > > > > > > > > > > 2. ControlledShutdownRequest is also an inter-broker request > > and > > > > > needs > > > > > > to > > > > > > > > be excluded from throttling. > > > > > > > > > > > > > > > > 3. Implementation wise, I am wondering if it's simpler to > apply > > > the > > > > > > > request > > > > > > > > time throttling first in KafkaApis.handle(). Otherwise, we > will > > > > need > > > > > to > > > > > > > add > > > > > > > > the throttling logic in each type of request. > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > On Wed, Feb 22, 2017 at 5:58 AM, Rajini Sivaram < > > > > > > rajinisiva...@gmail.com > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Jun, > > > > > > > > > > > > > > > > > > Thank you for the review. 
> > > > > > > > > > > > > > > > > > I have reverted to the original KIP that throttles based on > > > > request > > > > > > > > handler > > > > > > > > > utilization. At the moment, it uses percentage, but I am > > happy > > > to > > > > > > > change > > > > > > > > to > > > > > > > > > a fraction (out of 1 instead of 100) if required. I have > > added > > > > the > > > > > > > > examples > > > > > > > > > from this discussion to the KIP. Also added a "Future Work" > > > > section > > > > > > to > > > > > > > > > address network thread utilization. The configuration is > > named > > > > > > > > > "request_time_percent" with the expectation that it can > also > > be > > > > > used > > > > > > as > > > > > > > > the > > > > > > > > > limit for network thread utilization when that is > > implemented, > > > so > > > > > > that > > > > > > > > > users have to set only one config for the two and not have > to > > > > worry > > > > > > > about > > > > > > > > > the internal distribution of the work between the two > thread > > > > pools > > > > > in > > > > > > > > > Kafka. > > > > > > > > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > > > > > > > > Rajini > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Feb 22, 2017 at 12:23 AM, Jun Rao < > j...@confluent.io> > > > > > wrote: > > > > > > > > > > > > > > > > > > > Hi, Rajini, > > > > > > > > > > > > > > > > > > > > Thanks for the proposal. > > > > > > > > > > > > > > > > > > > > The benefit of using the request processing time over the > > > > request > > > > > > > rate > > > > > > > > is > > > > > > > > > > exactly what people have said. I will just expand that a > > bit. > > > > > > > Consider > > > > > > > > > the > > > > > > > > > > following case. The producer sends a produce request > with a > > > > 10MB > > > > > > > > message > > > > > > > > > > but compressed to 100KB with gzip. The decompression of > the > > > > > message > > > > > > > on > > > > > > > > > the > > > > > > > > > > broker could take 10-15 seconds, during which time, a > > request > > > > > > handler > > > > > > > > > > thread is completely blocked. In this case, neither the > > > byte-in > > > > > > quota > > > > > > > > nor > > > > > > > > > > the request rate quota may be effective in protecting the > > > > broker. > > > > > > > > > Consider > > > > > > > > > > another case. A consumer group starts with 10 instances > and > > > > later > > > > > > on > > > > > > > > > > switches to 20 instances. The request rate will likely > > > double, > > > > > but > > > > > > > the > > > > > > > > > > actually load on the broker may not double since each > fetch > > > > > request > > > > > > > > only > > > > > > > > > > contains half of the partitions. Request rate quota may > not > > > be > > > > > easy > > > > > > > to > > > > > > > > > > configure in this case. > > > > > > > > > > > > > > > > > > > > What we really want is to be able to prevent a client > from > > > > using > > > > > > too > > > > > > > > much > > > > > > > > > > of the server side resources. In this particular KIP, > this > > > > > resource > > > > > > > is > > > > > > > > > the > > > > > > > > > > capacity of the request handler threads. I agree that it > > may > > > > not > > > > > be > > > > > > > > > > intuitive for the users to determine how to set the right > > > > limit. > > > > > > > > However, > > > > > > > > > > this is not completely new and has been done in the > > container > > > > > world > > > > > > > > > > already. 
For example, Linux cgroup ( > > > https://access.redhat.com/ > > > > > > > > > > documentation/en-US/Red_Hat_Enterprise_Linux/6/html/ > > > > > > > > > > Resource_Management_Guide/sec-cpu.html) has the concept > of > > > > > > > > > > cpu.cfs_quota_us, > > > > > > > > > > which specifies the total amount of time in microseconds > > for > > > > > which > > > > > > > all > > > > > > > > > > tasks in a cgroup can run during a one second period. We > > can > > > > > > > > potentially > > > > > > > > > > model the request handler threads in a similar way. For > > > > example, > > > > > > each > > > > > > > > > > request handler thread can be 1 request handler unit and > > the > > > > > admin > > > > > > > can > > > > > > > > > > configure a limit on how many units (say 0.01) a client > can > > > > have. > > > > > > > > > > > > > > > > > > > > Regarding not throttling the internal broker to broker > > > > requests. > > > > > We > > > > > > > > could > > > > > > > > > > do that. Alternatively, we could just let the admin > > > configure a > > > > > > high > > > > > > > > > limit > > > > > > > > > > for the kafka user (it may not be able to do that easily > > > based > > > > on > > > > > > > > > clientId > > > > > > > > > > though). > > > > > > > > > > > > > > > > > > > > Ideally we want to be able to protect the utilization of > > the > > > > > > network > > > > > > > > > thread > > > > > > > > > > pool too. The difficult is mostly what Rajini said: (1) > The > > > > > > mechanism > > > > > > > > for > > > > > > > > > > throttling the requests is through Purgatory and we will > > have > > > > to > > > > > > > think > > > > > > > > > > through how to integrate that into the network layer. > (2) > > In > > > > the > > > > > > > > network > > > > > > > > > > layer, currently we know the user, but not the clientId > of > > > the > > > > > > > request. > > > > > > > > > So, > > > > > > > > > > it's a bit tricky to throttle based on clientId there. > > Plus, > > > > the > > > > > > > > byteOut > > > > > > > > > > quota can already protect the network thread utilization > > for > > > > > fetch > > > > > > > > > > requests. So, if we can't figure out this part right now, > > > just > > > > > > > focusing > > > > > > > > > on > > > > > > > > > > the request handling threads for this KIP is still a > useful > > > > > > feature. > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Feb 21, 2017 at 4:27 AM, Rajini Sivaram < > > > > > > > > rajinisiva...@gmail.com > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > Thank you all for the feedback. > > > > > > > > > > > > > > > > > > > > > > Jay: I have removed exemption for consumer heartbeat > etc. > > > > Agree > > > > > > > that > > > > > > > > > > > protecting the cluster is more important than > protecting > > > > > > individual > > > > > > > > > apps. > > > > > > > > > > > Have retained the exemption for > StopReplicat/LeaderAndIsr > > > > etc, > > > > > > > these > > > > > > > > > are > > > > > > > > > > > throttled only if authorization fails (so can't be used > > for > > > > DoS > > > > > > > > attacks > > > > > > > > > > in > > > > > > > > > > > a secure cluster, but allows inter-broker requests to > > > > complete > > > > > > > > without > > > > > > > > > > > delays). 
> > > > > > > > > > > > > > > > > > > > > > I will wait another day to see if these is any > objection > > to > > > > > > quotas > > > > > > > > > based > > > > > > > > > > on > > > > > > > > > > > request processing time (as opposed to request rate) > and > > if > > > > > there > > > > > > > are > > > > > > > > > no > > > > > > > > > > > objections, I will revert to the original proposal with > > > some > > > > > > > changes. > > > > > > > > > > > > > > > > > > > > > > The original proposal was only including the time used > by > > > the > > > > > > > request > > > > > > > > > > > handler threads (that made calculation easy). I think > the > > > > > > > suggestion > > > > > > > > is > > > > > > > > > > to > > > > > > > > > > > include the time spent in the network threads as well > > since > > > > > that > > > > > > > may > > > > > > > > be > > > > > > > > > > > significant. As Jay pointed out, it is more complicated > > to > > > > > > > calculate > > > > > > > > > the > > > > > > > > > > > total available CPU time and convert to a ratio when > > there > > > > *m* > > > > > > I/O > > > > > > > > > > threads > > > > > > > > > > > and *n* network threads. ThreadMXBean#getThreadCPUTime( > ) > > > may > > > > > > give > > > > > > > us > > > > > > > > > > what > > > > > > > > > > > we want, but it can be very expensive on some > platforms. > > As > > > > > > Becket > > > > > > > > and > > > > > > > > > > > Guozhang have pointed out, we do have several time > > > > measurements > > > > > > > > already > > > > > > > > > > for > > > > > > > > > > > generating metrics that we could use, though we might > > want > > > to > > > > > > > switch > > > > > > > > to > > > > > > > > > > > nanoTime() instead of currentTimeMillis() since some of > > the > > > > > > values > > > > > > > > for > > > > > > > > > > > small requests may be < 1ms. But rather than add up the > > > time > > > > > > spent > > > > > > > in > > > > > > > > > I/O > > > > > > > > > > > thread and network thread, wouldn't it be better to > > convert > > > > the > > > > > > > time > > > > > > > > > > spent > > > > > > > > > > > on each thread into a separate ratio? UserA has a > request > > > > quota > > > > > > of > > > > > > > > 5%. > > > > > > > > > > Can > > > > > > > > > > > we take that to mean that UserA can use 5% of the time > on > > > > > network > > > > > > > > > threads > > > > > > > > > > > and 5% of the time on I/O threads? If either is > exceeded, > > > the > > > > > > > > response > > > > > > > > > is > > > > > > > > > > > throttled - it would mean maintaining two sets of > metrics > > > for > > > > > the > > > > > > > two > > > > > > > > > > > durations, but would result in more meaningful ratios. > We > > > > could > > > > > > > > define > > > > > > > > > > two > > > > > > > > > > > quota limits (UserA has 5% of request threads and 10% > of > > > > > network > > > > > > > > > > threads), > > > > > > > > > > > but that seems unnecessary and harder to explain to > > users. > > > > > > > > > > > > > > > > > > > > > > Back to why and how quotas are applied to network > thread > > > > > > > utilization: > > > > > > > > > > > a) In the case of fetch, the time spent in the network > > > > thread > > > > > > may > > > > > > > be > > > > > > > > > > > significant and I can see the need to include this. Are > > > there > > > > > > other > > > > > > > > > > > requests where the network thread utilization is > > > significant? 
> > > > > In > > > > > > > the > > > > > > > > > case > > > > > > > > > > > of fetch, request handler thread utilization would > > throttle > > > > > > clients > > > > > > > > > with > > > > > > > > > > > high request rate, low data volume and fetch byte rate > > > quota > > > > > will > > > > > > > > > > throttle > > > > > > > > > > > clients with high data volume. Network thread > utilization > > > is > > > > > > > perhaps > > > > > > > > > > > proportional to the data volume. I am wondering if we > > even > > > > need > > > > > > to > > > > > > > > > > throttle > > > > > > > > > > > based on network thread utilization or whether the data > > > > volume > > > > > > > quota > > > > > > > > > > covers > > > > > > > > > > > this case. > > > > > > > > > > > > > > > > > > > > > > b) At the moment, we record and check for quota > violation > > > at > > > > > the > > > > > > > same > > > > > > > > > > time. > > > > > > > > > > > If a quota is violated, the response is delayed. Using > > > Jay'e > > > > > > > example > > > > > > > > of > > > > > > > > > > > disk reads for fetches happening in the network thread, > > We > > > > > can't > > > > > > > > record > > > > > > > > > > and > > > > > > > > > > > delay a response after the disk reads. We could record > > the > > > > time > > > > > > > spent > > > > > > > > > on > > > > > > > > > > > the network thread when the response is complete and > > > > introduce > > > > > a > > > > > > > > delay > > > > > > > > > > for > > > > > > > > > > > handling a subsequent request (separate out recording > and > > > > quota > > > > > > > > > violation > > > > > > > > > > > handling in the case of network thread overload). Does > > that > > > > > make > > > > > > > > sense? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > > > > > > > > > > > > Rajini > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Feb 21, 2017 at 2:58 AM, Becket Qin < > > > > > > becket....@gmail.com> > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > Hey Jay, > > > > > > > > > > > > > > > > > > > > > > > > Yeah, I agree that enforcing the CPU time is a little > > > > > tricky. I > > > > > > > am > > > > > > > > > > > thinking > > > > > > > > > > > > that maybe we can use the existing request > statistics. > > > They > > > > > are > > > > > > > > > already > > > > > > > > > > > > very detailed so we can probably see the approximate > > CPU > > > > time > > > > > > > from > > > > > > > > > it, > > > > > > > > > > > e.g. > > > > > > > > > > > > something like (total_time - > > request/response_queue_time > > > - > > > > > > > > > > remote_time). > > > > > > > > > > > > > > > > > > > > > > > > I agree with Guozhang that when a user is throttled > it > > is > > > > > > likely > > > > > > > > that > > > > > > > > > > we > > > > > > > > > > > > need to see if anything has went wrong first, and if > > the > > > > > users > > > > > > > are > > > > > > > > > well > > > > > > > > > > > > behaving and just need more resources, we will have > to > > > bump > > > > > up > > > > > > > the > > > > > > > > > > quota > > > > > > > > > > > > for them. It is true that pre-allocating CPU time > quota > > > > > > precisely > > > > > > > > for > > > > > > > > > > the > > > > > > > > > > > > users is difficult. 
So in practice it would probably > be > > > > more > > > > > > like > > > > > > > > > first > > > > > > > > > > > set > > > > > > > > > > > > a relative high protective CPU time quota for > everyone > > > and > > > > > > > increase > > > > > > > > > > that > > > > > > > > > > > > for some individual clients on demand. > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Feb 20, 2017 at 5:48 PM, Guozhang Wang < > > > > > > > wangg...@gmail.com > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > This is a great proposal, glad to see it happening. > > > > > > > > > > > > > > > > > > > > > > > > > > I am inclined to the CPU throttling, or more > > > specifically > > > > > > > > > processing > > > > > > > > > > > time > > > > > > > > > > > > > ratio instead of the request rate throttling as > well. > > > > > Becket > > > > > > > has > > > > > > > > > very > > > > > > > > > > > > well > > > > > > > > > > > > > summed my rationales above, and one thing to add > here > > > is > > > > > that > > > > > > > the > > > > > > > > > > > former > > > > > > > > > > > > > has a good support for both "protecting against > rogue > > > > > > clients" > > > > > > > as > > > > > > > > > > well > > > > > > > > > > > as > > > > > > > > > > > > > "utilizing a cluster for multi-tenancy usage": when > > > > > thinking > > > > > > > > about > > > > > > > > > > how > > > > > > > > > > > to > > > > > > > > > > > > > explain this to the end users, I find it actually > > more > > > > > > natural > > > > > > > > than > > > > > > > > > > the > > > > > > > > > > > > > request rate since as mentioned above, different > > > requests > > > > > > will > > > > > > > > have > > > > > > > > > > > quite > > > > > > > > > > > > > different "cost", and Kafka today already have > > various > > > > > > request > > > > > > > > > types > > > > > > > > > > > > > (produce, fetch, admin, metadata, etc), because of > > that > > > > the > > > > > > > > request > > > > > > > > > > > rate > > > > > > > > > > > > > throttling may not be as effective unless it is set > > > very > > > > > > > > > > > conservatively. > > > > > > > > > > > > > > > > > > > > > > > > > > Regarding to user reactions when they are > throttled, > > I > > > > > think > > > > > > it > > > > > > > > may > > > > > > > > > > > > differ > > > > > > > > > > > > > case-by-case, and need to be discovered / guided by > > > > looking > > > > > > at > > > > > > > > > > relative > > > > > > > > > > > > > metrics. So in other words users would not expect > to > > > get > > > > > > > > additional > > > > > > > > > > > > > information by simply being told "hey, you are > > > > throttled", > > > > > > > which > > > > > > > > is > > > > > > > > > > all > > > > > > > > > > > > > what throttling does; they need to take a follow-up > > > step > > > > > and > > > > > > > see > > > > > > > > > > "hmm, > > > > > > > > > > > > I'm > > > > > > > > > > > > > throttled probably because of ..", which is by > > looking > > > at > > > > > > other > > > > > > > > > > metric > > > > > > > > > > > > > values: e.g. 
whether I'm bombarding the brokers > with > > > > > metadata > > > > > > > > > > request, > > > > > > > > > > > > > which are usually cheap to handle but I'm sending > > > > thousands > > > > > > per > > > > > > > > > > second; > > > > > > > > > > > > or > > > > > > > > > > > > > is it because I'm catching up and hence sending > very > > > > heavy > > > > > > > > fetching > > > > > > > > > > > > request > > > > > > > > > > > > > with large min.bytes, etc. > > > > > > > > > > > > > > > > > > > > > > > > > > Regarding to the implementation, as once discussed > > with > > > > > Jun, > > > > > > > this > > > > > > > > > > seems > > > > > > > > > > > > not > > > > > > > > > > > > > very difficult since today we are already > collecting > > > the > > > > > > > "thread > > > > > > > > > pool > > > > > > > > > > > > > utilization" metrics, which is a single percentage > > > > > > > > > > "aggregateIdleMeter" > > > > > > > > > > > > > value; but we are already effectively aggregating > it > > > for > > > > > each > > > > > > > > > > requests > > > > > > > > > > > in > > > > > > > > > > > > > KafkaRequestHandler, and we can just extend it by > > > > recording > > > > > > the > > > > > > > > > > source > > > > > > > > > > > > > client id when handling them and aggregating by > > > clientId > > > > as > > > > > > > well > > > > > > > > as > > > > > > > > > > the > > > > > > > > > > > > > total aggregate. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Guozhang > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Feb 20, 2017 at 4:27 PM, Jay Kreps < > > > > > j...@confluent.io > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > Hey Becket/Rajini, > > > > > > > > > > > > > > > > > > > > > > > > > > > > When I thought about it more deeply I came around > > to > > > > the > > > > > > > > "percent > > > > > > > > > > of > > > > > > > > > > > > > > processing time" metric too. It seems a lot > closer > > to > > > > the > > > > > > > thing > > > > > > > > > we > > > > > > > > > > > > > actually > > > > > > > > > > > > > > care about and need to protect. I also think this > > > would > > > > > be > > > > > > a > > > > > > > > very > > > > > > > > > > > > useful > > > > > > > > > > > > > > metric even in the absence of throttling just to > > > debug > > > > > > whose > > > > > > > > > using > > > > > > > > > > > > > > capacity. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Two problems to consider: > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. I agree that for the user it is > > understandable > > > > what > > > > > > > lead > > > > > > > > to > > > > > > > > > > > their > > > > > > > > > > > > > > being throttled, but it is a bit hard to > figure > > > out > > > > > the > > > > > > > safe > > > > > > > > > > range > > > > > > > > > > > > for > > > > > > > > > > > > > > them. i.e. if I have a new app that will send > > 200 > > > > > > > > > messages/sec I > > > > > > > > > > > can > > > > > > > > > > > > > > probably reason that I'll be under the > > throttling > > > > > limit > > > > > > of > > > > > > > > 300 > > > > > > > > > > > > > req/sec. > > > > > > > > > > > > > > However if I need to be under a 10% CPU > > resources > > > > > limit > > > > > > it > > > > > > > > may > > > > > > > > > > be > > > > > > > > > > > a > > > > > > > > > > > > > bit > > > > > > > > > > > > > > harder for me to know a priori if i will or > > won't. 
> > > > > > > > > > > > > > 2. Calculating the available CPU time is a bit > > > > > difficult > > > > > > > > since > > > > > > > > > > > there > > > > > > > > > > > > > are > > > > > > > > > > > > > > actually two thread pools--the I/O threads and > > the > > > > > > network > > > > > > > > > > > threads. > > > > > > > > > > > > I > > > > > > > > > > > > > > think > > > > > > > > > > > > > > it might be workable to count just the I/O > > thread > > > > time > > > > > > as > > > > > > > in > > > > > > > > > the > > > > > > > > > > > > > > proposal, > > > > > > > > > > > > > > but the network thread work is actually > > > non-trivial > > > > > > (e.g. > > > > > > > > all > > > > > > > > > > the > > > > > > > > > > > > disk > > > > > > > > > > > > > > reads for fetches happen in that thread). If > you > > > > count > > > > > > > both > > > > > > > > > the > > > > > > > > > > > > > network > > > > > > > > > > > > > > and > > > > > > > > > > > > > > I/O threads it can skew things a bit. E.g. say > > you > > > > > have > > > > > > 50 > > > > > > > > > > network > > > > > > > > > > > > > > threads, > > > > > > > > > > > > > > 10 I/O threads, and 8 cores, what is the > > available > > > > cpu > > > > > > > time > > > > > > > > > > > > available > > > > > > > > > > > > > > in a > > > > > > > > > > > > > > second? I suppose this is a problem whenever > you > > > > have > > > > > a > > > > > > > > > > bottleneck > > > > > > > > > > > > > > between > > > > > > > > > > > > > > I/O and network threads or if you end up > > > > significantly > > > > > > > > > > > > > over-provisioning > > > > > > > > > > > > > > one pool (both of which are hard to avoid). > > > > > > > > > > > > > > > > > > > > > > > > > > > > An alternative for CPU throttling would be to use > > > this > > > > > api: > > > > > > > > > > > > > > http://docs.oracle.com/javase/ > > > > 1.5.0/docs/api/java/lang/ > > > > > > > > > > > > > > management/ThreadMXBean.html# > > getThreadCpuTime(long) > > > > > > > > > > > > > > > > > > > > > > > > > > > > That would let you track actual CPU usage across > > the > > > > > > network, > > > > > > > > I/O > > > > > > > > > > > > > threads, > > > > > > > > > > > > > > and purgatory threads and look at it as a > > percentage > > > of > > > > > > total > > > > > > > > > > cores. > > > > > > > > > > > I > > > > > > > > > > > > > > think this fixes many problems in the reliability > > of > > > > the > > > > > > > > metric. > > > > > > > > > > It's > > > > > > > > > > > > > > meaning is slightly different as it is just CPU > > (you > > > > > don't > > > > > > > get > > > > > > > > > > > charged > > > > > > > > > > > > > for > > > > > > > > > > > > > > time blocking on I/O) but that may be okay > because > > we > > > > > > already > > > > > > > > > have > > > > > > > > > > a > > > > > > > > > > > > > > throttle on I/O. The downside is I think it is > > > possible > > > > > > this > > > > > > > > api > > > > > > > > > > can > > > > > > > > > > > be > > > > > > > > > > > > > > disabled or isn't always available and it may > also > > be > > > > > > > expensive > > > > > > > > > > (also > > > > > > > > > > > > > I've > > > > > > > > > > > > > > never used it so not sure if it really works the > > way > > > i > > > > > > > think). 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > -Jay > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Feb 20, 2017 at 3:17 PM, Becket Qin < > > > > > > > > > becket....@gmail.com> > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If the purpose of the KIP is only to protect > the > > > > > cluster > > > > > > > from > > > > > > > > > > being > > > > > > > > > > > > > > > overwhelmed by crazy clients and is not > intended > > to > > > > > > address > > > > > > > > > > > resource > > > > > > > > > > > > > > > allocation problem among the clients, I am > > > wondering > > > > if > > > > > > > using > > > > > > > > > > > request > > > > > > > > > > > > > > > handling time quota (CPU time quota) is a > better > > > > > option. > > > > > > > Here > > > > > > > > > are > > > > > > > > > > > the > > > > > > > > > > > > > > > reasons: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. request handling time quota has better > > > protection. > > > > > Say > > > > > > > we > > > > > > > > > have > > > > > > > > > > > > > request > > > > > > > > > > > > > > > rate quota and set that to some value like 100 > > > > > > > requests/sec, > > > > > > > > it > > > > > > > > > > is > > > > > > > > > > > > > > possible > > > > > > > > > > > > > > > that some of the requests are very expensive > > > actually > > > > > > take > > > > > > > a > > > > > > > > > lot > > > > > > > > > > of > > > > > > > > > > > > > time > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > handle. In that case a few clients may still > > > occupy a > > > > > lot > > > > > > > of > > > > > > > > > CPU > > > > > > > > > > > time > > > > > > > > > > > > > > even > > > > > > > > > > > > > > > the request rate is low. Arguably we can > > carefully > > > > set > > > > > > > > request > > > > > > > > > > rate > > > > > > > > > > > > > quota > > > > > > > > > > > > > > > for each request and client id combination, but > > it > > > > > could > > > > > > > > still > > > > > > > > > be > > > > > > > > > > > > > tricky > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > get it right for everyone. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If we use the request time handling quota, we > can > > > > > simply > > > > > > > say > > > > > > > > no > > > > > > > > > > > > clients > > > > > > > > > > > > > > can > > > > > > > > > > > > > > > take up to more than 30% of the total request > > > > handling > > > > > > > > capacity > > > > > > > > > > > > > (measured > > > > > > > > > > > > > > > by time), regardless of the difference among > > > > different > > > > > > > > requests > > > > > > > > > > or > > > > > > > > > > > > what > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > the client doing. In this case maybe we can > quota > > > all > > > > > the > > > > > > > > > > requests > > > > > > > > > > > if > > > > > > > > > > > > > we > > > > > > > > > > > > > > > want to. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. The main benefit of using request rate limit > > is > > > > that > > > > > > it > > > > > > > > > seems > > > > > > > > > > > more > > > > > > > > > > > > > > > intuitive. It is true that it is probably > easier > > to > > > > > > explain > > > > > > > > to > > > > > > > > > > the > > > > > > > > > > > > user > > > > > > > > > > > > > > > what does that mean. 
However, in practice it > > looks > > > > the > > > > > > > impact > > > > > > > > > of > > > > > > > > > > > > > request > > > > > > > > > > > > > > > rate quota is not more quantifiable than the > > > request > > > > > > > handling > > > > > > > > > > time > > > > > > > > > > > > > quota. > > > > > > > > > > > > > > > Unlike the byte rate quota, it is still > difficult > > > to > > > > > > give a > > > > > > > > > > number > > > > > > > > > > > > > about > > > > > > > > > > > > > > > impact of throughput or latency when a request > > rate > > > > > quota > > > > > > > is > > > > > > > > > hit. > > > > > > > > > > > So > > > > > > > > > > > > it > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > not better than the request handling time > quota. > > In > > > > > fact > > > > > > I > > > > > > > > feel > > > > > > > > > > it > > > > > > > > > > > is > > > > > > > > > > > > > > > clearer to tell user that "you are limited > > because > > > > you > > > > > > have > > > > > > > > > taken > > > > > > > > > > > 30% > > > > > > > > > > > > > of > > > > > > > > > > > > > > > the CPU time on the broker" than otherwise > > > something > > > > > like > > > > > > > > "your > > > > > > > > > > > > request > > > > > > > > > > > > > > > rate quota on metadata request has reached". > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Feb 20, 2017 at 2:23 PM, Jay Kreps < > > > > > > > j...@confluent.io > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think this proposal makes a lot of sense > > > > > (especially > > > > > > > now > > > > > > > > > that > > > > > > > > > > > it > > > > > > > > > > > > is > > > > > > > > > > > > > > > > oriented around request rate) and fills the > > > biggest > > > > > > > > remaining > > > > > > > > > > gap > > > > > > > > > > > > in > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > multi-tenancy story. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think for intra-cluster communication > > > > (StopReplica, > > > > > > > etc) > > > > > > > > we > > > > > > > > > > > could > > > > > > > > > > > > > > avoid > > > > > > > > > > > > > > > > throttling entirely. You can secure or > > otherwise > > > > > > > lock-down > > > > > > > > > the > > > > > > > > > > > > > cluster > > > > > > > > > > > > > > > > communication to avoid any unauthorized > > external > > > > > party > > > > > > > from > > > > > > > > > > > trying > > > > > > > > > > > > to > > > > > > > > > > > > > > > > initiate these requests. As a result we are > as > > > > likely > > > > > > to > > > > > > > > > cause > > > > > > > > > > > > > problems > > > > > > > > > > > > > > > as > > > > > > > > > > > > > > > > solve them by throttling these, right? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'm not so sure that we should exempt the > > > consumer > > > > > > > requests > > > > > > > > > > such > > > > > > > > > > > as > > > > > > > > > > > > > > > > heartbeat. It's true that if we throttle an > > app's > > > > > > > heartbeat > > > > > > > > > > > > requests > > > > > > > > > > > > > it > > > > > > > > > > > > > > > may > > > > > > > > > > > > > > > > cause it to fall out of its consumer group. 
> > > However > > > > > if > > > > > > we > > > > > > > > > don't > > > > > > > > > > > > > > throttle > > > > > > > > > > > > > > > it > > > > > > > > > > > > > > > > it may DDOS the cluster if the heartbeat > > interval > > > > is > > > > > > set > > > > > > > > > > > > incorrectly > > > > > > > > > > > > > or > > > > > > > > > > > > > > > if > > > > > > > > > > > > > > > > some client in some language has a bug. I > think > > > the > > > > > > > policy > > > > > > > > > with > > > > > > > > > > > > this > > > > > > > > > > > > > > kind > > > > > > > > > > > > > > > > of throttling is to protect the cluster above > > any > > > > > > > > individual > > > > > > > > > > app, > > > > > > > > > > > > > > right? > > > > > > > > > > > > > > > I > > > > > > > > > > > > > > > > think in general this should be okay since > for > > > most > > > > > > > > > deployments > > > > > > > > > > > > this > > > > > > > > > > > > > > > > setting is meant as more of a safety > > valve---that > > > > is > > > > > > > rather > > > > > > > > > > than > > > > > > > > > > > > set > > > > > > > > > > > > > > > > something very close to what you expect to > need > > > > (say > > > > > 2 > > > > > > > > > req/sec > > > > > > > > > > or > > > > > > > > > > > > > > > whatever) > > > > > > > > > > > > > > > > you would have something quite high (like 100 > > > > > req/sec) > > > > > > > with > > > > > > > > > > this > > > > > > > > > > > > > meant > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > prevent a client gone crazy. I think when > used > > > this > > > > > way > > > > > > > > > > allowing > > > > > > > > > > > > > those > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > be throttled would actually provide > meaningful > > > > > > > protection. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -Jay > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Feb 17, 2017 at 9:05 AM, Rajini > > Sivaram < > > > > > > > > > > > > > > rajinisiva...@gmail.com > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I have just created KIP-124 to introduce > > > request > > > > > rate > > > > > > > > > quotas > > > > > > > > > > to > > > > > > > > > > > > > > Kafka: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/ > > > > > > confluence/display/KAFKA/KIP- > > > > > > > > > > > > > > > > > 124+-+Request+rate+quotas > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The proposal is for a simple percentage > > request > > > > > > > handling > > > > > > > > > time > > > > > > > > > > > > quota > > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > > can be allocated to *<client-id>*, *<user>* > > or > > > > > > *<user, > > > > > > > > > > > > client-id>*. > > > > > > > > > > > > > > > There > > > > > > > > > > > > > > > > > are a few other suggestions also under > > > "Rejected > > > > > > > > > > alternatives". > > > > > > > > > > > > > > > Feedback > > > > > > > > > > > > > > > > > and suggestions are welcome. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you... 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Rajini > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > -- Guozhang > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > -- Guozhang > > > > > > > > > >
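P.S. For anyone trying to picture how a request handling time quota would translate into a throttle delay, below is a rough, purely illustrative sketch (names and the formula are assumptions, not the KIP's implementation) that accumulates per-client request handler time over a window and compares it against a percentage quota:

// Hypothetical sketch of request-time quota enforcement; names and formula
// are illustrative, not from the KIP or the Kafka code base.
import java.util.HashMap;
import java.util.Map;

public class RequestTimeQuotaSketch {
    private final long windowMs;            // e.g. a 1000 ms quota window
    private final int numHandlerThreads;    // num.io.threads
    private final Map<String, Long> usedNanosByClient = new HashMap<>();

    RequestTimeQuotaSketch(long windowMs, int numHandlerThreads) {
        this.windowMs = windowMs;
        this.numHandlerThreads = numHandlerThreads;
    }

    // Called by a request handler thread after finishing a request.
    void record(String clientId, long handlerTimeNanos) {
        usedNanosByClient.merge(clientId, handlerTimeNanos, Long::sum);
    }

    // quotaPercent is the share of total request handler capacity, e.g. 5.0 for 5%.
    // Returns how long to delay the response if the client exceeded its quota.
    long throttleTimeMs(String clientId, double quotaPercent) {
        long capacityNanos = windowMs * 1_000_000L * numHandlerThreads;
        long allowedNanos = (long) (capacityNanos * quotaPercent / 100.0);
        long usedNanos = usedNanosByClient.getOrDefault(clientId, 0L);
        long excessNanos = usedNanos - allowedNanos;
        return excessNanos > 0 ? excessNanos / 1_000_000L : 0L;
    }

    public static void main(String[] args) {
        RequestTimeQuotaSketch quota = new RequestTimeQuotaSketch(1000, 8);
        quota.record("clientA", 600_000_000L);   // 600 ms of handler time used
        System.out.println(quota.throttleTimeMs("clientA", 5.0) + " ms throttle");
    }
}

An actual implementation would presumably plug into the existing quota sensors and windows rather than a standalone class; the arithmetic is only meant to show how a percentage maps to capacity across the num.io.threads pool.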