Hi, Ismael,

For #3, typically an admin won't configure more io threads than CPU cores, but it's possible for an admin to start with fewer io threads than cores and grow that later on.
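For reference, the two pool sizes in question are ordinary broker settings; the values below are only the illustrative defaults, not a recommendation:

    # server.properties
    num.io.threads=8        # request handler (I/O) threads
    num.network.threads=3   # socket server (network) threads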
Hi, Dong,

I think the throttleTime sensor on the broker tells the admin whether a user/clientId is throttled or not.

Hi, Radi,

The reasoning for delaying the throttled requests on the broker, instead of returning an error immediately, is that the latter has no way to prevent the client from retrying immediately, which will make things worse. The delaying logic is based on a delay queue. A separate expiration thread just waits on the next request to expire, so it doesn't tie up a request handler thread.
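To make that concrete, the pattern is roughly the following (a minimal sketch, not the actual broker code; the class and names here are illustrative):

    import java.util.concurrent.DelayQueue;
    import java.util.concurrent.Delayed;
    import java.util.concurrent.TimeUnit;

    public class ThrottleSketch {
        // A throttled response sits in the queue until its delay elapses.
        static class ThrottledResponse implements Delayed {
            final long sendAtMs;
            final Runnable send;
            ThrottledResponse(long delayMs, Runnable send) {
                this.sendAtMs = System.currentTimeMillis() + delayMs;
                this.send = send;
            }
            public long getDelay(TimeUnit unit) {
                return unit.convert(sendAtMs - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
            }
            public int compareTo(Delayed o) {
                return Long.compare(getDelay(TimeUnit.MILLISECONDS), o.getDelay(TimeUnit.MILLISECONDS));
            }
        }

        public static void main(String[] args) {
            DelayQueue<ThrottledResponse> queue = new DelayQueue<>();
            // A single expiration thread blocks on take(), so no request handler
            // thread is tied up while a response waits out its throttle delay.
            Thread expirer = new Thread(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        queue.take().send.run(); // take() blocks until the head expires
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            }, "throttled-response-expirer");
            expirer.start();
            queue.add(new ThrottledResponse(100, () -> System.out.println("sent after delay")));
        }
    }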
Thanks,

Jun

On Thu, Feb 23, 2017 at 9:07 AM, Ismael Juma <ism...@juma.me.uk> wrote:

> Hi Jay,
>
> Regarding 1, I definitely like the simplicity of keeping a single throttle time field in the response. The downside is that the client metrics will be more coarse grained.
>
> Regarding 3, we have `leader.imbalance.per.broker.percentage` and `log.cleaner.min.cleanable.ratio`.
>
> Ismael
>
> On Thu, Feb 23, 2017 at 4:43 PM, Jay Kreps <j...@confluent.io> wrote:
>
> > A few minor comments:
> >
> > 1. Isn't it the case that the throttling time response field should have the total time your request was throttled, irrespective of the quotas that caused it? Limiting it to the byte rate quota doesn't make sense, but I also don't think we want to end up adding new fields in the response for every single thing we quota, right?
> >
> > 2. I don't think we should make this quota specifically about I/O threads. Once we introduce these quotas people set them and expect them to be enforced (and if they aren't it may cause an outage). As a result they are a bit more sensitive than normal configs, I think. The current thread pools seem like something of an implementation detail and not the level the user-facing quotas should be involved with. I think it might be better to make this a general request-time throttle with no mention of I/O threads in the naming, and simply acknowledge the current limitation (which we may someday fix) in the docs that this covers only the time after the request is read off the network.
> >
> > 3. As such I think the right interface to the user would be something like percent_request_time in {0,...,100}, or request_time_ratio in {0.0,...,1.0} (I think "ratio" is the terminology we used if the scale is between 0 and 1 in the other metrics, right?)
> >
> > -Jay
> >
> > On Thu, Feb 23, 2017 at 3:45 AM, Rajini Sivaram <rajinisiva...@gmail.com> wrote:
> >
> > > Guozhang/Dong,
> > >
> > > Thank you for the feedback.
> > >
> > > Guozhang: I have updated the section on co-existence of byte rate and request time quotas.
> > >
> > > Dong: I hadn't added much detail to the metrics and sensors since they are going to be very similar to the existing metrics and sensors. To avoid confusion, I have now added more detail. All metrics are in the group "quotaType" and all sensors have names starting with "quotaType" (where quotaType is Produce/Fetch/LeaderReplication/FollowerReplication/*IOThread*). So there will be no reuse of existing metrics/sensors. The new ones for request processing time based throttling will be completely independent of existing metrics/sensors, but will be consistent in format.
> > >
> > > The existing throttle_time_ms field in produce/fetch responses will not be impacted by this KIP. That will continue to return byte-rate based throttling times. In addition, a new field request_throttle_time_ms will be added to return request quota based throttling times. These will be exposed as new metrics on the client-side.
> > >
> > > Since all metrics and sensors are different for each type of quota, I believe there are already sufficient metrics to monitor throttling on both the client and broker side for each type of throttling.
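> > > To give a feel for the shape of these, a quota-backed sensor in the common metrics library looks roughly like this (a sketch only; the sensor/metric names below are illustrative, not the final ones from the KIP):
> > >
> > >     import java.util.concurrent.TimeUnit;
> > >     import org.apache.kafka.common.metrics.*;
> > >     import org.apache.kafka.common.metrics.stats.Rate;
> > >
> > >     public class QuotaSensorSketch {
> > >         public static void main(String[] args) {
> > >             Metrics metrics = new Metrics();
> > >             MetricConfig config = new MetricConfig()
> > >                 .timeWindow(1, TimeUnit.SECONDS)
> > >                 .samples(11)
> > >                 .quota(Quota.upperBound(0.05));  // e.g. 5% of request handler time
> > >             Sensor sensor = metrics.sensor("IOThread-user1", config);
> > >             sensor.add(metrics.metricName("io-thread-time", "IOThread",
> > >                 "request handler time used by user1"), new Rate(TimeUnit.SECONDS));
> > >             try {
> > >                 sensor.record(50_000_000);       // e.g. nanoseconds of handler time
> > >             } catch (QuotaViolationException e) {
> > >                 // derive a throttle delay from how far over quota the client is
> > >             }
> > >         }
> > >     }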
> > > Regards,
> > >
> > > Rajini
> > >
> > > On Thu, Feb 23, 2017 at 4:32 AM, Dong Lin <lindon...@gmail.com> wrote:
> > >
> > > > Hey Rajini,
> > > >
> > > > I think it makes a lot of sense to use io_thread_units as the metric to quota a user's traffic here. LGTM overall. I have some questions regarding sensors.
> > > >
> > > > - Can you be more specific in the KIP about what sensors will be added? For example, it will be useful to specify the name and attributes of these new sensors.
> > > >
> > > > - We currently have throttle-time and queue-size for the byte-rate based quota. Are you going to have separate throttle-time and queue-size for requests throttled by the io_thread_unit-based quota, or will they share the same sensor?
> > > >
> > > > - Does the throttle-time in the ProduceResponse and FetchResponse contain time due to the io_thread_unit-based quota?
> > > >
> > > > - Currently the Kafka server doesn't provide any log or metrics that tell whether any given clientId (or user) is throttled. This is not too bad because we can still check the client-side byte-rate metric to validate whether a given client is throttled. But with this io_thread_unit, there will be no way to validate whether a given client is slow because it has exceeded its io_thread_unit limit. It is necessary for users to be able to know this information to figure out whether they have reached their quota limit. How about we add a log4j log on the server side to periodically print (client_id, byte-rate-throttle-time, io-thread-unit-throttle-time) so that the Kafka administrator can identify the users that have reached their limit and act accordingly?
> > > >
> > > > Thanks,
> > > >
> > > > Dong
> > > >
> > > > On Wed, Feb 22, 2017 at 4:46 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> > > >
> > > > > Made a pass over the doc, overall LGTM except a minor comment on the throttling implementation:
> > > > >
> > > > > Stated as "Request processing time throttling will be applied on top if necessary." I thought that it meant the request processing time throttling is applied first, but continuing to read I found it actually meant to apply produce/fetch byte rate throttling first.
> > > > >
> > > > > Also the last sentence, "The remaining delay if any is applied to the response.", is a bit confusing to me. Maybe reword it a bit?
> > > > >
> > > > > Guozhang
> > > > >
> > > > > On Wed, Feb 22, 2017 at 3:24 PM, Jun Rao <j...@confluent.io> wrote:
> > > > >
> > > > > > Hi, Rajini,
> > > > > >
> > > > > > Thanks for the updated KIP. The latest proposal looks good to me.
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Wed, Feb 22, 2017 at 2:19 PM, Rajini Sivaram <rajinisiva...@gmail.com> wrote:
> > > > > >
> > > > > > > Jun/Roger,
> > > > > > >
> > > > > > > Thank you for the feedback.
> > > > > > >
> > > > > > > 1. I have updated the KIP to use absolute units instead of percentage. The property is called *io_thread_units* to align with the thread count property *num.io.threads*. When we implement network thread utilization quotas, we can add another property *network_thread_units*.
> > > > > > >
> > > > > > > 2. ControlledShutdown is already listed under the exempt requests. Jun, did you mean a different request that needs to be added? The four requests currently exempt in the KIP are StopReplica, ControlledShutdown, LeaderAndIsr and UpdateMetadata. These are controlled using the ClusterAction ACL, so it is easy to exclude them and only throttle if unauthorized. I wasn't sure if there are other requests used only for inter-broker communication that needed to be excluded.
> > > > > > >
> > > > > > > 3. I was thinking the smallest change would be to replace all references to *requestChannel.sendResponse()* with a local method *sendResponseMaybeThrottle()* that does the throttling, if any, plus sends the response. If we throttle first in *KafkaApis.handle()*, the time spent within the method handling the request will not be recorded or used in throttling. We can look into this again when the PR is ready for review.
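> > > > > > > Roughly, the shape would be as below (a sketch only; the broker code is Scala, and every name here except requestChannel.sendResponse() is illustrative):
> > > > > > >
> > > > > > >     // Java-flavored sketch of the idea; all names other than
> > > > > > >     // requestChannel.sendResponse() are made up for illustration.
> > > > > > >     void sendResponseMaybeThrottle(Request request, Response response) {
> > > > > > >         long handlerNanos = request.apiLocalTimeNanos();  // time spent handling
> > > > > > >         long throttleMs = quotaManager.recordAndMaybeGetThrottleTimeMs(
> > > > > > >             request.session(), request.clientId(), handlerNanos);
> > > > > > >         if (throttleMs > 0)
> > > > > > >             quotaManager.delayResponse(response, throttleMs); // park in the delay queue
> > > > > > >         else
> > > > > > >             requestChannel.sendResponse(response);            // existing path
> > > > > > >     }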
> > > > > > > Regards,
> > > > > > >
> > > > > > > Rajini
> > > > > > >
> > > > > > > On Wed, Feb 22, 2017 at 5:55 PM, Roger Hoover <roger.hoo...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Great to see this KIP and the excellent discussion.
> > > > > > > >
> > > > > > > > To me, Jun's suggestion makes sense. If my application is allocated 1 request handler unit, then it's as if I have a Kafka broker with a single request handler thread dedicated to me. That's the most I can use, at least. That allocation doesn't change even if an admin later increases the size of the request thread pool on the broker. It's similar to the CPU abstraction that VMs and containers get from hypervisors or OS schedulers. While different client access patterns can use wildly different amounts of request thread resources per request, a given application will generally have a stable access pattern and can figure out empirically how many "request thread units" it needs to meet its throughput/latency goals.
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > >
> > > > > > > > Roger
> > > > > > > >
> > > > > > > > On Wed, Feb 22, 2017 at 8:53 AM, Jun Rao <j...@confluent.io> wrote:
> > > > > > > >
> > > > > > > > > Hi, Rajini,
> > > > > > > > >
> > > > > > > > > Thanks for the updated KIP. A few more comments.
> > > > > > > > >
> > > > > > > > > 1. A concern with request_time_percent is that it's not an absolute value. Let's say you give a user a 10% limit. If the admin doubles the number of request handler threads, that user now actually has twice the absolute capacity. This may confuse people a bit. So, perhaps setting the quota based on an absolute request thread unit is better.
> > > > > > > > >
> > > > > > > > > 2. ControlledShutdownRequest is also an inter-broker request and needs to be excluded from throttling.
> > > > > > > > >
> > > > > > > > > 3. Implementation wise, I am wondering if it's simpler to apply the request time throttling first in KafkaApis.handle(). Otherwise, we will need to add the throttling logic in each type of request.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Jun
> > > > > > > > >
> > > > > > > > > On Wed, Feb 22, 2017 at 5:58 AM, Rajini Sivaram <rajinisiva...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Jun,
> > > > > > > > > >
> > > > > > > > > > Thank you for the review.
> > > > > > > > > >
> > > > > > > > > > I have reverted to the original KIP that throttles based on request handler utilization. At the moment, it uses a percentage, but I am happy to change to a fraction (out of 1 instead of 100) if required. I have added the examples from this discussion to the KIP. Also added a "Future Work" section to address network thread utilization. The configuration is named "request_time_percent" with the expectation that it can also be used as the limit for network thread utilization when that is implemented, so that users have to set only one config for the two and not have to worry about the internal distribution of the work between the two thread pools in Kafka.
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > >
> > > > > > > > > > Rajini
> > > > > > > > > >
> > > > > > > > > > On Wed, Feb 22, 2017 at 12:23 AM, Jun Rao <j...@confluent.io> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi, Rajini,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the proposal.
> > > > > > > > > > >
> > > > > > > > > > > The benefit of using the request processing time over the request rate is exactly what people have said. I will just expand on that a bit. Consider the following case. The producer sends a produce request with a 10MB message but compressed to 100KB with gzip.
> > > > > > > > > > > The decompression of the message on the broker could take 10-15 seconds, during which time a request handler thread is completely blocked. In this case, neither the byte-in quota nor the request rate quota may be effective in protecting the broker. Consider another case. A consumer group starts with 10 instances and later on switches to 20 instances. The request rate will likely double, but the actual load on the broker may not double since each fetch request only contains half of the partitions. A request rate quota may not be easy to configure in this case.
> > > > > > > > > > >
> > > > > > > > > > > What we really want is to be able to prevent a client from using too much of the server-side resources. In this particular KIP, this resource is the capacity of the request handler threads. I agree that it may not be intuitive for the users to determine how to set the right limit. However, this is not completely new and has been done in the container world already. For example, Linux cgroups (https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html) have the concept of cpu.cfs_quota_us, which specifies the total amount of time in microseconds for which all tasks in a cgroup can run during a one-second period. We can potentially model the request handler threads in a similar way. For example, each request handler thread can be 1 request handler unit, and the admin can configure a limit on how many units (say 0.01) a client can have.
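> > > > > > > > > > > To make the analogy concrete, the accounting could look roughly like this (a sketch with illustrative names; a real implementation would reuse the existing quota window machinery):
> > > > > > > > > > >
> > > > > > > > > > >     // A client with 0.01 units may use 10ms of handler time per 1s window.
> > > > > > > > > > >     class HandlerUnitQuota {
> > > > > > > > > > >         static final long WINDOW_NANOS = 1_000_000_000L;
> > > > > > > > > > >         final double units;                  // e.g. 0.01 handler units
> > > > > > > > > > >         long windowStart = System.nanoTime();
> > > > > > > > > > >         long usedNanos = 0;
> > > > > > > > > > >
> > > > > > > > > > >         HandlerUnitQuota(double units) { this.units = units; }
> > > > > > > > > > >
> > > > > > > > > > >         // Returns how long to delay the response; 0 when within quota.
> > > > > > > > > > >         synchronized long recordAndGetDelayNanos(long handlerTimeNanos) {
> > > > > > > > > > >             long now = System.nanoTime();
> > > > > > > > > > >             if (now - windowStart >= WINDOW_NANOS) { // roll to a new window
> > > > > > > > > > >                 windowStart = now;
> > > > > > > > > > >                 usedNanos = 0;
> > > > > > > > > > >             }
> > > > > > > > > > >             usedNanos += handlerTimeNanos;
> > > > > > > > > > >             long allowed = (long) (units * WINDOW_NANOS);
> > > > > > > > > > >             return Math.max(0, usedNanos - allowed);
> > > > > > > > > > >         }
> > > > > > > > > > >     }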
> > > > > > > > > > > Regarding not throttling the internal broker-to-broker requests: we could do that. Alternatively, we could just let the admin configure a high limit for the kafka user (it may not be easy to do that based on clientId though).
> > > > > > > > > > >
> > > > > > > > > > > Ideally we want to be able to protect the utilization of the network thread pool too. The difficulty is mostly what Rajini said: (1) The mechanism for throttling the requests is through Purgatory, and we will have to think through how to integrate that into the network layer. (2) In the network layer, we currently know the user, but not the clientId, of the request. So, it's a bit tricky to throttle based on clientId there. Plus, the byteOut quota can already protect the network thread utilization for fetch requests. So, if we can't figure out this part right now, just focusing on the request handling threads for this KIP is still a useful feature.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > >
> > > > > > > > > > > Jun
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Feb 21, 2017 at 4:27 AM, Rajini Sivaram <rajinisiva...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Thank you all for the feedback.
> > > > > > > > > > > >
> > > > > > > > > > > > Jay: I have removed the exemption for consumer heartbeat etc. I agree that protecting the cluster is more important than protecting individual apps. I have retained the exemption for StopReplica/LeaderAndIsr etc; these are throttled only if authorization fails (so they can't be used for DoS attacks in a secure cluster, but inter-broker requests complete without delays).
> > > > > > > > > > > >
> > > > > > > > > > > > I will wait another day to see if there is any objection to quotas based on request processing time (as opposed to request rate) and, if there are no objections, I will revert to the original proposal with some changes.
> > > > > > > > > > > >
> > > > > > > > > > > > The original proposal included only the time used by the request handler threads (that made the calculation easy). I think the suggestion is to include the time spent in the network threads as well, since that may be significant. As Jay pointed out, it is more complicated to calculate the total available CPU time and convert it to a ratio when there are *m* I/O threads and *n* network threads. ThreadMXBean#getThreadCpuTime() may give us what we want, but it can be very expensive on some platforms.
> > > > > > > > > > > > As Becket and Guozhang have pointed out, we do have several time measurements already for generating metrics that we could use, though we might want to switch to nanoTime() instead of currentTimeMillis() since some of the values for small requests may be < 1ms. But rather than adding up the time spent in the I/O thread and the network thread, wouldn't it be better to convert the time spent on each thread into a separate ratio? UserA has a request quota of 5%. Can we take that to mean that UserA can use 5% of the time on network threads and 5% of the time on I/O threads? If either is exceeded, the response is throttled - it would mean maintaining two sets of metrics for the two durations, but would result in more meaningful ratios. We could define two quota limits (UserA has 5% of request threads and 10% of network threads), but that seems unnecessary and harder to explain to users.
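> > > > > > > > > > > > In rough code, the per-pool ratio might look like this (a sketch; the window and thread counts are illustrative):
> > > > > > > > > > > >
> > > > > > > > > > > >     // ratio = time a user spent on a pool's threads / the pool's capacity
> > > > > > > > > > > >     static double poolRatio(long userTimeNanos, int poolSize, long windowNanos) {
> > > > > > > > > > > >         return (double) userTimeNanos / ((double) poolSize * windowNanos);
> > > > > > > > > > > >     }
> > > > > > > > > > > >
> > > > > > > > > > > >     // UserA (5% quota) is throttled if either pool's ratio exceeds 0.05:
> > > > > > > > > > > >     //   poolRatio(ioNanos, numIoThreads, window) > 0.05
> > > > > > > > > > > >     //     || poolRatio(networkNanos, numNetworkThreads, window) > 0.05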
> > > > > > > > > > > > Back to why and how quotas are applied to network thread utilization:
> > > > > > > > > > > >
> > > > > > > > > > > > a) In the case of fetch, the time spent in the network thread may be significant and I can see the need to include this. Are there other requests where the network thread utilization is significant? In the case of fetch, request handler thread utilization would throttle clients with a high request rate and low data volume, and the fetch byte rate quota would throttle clients with high data volume. Network thread utilization is perhaps proportional to the data volume. I am wondering if we even need to throttle based on network thread utilization or whether the data volume quota covers this case.
> > > > > > > > > > > >
> > > > > > > > > > > > b) At the moment, we record and check for quota violation at the same time. If a quota is violated, the response is delayed. Using Jay's example of disk reads for fetches happening in the network thread, we can't record and delay a response after the disk reads. We could record the time spent on the network thread when the response is complete and introduce a delay for handling a subsequent request (separating out recording and quota violation handling in the case of network thread overload). Does that make sense?
> > > > > > > > > > > >
> > > > > > > > > > > > Regards,
> > > > > > > > > > > >
> > > > > > > > > > > > Rajini
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Feb 21, 2017 at 2:58 AM, Becket Qin <becket....@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hey Jay,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yeah, I agree that enforcing the CPU time is a little tricky. I am thinking that maybe we can use the existing request statistics. They are already very detailed, so we can probably see the approximate CPU time from them, e.g. something like (total_time - request/response_queue_time - remote_time).
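> > > > > > > > > > > > > In code form, that approximation would be roughly (field names are illustrative; these per-request times already exist in the broker's request metrics):
> > > > > > > > > > > > >
> > > > > > > > > > > > >     static long approxHandlerTimeMs(long totalTimeMs, long requestQueueTimeMs,
> > > > > > > > > > > > >                                     long responseQueueTimeMs, long remoteTimeMs) {
> > > > > > > > > > > > >         // what's left is roughly the time spent on broker threads
> > > > > > > > > > > > >         return totalTimeMs - requestQueueTimeMs - responseQueueTimeMs - remoteTimeMs;
> > > > > > > > > > > > >     }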
> > > > > > > > > > > > > I agree with Guozhang that when a user is throttled it is likely that we need to see if anything has gone wrong first, and if the users are well behaved and just need more resources, we will have to bump up the quota for them. It is true that pre-allocating CPU time quota precisely for the users is difficult. So in practice it would probably be more like first setting a relatively high protective CPU time quota for everyone and increasing that for some individual clients on demand.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Feb 20, 2017 at 5:48 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > This is a great proposal, glad to see it happening.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I am inclined to the CPU throttling, or more specifically the processing time ratio, instead of the request rate throttling as well. Becket has summed up my rationale very well above, and one thing to add here is that the former has good support both for "protecting against rogue clients" and for "utilizing a cluster for multi-tenancy usage": when thinking about how to explain this to end users, I find it actually more natural than the request rate since, as mentioned above, different requests have quite different "costs", and Kafka today already has various request types (produce, fetch, admin, metadata, etc); because of that, request rate throttling may not be as effective unless it is set very conservatively.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Regarding user reactions when they are throttled, I think it may differ case-by-case and needs to be discovered / guided by looking at the relevant metrics. In other words, users would not expect to get additional information by simply being told "hey, you are throttled", which is all that throttling does; they need to take a follow-up step and see "hmm, I'm throttled probably because of ..", which is done by looking at other metric values: e.g. whether I'm bombarding the brokers with metadata requests, which are usually cheap to handle but I'm sending thousands per second; or is it because I'm catching up and hence sending very heavy fetch requests with large min.bytes, etc.
> > > > > > > > > > > > > > Regarding the implementation, as once discussed with Jun, this seems not very difficult, since today we are already collecting the "thread pool utilization" metric, which is a single percentage "aggregateIdleMeter" value; we are already effectively aggregating it for each request in KafkaRequestHandler, and we can just extend that by recording the source client id when handling requests and aggregating by clientId as well as in the total aggregate.
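> > > > > > > > > > > > > > Roughly, in the handler loop (a sketch; KafkaRequestHandler and aggregateIdleMeter are the real names, the rest is illustrative):
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >     long start = System.nanoTime();
> > > > > > > > > > > > > >     handle(request);                                   // existing handling path
> > > > > > > > > > > > > >     long elapsed = System.nanoTime() - start;
> > > > > > > > > > > > > >     totalTimeSensor.record(elapsed);                   // existing total aggregate
> > > > > > > > > > > > > >     clientSensor(request.clientId()).record(elapsed);  // new: per-clientId aggregate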
> > > > > > > > > > > > > > Guozhang
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Feb 20, 2017 at 4:27 PM, Jay Kreps <j...@confluent.io> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hey Becket/Rajini,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > When I thought about it more deeply I came around to the "percent of processing time" metric too. It seems a lot closer to the thing we actually care about and need to protect. I also think this would be a very useful metric even in the absence of throttling, just to debug who is using capacity.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Two problems to consider:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 1. I agree that for the user it is understandable what led to their being throttled, but it is a bit hard to figure out the safe range for them. I.e. if I have a new app that will send 200 messages/sec I can probably reason that I'll be under the throttling limit of 300 req/sec. However, if I need to be under a 10% CPU resources limit it may be a bit harder for me to know a priori if I will or won't be.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 2. Calculating the available CPU time is a bit difficult since there are actually two thread pools--the I/O threads and the network threads. I think it might be workable to count just the I/O thread time as in the proposal, but the network thread work is actually non-trivial (e.g. all the disk reads for fetches happen in that thread). If you count both the network and I/O threads it can skew things a bit. E.g. say you have 50 network threads, 10 I/O threads, and 8 cores; what is the available CPU time in a second? I suppose this is a problem whenever you have a bottleneck between I/O and network threads or if you end up significantly over-provisioning one pool (both of which are hard to avoid).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > An alternative for CPU throttling would be to use this api: http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/ThreadMXBean.html#getThreadCpuTime(long)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > That would let you track actual CPU usage across the network, I/O, and purgatory threads and look at it as a percentage of total cores. I think this fixes many problems in the reliability of the metric. Its meaning is slightly different, as it is just CPU (you don't get charged for time blocking on I/O), but that may be okay because we already have a throttle on I/O. The downside is I think it is possible this api can be disabled or isn't always available, and it may also be expensive (also I've never used it so I'm not sure if it really works the way I think).
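> > > > > > > > > > > > > > > A minimal sketch of that api in use (standard java.lang.management; note the enable check, since it can be off by default on some JVMs):
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >     import java.lang.management.ManagementFactory;
> > > > > > > > > > > > > > >     import java.lang.management.ThreadMXBean;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >     ThreadMXBean bean = ManagementFactory.getThreadMXBean();
> > > > > > > > > > > > > > >     if (bean.isThreadCpuTimeSupported()) {
> > > > > > > > > > > > > > >         if (!bean.isThreadCpuTimeEnabled())
> > > > > > > > > > > > > > >             bean.setThreadCpuTimeEnabled(true); // enabling may itself have a cost
> > > > > > > > > > > > > > >         long cpuNanos = bean.getThreadCpuTime(Thread.currentThread().getId());
> > > > > > > > > > > > > > >         // returns -1 if measurement is disabled for this thread
> > > > > > > > > > > > > > >     }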
> > > > > > > > > > > > > > > -Jay
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Feb 20, 2017 at 3:17 PM, Becket Qin <becket....@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > If the purpose of the KIP is only to protect the cluster from being overwhelmed by crazy clients and is not intended to address the resource allocation problem among the clients, I am wondering if using a request handling time quota (CPU time quota) is a better option. Here are the reasons:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 1. A request handling time quota has better protection. Say we have a request rate quota and set that to some value like 100 requests/sec; it is possible that some of the requests are very expensive and actually take a lot of time to handle. In that case a few clients may still occupy a lot of CPU time even though the request rate is low. Arguably we can carefully set the request rate quota for each request and client id combination, but it could still be tricky to get it right for everyone.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > If we use the request handling time quota, we can simply say no client can take up more than 30% of the total request handling capacity (measured by time), regardless of the differences among requests or what the client is doing. In this case maybe we can quota all the requests if we want to (a config sketch follows after point 2 below).
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 2. The main benefit of using a request rate limit is that it seems more intuitive.
> > > > > > > > > > > > > > > > It is true that it is probably easier to explain to the user what that means. However, in practice it looks like the impact of a request rate quota is no more quantifiable than that of a request handling time quota. Unlike the byte rate quota, it is still difficult to give a number for the impact on throughput or latency when a request rate quota is hit. So it is not better than the request handling time quota. In fact, I feel it is clearer to tell a user "you are limited because you have taken 30% of the CPU time on the broker" than something like "your request rate quota on metadata requests has been reached".
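> > > > > > > > > > > > > > > > For the sketch referenced above: this is how the existing, documented byte-rate quotas are set with kafka-configs.sh, and a time-based quota would presumably follow the same pattern (the request-time key below is only a placeholder, since the name is still under discussion):
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >     # existing byte-rate quotas
> > > > > > > > > > > > > > > >     bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
> > > > > > > > > > > > > > > >       --add-config 'producer_byte_rate=1048576,consumer_byte_rate=2097152' \
> > > > > > > > > > > > > > > >       --entity-type clients --entity-name clientA
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >     # hypothetical request handling time quota in the same style
> > > > > > > > > > > > > > > >     bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
> > > > > > > > > > > > > > > >       --add-config 'request_time_percent=30' \
> > > > > > > > > > > > > > > >       --entity-type clients --entity-name clientA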
> > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Mon, Feb 20, 2017 at 2:23 PM, Jay Kreps <j...@confluent.io> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I think this proposal makes a lot of sense (especially now that it is oriented around request rate) and fills the biggest remaining gap in the multi-tenancy story.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I think for intra-cluster communication (StopReplica, etc.) we could avoid throttling entirely. You can secure or otherwise lock down the cluster communication to prevent any unauthorized external party from trying to initiate these requests. As a result we are as likely to cause problems as to solve them by throttling these, right?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I'm not so sure that we should exempt the consumer requests such as heartbeat. It's true that if we throttle an app's heartbeat requests it may cause it to fall out of its consumer group. However, if we don't throttle it, it may DDOS the cluster if the heartbeat interval is set incorrectly or if some client in some language has a bug. I think the policy with this kind of throttling is to protect the cluster above any individual app, right? I think in general this should be okay since for most deployments this setting is meant as more of a safety valve---that is, rather than setting something very close to what you expect to need (say 2 req/sec or whatever) you would have something quite high (like 100 req/sec) with this meant to prevent a client gone crazy. I think when used this way allowing those to be throttled would actually provide meaningful protection.
> > > > > > > > > > > > > > > > > -Jay
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Fri, Feb 17, 2017 at 9:05 AM, Rajini Sivaram <rajinisiva...@gmail.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I have just created KIP-124 to introduce request rate quotas to Kafka:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-124+-+Request+rate+quotas
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > The proposal is for a simple percentage request handling time quota that can be allocated to *<client-id>*, *<user>* or *<user, client-id>*. There are a few other suggestions also under "Rejected alternatives". Feedback and suggestions are welcome.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thank you...
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Rajini
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > -- Guozhang
> > > > >
> > > > > --
> > > > > -- Guozhang