Hi Joel,

Thanks for taking the time to look at this. Appreciated.

Regarding throttling on both leader and follower: this proposal covers a more general solution, one that can guarantee a quota even when a rebalance operation produces an asymmetric load profile. This means administrators don’t need to calculate the impact that a follower-only quota will have on the leaders the followers are fetching from. That matters, for example, where replica sizes are skewed or where only a partial rebalance is required.

Having said that, even with both leader and follower quotas, the use of additional threads is actually optional. There appear to be two general approaches:

(1) omit partitions from fetch requests (follower) / fetch responses (leader) when they exceed their quota;
(2) delay them, as the existing quota mechanism does, using separate fetchers.

Both appear valid, but with slightly different design tradeoffs. The issue with approach (1) is that it departs somewhat from the existing quotas implementation, and it must include a notion of fairness within the now size-bounded request and response. The issue with (2) is guaranteeing the ordering of updates when replicas shift threads, but this is handled, for the most part, in the code today.

I’ve updated the rejected alternatives section to make this a little clearer.

B

> On 8 Aug 2016, at 20:38, Joel Koshy <jjkosh...@gmail.com> wrote:
>
> Hi Ben,
>
> Thanks for the detailed write-up. So the proposal involves self-throttling
> on the fetcher side and throttling at the leader. Can you elaborate on the
> reasoning that is given on the wiki: *“The throttle is applied to both
> leaders and followers. This allows the admin to exert strong guarantees on
> the throttle limit”.* Is there any reason why one or the other wouldn't be
> sufficient?
>
> Specifically, if we were to only do self-throttling on the fetchers, we
> could potentially avoid the additional replica fetchers, right? i.e., the
> replica fetchers would maintain their quota metrics as you proposed and each
> (normal) replica fetch presents an opportunity to make progress for the
> throttled partitions as long as their effective consumption rate is below
> the quota limit. If it exceeds the quota limit then don’t include the
> throttled partitions in the subsequent fetch requests until the effective
> consumption rate for those partitions returns to within the quota threshold.
>
> I have more questions on the proposal, but was more interested in the above
> to see if it could simplify things a bit.
>
> Also, can you open up access to the google-doc that you link to?
>
> Thanks,
>
> Joel
>
> On Mon, Aug 8, 2016 at 5:54 AM, Ben Stopford <b...@confluent.io> wrote:
>
>> We’ve created KIP-73: Replication Quotas
>>
>> The idea is to allow an admin to throttle moving replicas. Full details
>> are here:
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas
>>
>> Please take a look and let us know your thoughts.
>>
>> Thanks
>>
>> B
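
For illustration only, here is a minimal sketch of the follower-side variant discussed in this thread: leave throttled partitions out of the next fetch request while their measured rate is above the quota. It is not Kafka code. The names `WindowedRate` and `partitions_to_fetch`, the 10-second window, and the partition labels are all assumptions made for the example, and it ignores the fairness question raised for approach (1).

```python
from collections import deque


class WindowedRate:
    """Bytes-per-second rate over a sliding window (hypothetical helper)."""

    def __init__(self, window_s=10.0):
        self.window_s = window_s
        self.samples = deque()  # (timestamp_s, nbytes) pairs, oldest first

    def record(self, now_s, nbytes):
        # Record bytes replicated for the throttled partitions at time now_s.
        self.samples.append((now_s, nbytes))

    def rate(self, now_s):
        # Drop samples that have aged out of the window, then average the rest.
        while self.samples and self.samples[0][0] <= now_s - self.window_s:
            self.samples.popleft()
        return sum(b for _, b in self.samples) / self.window_s


def partitions_to_fetch(all_partitions, throttled, rate, quota_bytes_per_s, now_s):
    """Build the partition list for the next fetch request.

    Throttled partitions are included only while the measured rate is
    within the quota; unthrottled partitions are always included.
    """
    if rate.rate(now_s) <= quota_bytes_per_s:
        return list(all_partitions)
    return [p for p in all_partitions if p not in throttled]


# Example: "t-2" is a moving (throttled) replica with a 100 B/s quota.
rate = WindowedRate(window_s=10.0)
partitions = ["t-0", "t-1", "t-2"]
throttled = {"t-2"}
rate.record(0.0, 2000)  # 2000 bytes in a 10 s window is 200 B/s, over quota
over_quota = partitions_to_fetch(partitions, throttled, rate, 100.0, now_s=1.0)
recovered = partitions_to_fetch(partitions, throttled, rate, 100.0, now_s=11.0)
```

In this sketch the throttled partition is skipped while the window still holds the 2000-byte sample and is fetched again once that sample has aged out. Approach (2) would keep the partition in a request but delay it on a separate fetcher instead.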