Nice write-up, Ben. I agree with Joel on keeping this simple by excluding the partitions from the fetch request/response when the quota is violated at the follower or leader, instead of having a separate set of threads for handling the quota and non-quota cases. Even though it's different from the current quota implementation, it should be OK since it's internal to the brokers, and admins can handle it by tuning the quota configs appropriately.
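Just to check my understanding of that approach, here is a rough sketch of what I imagine the leader-side check could look like when building the fetch response. All of the names below (ReplicationQuota, isThrottled, etc.) are hypothetical and only for illustration; they are not taken from the KIP or the current broker code:

// Hypothetical sketch only - the types and names here are made up for
// illustration, not taken from the KIP or the existing quota code.
case class TopicPartition(topic: String, partition: Int)
case class PartitionData(bytes: Array[Byte])

trait ReplicationQuota {
  def isThrottled(tp: TopicPartition): Boolean  // partition is part of a reassignment/move
  def isQuotaExceeded: Boolean                  // effect replication rate is over the limit
}

object FetchResponseFilter {
  // When the replication quota is violated, simply leave the throttled
  // partitions out of this response; they get picked up again on a later
  // fetch once the rate falls back under the limit.
  def filter(requested: Map[TopicPartition, PartitionData],
             quota: ReplicationQuota): Map[TopicPartition, PartitionData] =
    if (quota.isQuotaExceeded)
      requested.filter { case (tp, _) => !quota.isThrottled(tp) }
    else
      requested
}

If that's roughly the shape of it, then presumably the same check can be applied on the follower side when it builds its fetch request, and fairness is handled the same way the existing quota delays handle it - whichever throttled partitions happen to be in flight when the violation is detected are the ones that get excluded.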
Also, can you elaborate with an example on how this would be handled: *guaranteeing ordering of updates when replicas shift threads*?

Thanks,
Mayuresh

On Tue, Aug 9, 2016 at 10:49 AM, Joel Koshy <jjkosh...@gmail.com> wrote:

> On the need for both leader/follower throttling: that makes sense - thanks for clarifying. For completeness, can we add this detail to the doc - say, after the quote that I pasted earlier?
>
> From an implementation perspective though: I'm still interested in the simplicity of not having to add separate replica fetchers, a delay queue on the leader, and "move" partitions from the throttled replica fetchers to the regular replica fetchers once caught up.
>
> Instead, I think it would work and be simpler to include or exclude the partitions in the fetch request from the follower and the fetch response from the leader when the quota is violated. The issue of fairness that Ben noted in his email may be a wash between the two options. With the default quota delay mechanism, partitions get delayed essentially at random - i.e., whoever fetches at the time of quota violation gets delayed at the leader. So we can adopt a similar policy in choosing to truncate partitions in fetch responses, i.e., if at the time of handling the fetch the "effect" replication rate exceeds the quota, then either empty or truncate those partitions from the response. (BTW, "effect replication" is your terminology in the wiki - i.e., replication due to partition reassignment, adding brokers, etc.)
>
> While this may be slightly different from the existing quota mechanism, I think the difference is small (since we would reuse the quota manager, at worst with some refactoring) and will be internal to the broker.
>
> So I guess the question is whether this alternative is simple enough and equally functional to not go with dedicated throttled replica fetchers.
>
> On Tue, Aug 9, 2016 at 9:44 AM, Jun Rao <j...@confluent.io> wrote:
>
> > Just to elaborate on what Ben said about why we need throttling on both the leader and the follower side.
> >
> > If we only have throttling on the follower side, consider a case where we add 5 new brokers and want to move some replicas from existing brokers over to those 5 brokers. Each of those brokers is going to fetch data from all existing brokers. Then, it's possible that the aggregated fetch load from those 5 brokers on a particular existing broker exceeds its outgoing network bandwidth, even though the inbound traffic on each of those 5 brokers is bounded.
> >
> > If we only have throttling on the leader side, consider the same example above. It's possible for the incoming traffic to each of those 5 brokers to exceed its network bandwidth, since it is fetching data from all existing brokers.
> >
> > So, being able to set a quota on both the follower and the leader side protects against both cases.
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Aug 9, 2016 at 4:43 AM, Ben Stopford <b...@confluent.io> wrote:
> >
> > > Hi Joel
> > >
> > > Thanks for taking the time to look at this. Appreciated.
> > >
> > > Regarding throttling on both leader and follower, this proposal covers a more general solution which can guarantee a quota, even when a rebalance operation produces an asymmetric profile of load. This means administrators don't need to calculate the impact that a follower-only quota will have on the leaders they are fetching from.
> > > This helps, for example, where replica sizes are skewed or where a partial rebalance is required.
> > >
> > > Having said that, even with both leader and follower quotas, the use of additional threads is actually optional. There appear to be two general approaches: (1) omit partitions from fetch requests (follower) / fetch responses (leader) when they exceed their quota; (2) delay them, as the existing quota mechanism does, using separate fetchers. Both appear valid, but with slightly different design tradeoffs.
> > >
> > > The issue with approach (1) is that it departs somewhat from the existing quotas implementation, and must include a notion of fairness within the now size-bounded request and response. The issue with (2) is guaranteeing ordering of updates when replicas shift threads, but this is handled, for the most part, in the code today.
> > >
> > > I've updated the rejected alternatives section to make this a little clearer.
> > >
> > > B
> > >
> > > > On 8 Aug 2016, at 20:38, Joel Koshy <jjkosh...@gmail.com> wrote:
> > > >
> > > > Hi Ben,
> > > >
> > > > Thanks for the detailed write-up. So the proposal involves self-throttling on the fetcher side and throttling at the leader. Can you elaborate on the reasoning given on the wiki: *"The throttle is applied to both leaders and followers. This allows the admin to exert strong guarantees on the throttle limit."* Is there any reason why one or the other wouldn't be sufficient?
> > > >
> > > > Specifically, if we were to only do self-throttling on the fetchers, we could potentially avoid the additional replica fetchers, right? I.e., the replica fetchers would maintain their quota metrics as you proposed, and each (normal) replica fetch presents an opportunity to make progress for the throttled partitions as long as their effective consumption rate is below the quota limit. If it exceeds the quota limit, then don't include the throttled partitions in subsequent fetch requests until the effective consumption rate for those partitions returns to within the quota threshold.
> > > >
> > > > I have more questions on the proposal, but was more interested in the above to see if it could simplify things a bit.
> > > >
> > > > Also, can you open up access to the google-doc that you link to?
> > > >
> > > > Thanks,
> > > >
> > > > Joel
> > > >
> > > > On Mon, Aug 8, 2016 at 5:54 AM, Ben Stopford <b...@confluent.io> wrote:
> > > >
> > > >> We've created KIP-73: Replication Quotas
> > > >>
> > > >> The idea is to allow an admin to throttle moving replicas. Full details are here:
> > > >>
> > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas
> > > >>
> > > >> Please take a look and let us know your thoughts.
> > > >>
> > > >> Thanks
> > > >>
> > > >> B

--
-Regards,
Mayuresh R. Gharat
(862) 250-7125