Hi Joel

Thanks for taking the time to look at this. Appreciated. 

Regarding throttling on both leader and follower: this proposal covers a more
general solution, one that can guarantee a quota even when a rebalance
operation produces an asymmetric load profile. This means administrators don’t
need to calculate the impact that a follower-only quota will have on the
leaders they are fetching from, for example where replica sizes are skewed or
where only a partial rebalance is required.

Having said that, even with both leader and follower quotas, the additional
threads are actually optional. There appear to be two general approaches:
(1) omit partitions from fetch requests (follower) or fetch responses (leader)
when they exceed their quota, or (2) delay them using separate fetchers, as
the existing quota mechanism does. Both appear valid, but with slightly
different design tradeoffs.
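To make approach (1) concrete, here is a minimal Python sketch of the follower
side, assuming a single fixed one-second measurement window. `ThrottleTracker`
and `partitions_to_fetch` are hypothetical names for illustration, not existing
Kafka classes. While the throttled byte rate is over quota, throttled
partitions are simply left out of the next fetch request; normal partitions
are unaffected.

```python
from dataclasses import dataclass


@dataclass
class ThrottleTracker:
    """Hypothetical tracker of bytes fetched for throttled partitions.

    Simplification (assumption): one fixed window that resets once
    window_sec has elapsed, rather than a sampled/sliding window.
    """
    quota_bytes_per_sec: float
    window_sec: float = 1.0
    window_start: float = 0.0
    bytes_in_window: int = 0

    def _roll(self, now: float) -> None:
        # Start a fresh window once the current one has expired.
        if now - self.window_start >= self.window_sec:
            self.window_start = now
            self.bytes_in_window = 0

    def record(self, num_bytes: int, now: float) -> None:
        # Call after each fetch with the bytes returned for throttled partitions.
        self._roll(now)
        self.bytes_in_window += num_bytes

    def over_quota(self, now: float) -> bool:
        self._roll(now)
        return self.bytes_in_window / self.window_sec > self.quota_bytes_per_sec


def partitions_to_fetch(partitions, throttled, tracker, now):
    """Approach (1): leave throttled partitions out of the next fetch while over quota."""
    if tracker.over_quota(now):
        return [p for p in partitions if p not in throttled]
    return list(partitions)


if __name__ == "__main__":
    tracker = ThrottleTracker(quota_bytes_per_sec=1000)
    parts = ["topic-0", "topic-1", "topic-2"]
    throttled = {"topic-1"}
    print(partitions_to_fetch(parts, throttled, tracker, now=0.0))  # all three
    tracker.record(1500, now=0.1)  # 1500 B in a 1 s window > 1000 B/s
    print(partitions_to_fetch(parts, throttled, tracker, now=0.2))  # ['topic-0', 'topic-2']
    print(partitions_to_fetch(parts, throttled, tracker, now=1.2))  # window reset: all three
```

Leaving partitions out, rather than delaying the whole request, is what lets
the existing fetcher threads be reused; the cost is that fairness across
partitions within a size-bounded request or response has to be handled
explicitly.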

The issue with approach (1) is that it departs somewhat from the existing
quotas implementation, and it must include a notion of fairness within the
request and response, which are now size-bounded. The issue with (2) is
guaranteeing the ordering of updates when replicas move between fetcher
threads, but for the most part this is already handled in the code today.
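For approach (2), the delay can be derived from the observed rate. Assuming a
single fixed window of W seconds in which B bytes were moved (so the observed
rate is O = B/W) and a quota T, holding the throttled partitions back for d
seconds makes the average rate B/(W + d); setting that equal to T gives
d = W(O - T)/T. The Python sketch below (hypothetical function name) only
computes that value; it is illustrative and not taken from the Kafka code.

```python
def throttle_delay_sec(observed_bytes: float, quota_bytes_per_sec: float,
                       window_sec: float) -> float:
    """Hypothetical helper: seconds to hold throttled partitions back (approach 2).

    Moving observed_bytes in window_sec gives an observed rate O. Waiting d
    extra seconds spreads the same bytes over window_sec + d, so solving
    observed_bytes / (window_sec + d) == quota gives d = W * (O - T) / T.
    """
    observed_rate = observed_bytes / window_sec
    if observed_rate <= quota_bytes_per_sec:
        return 0.0  # within quota: no delay needed
    return window_sec * (observed_rate - quota_bytes_per_sec) / quota_bytes_per_sec


if __name__ == "__main__":
    print(throttle_delay_sec(1500, 1000, 1.0))  # 0.5
    print(throttle_delay_sec(800, 1000, 1.0))   # 0.0
```

Delaying keeps the request and response contents unchanged, which is why it
fits the existing quota mechanism, but it needs separate fetchers so that
unthrottled partitions are not held back too.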

I’ve updated the rejected alternatives section to make this a little clearer. 

B



> On 8 Aug 2016, at 20:38, Joel Koshy <jjkosh...@gmail.com> wrote:
> 
> Hi Ben,
> 
> Thanks for the detailed write-up. So the proposal involves self-throttling
> on the fetcher side and throttling at the leader. Can you elaborate on the
> reasoning that is given on the wiki: *“The throttle is applied to both
> leaders and followers. This allows the admin to exert strong guarantees on
> the throttle limit.”* Is there any reason why one or the other wouldn't be
> sufficient?
> 
> Specifically, if we were to only do self-throttling on the fetchers, we
> could potentially avoid the additional replica fetchers, right? i.e., the
> replica fetchers would maintain their quota metrics as you proposed, and each
> (normal) replica fetch presents an opportunity to make progress for the
> throttled partitions as long as their effective consumption rate is below
> the quota limit. If the consumption rate exceeds the quota, then don’t include
> the throttled partitions in subsequent fetch requests until the effective
> consumption rate for those partitions returns to within the quota threshold.
> 
> I have more questions on the proposal, but was more interested in the above
> to see if it could simplify things a bit.
> 
> Also, can you open up access to the google-doc that you link to?
> 
> Thanks,
> 
> Joel
> 
> On Mon, Aug 8, 2016 at 5:54 AM, Ben Stopford <b...@confluent.io> wrote:
> 
>> We’ve created KIP-73: Replication Quotas
>> 
>> The idea is to allow an admin to throttle moving replicas. Full details
>> are here:
>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas
>> 
>> Please take a look and let us know your thoughts.
>> 
>> Thanks
>> 
>> B
>> 
>> 
