Just to elaborate on what Ben said about why we need throttling on both
the leader and the follower side.
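To put rough numbers on the two failure modes (the quota values and broker
counts here are made up for illustration, not from the KIP):

    // Worst-case arithmetic for the example below. Follower-only and
    // leader-only quotas each leave one side of the link unbounded.
    public class ReplicationQuotaWorstCase {
        public static void main(String[] args) {
            // Follower-side quota only: each of the 5 new brokers caps its
            // inbound replication traffic at 10 MB/s, but if all 5 happen
            // to fetch mostly from one existing broker, that broker's
            // outbound traffic can approach 5 * 10 = 50 MB/s.
            double followerQuota = 10.0; // MB/s per new broker
            int newBrokers = 5;
            System.out.printf("leader outbound worst case: %.0f MB/s%n",
                newBrokers * followerQuota);

            // Leader-side quota only: each of the 10 existing brokers caps
            // its outbound replication traffic at 10 MB/s, but a new broker
            // fetching from all 10 can still ingest up to 100 MB/s.
            double leaderQuota = 10.0;   // MB/s per existing broker
            int existingBrokers = 10;
            System.out.printf("follower inbound worst case: %.0f MB/s%n",
                existingBrokers * leaderQuota);
        }
    }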
If we only have throttling on the follower side, consider a case where we
add 5 new brokers and want to move some replicas from the existing brokers
over to them. Each of those 5 brokers is going to fetch data from all of
the existing brokers. It's then possible for the aggregate fetch load from
those 5 brokers on a particular existing broker to exceed its outgoing
network bandwidth, even though the inbound traffic on each of the 5
brokers is bounded.

If we only have throttling on the leader side, consider the same example.
It's possible for the incoming traffic to each of those 5 brokers to
exceed its network bandwidth, since each one is fetching data from all of
the existing brokers.

So, being able to set a quota on both the follower and the leader side
protects against both cases.

Thanks,

Jun

On Tue, Aug 9, 2016 at 4:43 AM, Ben Stopford <b...@confluent.io> wrote:

> Hi Joel
>
> Thanks for taking the time to look at this. Appreciated.
>
> Regarding throttling on both leader and follower, this proposal covers a
> more general solution which can guarantee a quota even when a rebalance
> operation produces an asymmetric profile of load. This means
> administrators don’t need to calculate the impact that a follower-only
> quota will have on the leaders being fetched from, for example where
> replica sizes are skewed or where a partial rebalance is required.
>
> Having said that, even with both leader and follower quotas, the use of
> additional threads is actually optional. There appear to be two general
> approaches: (1) omit partitions from fetch requests (follower) / fetch
> responses (leader) when they exceed their quota; (2) delay them, as the
> existing quota mechanism does, using separate fetchers. Both appear
> valid, but with slightly different design tradeoffs.
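> For concreteness, approach (1) on the follower side might look roughly
> like the sketch below. The names and the shape of the rate measurement
> are made up for illustration; this isn’t the actual patch.
>
>     import java.util.List;
>     import java.util.Set;
>     import java.util.stream.Collectors;
>
>     class ThrottledFetchBuilder {
>         private final Set<String> throttled;   // partitions marked for throttling
>         private final double quotaBytesPerSec; // broker-level replication quota
>
>         ThrottledFetchBuilder(Set<String> throttled, double quotaBytesPerSec) {
>             this.throttled = throttled;
>             this.quotaBytesPerSec = quotaBytesPerSec;
>         }
>
>         // measuredRate is the current aggregate bytes/sec consumed by the
>         // throttled partitions (e.g. from a sampled rate metric). Under
>         // quota, fetch everything; over quota, omit the throttled
>         // partitions and retry them on a later fetch once the measured
>         // rate has fallen back below the limit.
>         List<String> partitionsForNextFetch(List<String> assigned,
>                                             double measuredRate) {
>             if (measuredRate < quotaBytesPerSec)
>                 return assigned;
>             return assigned.stream()
>                            .filter(tp -> !throttled.contains(tp))
>                            .collect(Collectors.toList());
>         }
>     }
>
> The same check, applied to the fetch response on the leader side, bounds
> the leader’s outbound replication traffic; and it matches the
> self-throttling you describe below, in that throttled partitions simply
> drop out of subsequent fetch requests until the rate recovers.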
> The issue with approach (1) is that it departs somewhat from the existing
> quotas implementation, and must include a notion of fairness within the
> now size-bounded request and response. The issue with (2) is guaranteeing
> ordering of updates when replicas shift threads, but this is handled, for
> the most part, in the code today.
>
> I’ve updated the rejected alternatives section to make this a little
> clearer.
>
> B
>
> On 8 Aug 2016, at 20:38, Joel Koshy <jjkosh...@gmail.com> wrote:
> >
> > Hi Ben,
> >
> > Thanks for the detailed write-up. So the proposal involves
> > self-throttling on the fetcher side and throttling at the leader. Can
> > you elaborate on the reasoning given on the wiki: *“The throttle is
> > applied to both leaders and followers. This allows the admin to exert
> > strong guarantees on the throttle limit.”* Is there any reason why one
> > or the other wouldn't be sufficient?
> >
> > Specifically, if we were to only do self-throttling on the fetchers, we
> > could potentially avoid the additional replica fetchers, right? i.e.,
> > each replica fetcher would maintain its quota metrics as you proposed,
> > and each (normal) replica fetch presents an opportunity to make progress
> > for the throttled partitions as long as their effective consumption rate
> > is below the quota limit. If it exceeds the quota limit, then don’t
> > include the throttled partitions in the subsequent fetch requests until
> > the effective consumption rate for those partitions returns to within
> > the quota threshold.
> >
> > I have more questions on the proposal, but was more interested in the
> > above to see if it could simplify things a bit.
> >
> > Also, can you open up access to the google-doc that you link to?
> >
> > Thanks,
> >
> > Joel
> >
> > On Mon, Aug 8, 2016 at 5:54 AM, Ben Stopford <b...@confluent.io> wrote:
> >
> >> We’ve created KIP-73: Replication Quotas
> >>
> >> The idea is to allow an admin to throttle moving replicas. Full details
> >> are here:
> >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas
> >>
> >> Please take a look and let us know your thoughts.
> >>
> >> Thanks
> >>
> >> B