This config applies only to replicas that are being reassigned and are not yet in the ISR. Once they join the ISR, reassignment throttling stops altogether and does not apply again if they later fall out of the ISR. Specifically, the config is valid from the point when a reassignment is propagated through the adding_replicas field of the LeaderAndIsr request until the broker receives another LeaderAndIsr request saying that the new replica has been added and is in the ISR. Furthermore, the config applies only to the actual leader and follower, so if the leader changes in the meantime, the throttling moves with it (again based on the LeaderAndIsr requests).
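To make the validity window above concrete, here is a minimal Python sketch of the rule. It assumes a simplified, hypothetical `LeaderAndIsrState` record (partition, leader, ISR, adding replicas) standing in for the real LeaderAndIsr request, and a hypothetical `ReassignmentThrottleTracker`; neither is part of Kafka. The `kip73_style_throttled` helper is likewise an assumed simplification of the current behaviour Ismael describes further down the thread (ISR traffic unthrottled, out-of-ISR followers throttled).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeaderAndIsrState:
    """Hypothetical, simplified view of one partition's LeaderAndIsr state."""
    partition: str
    leader: int
    isr: frozenset
    adding_replicas: frozenset


class ReassignmentThrottleTracker:
    """Sketch of the proposed rule: throttle only adding replicas that have not
    yet been seen in the ISR, and only between them and the current leader."""

    def __init__(self) -> None:
        self._latest: dict[str, LeaderAndIsrState] = {}
        # Adding replicas that have already joined the ISR at least once.
        self._caught_up: dict[str, set[int]] = {}

    def on_leader_and_isr(self, state: LeaderAndIsrState) -> None:
        caught_up = self._caught_up.setdefault(state.partition, set())
        # Forget replicas that are no longer part of a reassignment.
        caught_up &= state.adding_replicas
        # Adding replicas now in the ISR stay unthrottled even if they drop out later.
        caught_up |= state.adding_replicas & state.isr
        self._latest[state.partition] = state

    def throttled_pairs(self, partition: str) -> set[tuple[int, int]]:
        """Return the (leader, follower) pairs covered by the reassignment throttle."""
        state = self._latest.get(partition)
        if state is None:
            return set()
        pending = state.adding_replicas - self._caught_up[partition]
        # Pairs are rebuilt from the latest leader, so a leader change moves the throttle.
        return {(state.leader, follower) for follower in pending}


def kip73_style_throttled(throttle_configured: bool, follower: int, isr: frozenset) -> bool:
    """Simplified model of the current behaviour: on a partition with the
    replication throttle configured, any follower outside the ISR is throttled."""
    return throttle_configured and follower not in isr


tracker = ReassignmentThrottleTracker()
# Reassignment starts: broker 3 is being added to partition "t-0", led by broker 1.
tracker.on_leader_and_isr(LeaderAndIsrState("t-0", 1, frozenset({1, 2}), frozenset({3})))
print(tracker.throttled_pairs("t-0"))  # {(1, 3)}
# Leadership moves to broker 2 and existing replica 1 drops out of the ISR.
tracker.on_leader_and_isr(LeaderAndIsrState("t-0", 2, frozenset({2}), frozenset({3})))
print(tracker.throttled_pairs("t-0"))  # {(2, 3)}
print(kip73_style_throttled(True, 1, frozenset({2})))  # True
# Broker 3 catches up and joins the ISR: reassignment throttling stops.
tracker.on_leader_and_isr(LeaderAndIsrState("t-0", 2, frozenset({1, 2, 3}), frozenset({3})))
print(tracker.throttled_pairs("t-0"))  # set()
# Broker 3 falls out of the ISR again: it is still not throttled.
tracker.on_leader_and_isr(LeaderAndIsrState("t-0", 2, frozenset({1, 2}), frozenset({3})))
print(tracker.throttled_pairs("t-0"))  # set()
```

In the second update, existing replica 1 leaves the ISR: the simplified KIP-73 model would throttle it, while the tracker leaves it alone because it was never an adding replica, which is the difference argued for in the next paragraph.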
For instance, when a new broker is added to offload some partitions onto it, it is safer to use this config instead of general fetch throttling for exactly this reason: when an existing replica of a partition that is being reassigned falls out of the ISR, the change is propagated via the LeaderAndIsr request, so the throttling changes too. This removes the need to change the configs manually, gives people an easy way to configure throttling, and makes a better effort not to throttle what doesn't need to be throttled (the replica that is falling out of the ISR).

Viktor

On Fri, Dec 6, 2019 at 5:12 PM Ismael Juma <ism...@juma.me.uk> wrote:
> My concern is that we're very focused on reassignment where I think users enable throttling to avoid overwhelming brokers with replica catch up traffic (typically disk and/or bandwidth). The current approach achieves that by not throttling ISR replication.
>
> The downside is that when a broker falls out of the ISR, it may suddenly get throttled and never catch up. However, if the throttle can cause this kind of issue, then it's broken for replicas being reassigned too, so one could say that it's a configuration error.
>
> Do we have specific scenarios that would be solved by the proposed change?
>
> Ismael
>
> On Fri, Dec 6, 2019 at 2:26 AM Viktor Somogyi-Vass <viktorsomo...@gmail.com> wrote:
>
> > Thanks for the question. I think it depends on how the user will try to fix it.
> > - If they just replace the disk then I think it shouldn't count as a reassignment and should be allocated under the normal replication quotas. In this case there is no reassignment going on as far as I can tell, the broker shuts down serving replicas from that dir/disk, notifies the controller which changes the leadership. When the disk is fixed the broker will be restarted to pick up the changes and it starts the replication from the current leader.
> > - If the user reassigns the partitions to other brokers then it will fall under the reassignment traffic.
> > Also if the user moves a partition to a different disk it would also count as normal replication as it is technically not a reassignment but an alter-replica-dir event, but it's still done with the reassignment tool, so I'd keep the current functionality of the --replica-alter-log-dirs-throttle.
> > Is this aligned with your thinking?
> >
> > Viktor
> >
> > On Wed, Dec 4, 2019 at 2:47 PM Ismael Juma <isma...@gmail.com> wrote:
> >
> > > Thanks Viktor. How do we intend to handle the case where a broker loses its disk and has to catch up from the beginning?
> > >
> > > Ismael
> > >
> > > On Wed, Dec 4, 2019, 4:31 AM Viktor Somogyi-Vass <viktorsomo...@gmail.com> wrote:
> > >
> > > > Thanks for the notice Ismael, KAFKA-4313 fixed this issue indeed. I've updated the KIP.
> > > >
> > > > Viktor
> > > >
> > > > On Tue, Dec 3, 2019 at 3:28 PM Ismael Juma <ism...@juma.me.uk> wrote:
> > > >
> > > > > Hi Viktor,
> > > > >
> > > > > The KIP states:
> > > > >
> > > > > "KIP-73 <https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas> added quotas for replication but it doesn't separate normal replication traffic from reassignment. So a user is able to specify the partition and the throttle rate but it will be applied to both ISR and non-ISR traffic"
> > > > >
> > > > > This is not true, ISR traffic is not throttled.
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Thu, Oct 24, 2019 at 5:38 AM Viktor Somogyi-Vass <viktorsomo...@gmail.com> wrote:
> > > > >
> > > > > > Hi People,
> > > > > >
> > > > > > I've created a KIP to improve replication quotas by handling reassignment related throttling as a separate case with its own configurable limits and change the kafka-reassign-partitions tool to use these new configs going forward.
> > > > > > Please have a look, I'd be happy to receive any feedback and answer all your questions.
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-542%3A+Partition+Reassignment+Throttling
> > > > > >
> > > > > > Thanks,
> > > > > > Viktor