Hi Neha,

Thanks for the detailed reply, and apologies for my late response. I do have a few comments.
1. Replica throttling - I agree this is rather important to get done. However, it can also be argued that this problem is orthogonal: we do not have these protections currently, yet we run partition reassignment fairly often. Having said that, I'm perfectly happy to tackle KIP-46 after this problem is solved. I understand it is actively being discussed in KAFKA-1464.

2. Pluggable policies - Can you elaborate on the need for pluggable policies in the partition reassignment tool? Even if we make it pluggable to begin with, it needs to ship with a default policy that makes sense for most users. IMO, partition count is the most intuitive default and is analogous to how we stripe partitions for new topics (see the sketch below).

3. Even if the trigger were fully manual (as it is now), we could still have the controller generate the assignment per a configured policy, i.e. the tool would effectively be built into Kafka itself. Starting with this approach makes it easier to fully automate in the future, since only the trigger would remain to be automated.
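To make the default policy in point 2 concrete, here's a rough sketch in Python. It is purely illustrative (nothing below is actual Kafka code; rebalance and to_reassignment_json are made-up names): surviving replicas stay put to minimize data movement, and each lost replica goes to the live broker currently holding the fewest replicas. The output follows the JSON shape that kafka-reassign-partitions.sh consumes, if I remember the format correctly.

import json
from collections import Counter

def rebalance(assignment, live_brokers):
    """assignment maps (topic, partition) -> list of replica broker ids."""
    # Current replica count per live broker.
    load = Counter({b: 0 for b in live_brokers})
    for replicas in assignment.values():
        for b in replicas:
            if b in live_brokers:
                load[b] += 1
    target = {}
    for (topic, partition), replicas in sorted(assignment.items()):
        # Keep surviving replicas in place: no data movement for them.
        new_replicas = [b for b in replicas if b in live_brokers]
        # Backfill each lost replica with the least-loaded live broker not
        # already hosting this partition (assumes enough live brokers to
        # satisfy the replication factor).
        for _ in range(len(replicas) - len(new_replicas)):
            candidates = [b for b in live_brokers if b not in new_replicas]
            pick = min(candidates, key=lambda b: load[b])
            new_replicas.append(pick)
            load[pick] += 1
        target[(topic, partition)] = new_replicas
    return target

def to_reassignment_json(target):
    # The file format consumed by kafka-reassign-partitions.sh.
    return json.dumps({
        "version": 1,
        "partitions": [{"topic": t, "partition": p, "replicas": r}
                       for (t, p), r in sorted(target.items())],
    }, indent=2)

# Example: broker 3 is permanently gone; its replicas land on broker 4.
current = {("orders", 0): [1, 2], ("orders", 1): [2, 3], ("orders", 2): [3, 1]}
print(to_reassignment_json(rebalance(current, live_brokers={1, 2, 4})))

A real policy would also need to shed load from overloaded brokers (this sketch only backfills lost replicas), which is exactly the kind of thing a pluggable interface could iterate on.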
Aditya

On Wed, Feb 3, 2016 at 1:57 PM, Neha Narkhede <n...@confluent.io> wrote:
> Adi,
>
> Thanks for the write-up. Here are my thoughts:
>
> I think you are suggesting a way of automatically restoring a topic's
> replication factor in a specific scenario: permanent broker failures.
> I agree that the partition reassignment mechanism should be used to
> add replicas when they are lost to permanent broker failures. But I
> think the KIP probably bites off more than we can chew.
>
> Before we automate detection of permanent broker failures and have
> the controller mitigate them through automatic data balancing, I'd
> like to point out that our current difficulty is not that, but rather
> generating a workable partition assignment for rebalancing data in a
> cluster.
>
> There are 2 problems with partition rebalancing today:
>
> 1. Lack of replica throttling for balancing data: In the absence of
> replica throttling, even if you come up with an assignment that might
> be workable, it isn't practical to kick it off without worrying about
> bringing the entire cluster down. I don't think the hack of moving
> partitions in batches is effective, as it is at best a guess.
> 2. Lack of support for policies in the rebalance tool that
> automatically generate a workable partition assignment: There is no
> easy way to generate a partition reassignment JSON file. An example
> of a policy is "end up with an equal number of partitions on every
> broker while minimizing data movement". There might be other policies
> that make sense; we'd have to experiment.
>
> Broadly speaking, the data balancing problem comprises 3 parts:
>
> 1. Trigger: An event that triggers data balancing to take place.
> KIP-46 suggests a specific trigger, permanent broker failure, but
> several other events might also make sense: cluster expansion,
> decommissioning brokers, data imbalance.
> 2. Policy: Given a set of constraints, generate a target partition
> assignment that can be executed when triggered.
> 3. Mechanism: Given a partition assignment, make the state changes
> and actually move the data until the target assignment is achieved.
>
> Currently, the trigger is manual through the rebalance tool, there is
> no support for any viable policy today, and we have a built-in
> mechanism that, given a policy and upon a trigger, moves data in a
> cluster but does not support throttling.
>
> Given that both the policy and the throttling improvement to the
> mechanism are hard problems, and given our past experience of
> operationalizing partition reassignment (it required months of
> testing before we got it right), I strongly recommend attacking this
> problem in stages. I think a more practical approach would be to add
> the concept of pluggable policies to the rebalance tool, implement a
> practical policy that generates a workable partition assignment upon
> triggering the tool, and improve the mechanism to support throttling
> so that a given policy can succeed without manual intervention. If we
> solved these problems first, the rebalance tool would be much more
> accessible to Kafka users and operators.
>
> Assuming we do this, the problem that KIP-46 aims to solve becomes
> much easier. You can separate the detection of permanent broker
> failures (the trigger) from the mitigation (the above-mentioned
> improvements to data balancing). The latter will be a native
> capability in Kafka. Detecting permanent hardware failures is much
> more easily done via an external script that uses a simple health
> check (Part 1 of KIP-46).
>
> I agree that it will be great to *eventually* fully automate both the
> trigger and the policies while also improving the mechanism. But I'm
> highly skeptical of big-bang approaches that go from a completely
> manual and cumbersome process to a fully automated one, especially
> when that involves large-scale data movement in a running cluster.
> Once we stabilize these changes and feel confident that they work, we
> can push the policy into the controller and have it automatically be
> triggered based on different events.
>
> Thanks,
> Neha
>
> On Tue, Feb 2, 2016 at 6:13 PM, Aditya Auradkar <
> aaurad...@linkedin.com.invalid> wrote:
>
> > Hey everyone,
> >
> > I just created a KIP to discuss automated replica reassignment when
> > we lose a broker in the cluster.
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-46%3A+Self+Healing+Kafka
> >
> > Any feedback is welcome.
> >
> > Thanks,
> > Aditya
>
>
> --
> Thanks,
> Neha
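P.S. Re: the external health check for detecting permanent broker failures (Part 1 of KIP-46), I'd expect something roughly this simple. The sketch below is hypothetical: the hosts, the 30-minute threshold, and the trigger_reassignment.sh handoff are all made-up placeholders, and a real script would likely watch broker registrations in ZooKeeper rather than probing sockets.

import socket
import subprocess
import time

# Placeholder inventory; a real script would read broker registrations
# from ZooKeeper instead of hard-coding hosts.
BROKERS = {1: ("kafka1.example.com", 9092),
           2: ("kafka2.example.com", 9092),
           3: ("kafka3.example.com", 9092)}
FAILURE_THRESHOLD_SECS = 30 * 60  # declare a broker permanently failed after 30 min
POLL_INTERVAL_SECS = 60

def is_reachable(host, port, timeout=5):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def main():
    first_down = {}  # broker id -> timestamp of the first failed check
    while True:
        for broker_id, (host, port) in BROKERS.items():
            if is_reachable(host, port):
                first_down.pop(broker_id, None)
                continue
            down_since = first_down.setdefault(broker_id, time.time())
            if time.time() - down_since >= FAILURE_THRESHOLD_SECS:
                # Hand off mitigation to the rebalance tooling;
                # trigger_reassignment.sh is a made-up placeholder.
                subprocess.run(["./trigger_reassignment.sh", str(broker_id)])
                first_down.pop(broker_id, None)
        time.sleep(POLL_INTERVAL_SECS)

if __name__ == "__main__":
    main()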