Hi Harsha, Li Kan,

What Colin mentioned is what I see in practice as well (at AWS and on our clusters). A control plane management tool decides the hostname-to-broker-ID mapping and can change it as it sees fit as brokers fail and new ones are brought in. That control plane usually already has a database of sorts that keeps track of existing broker IDs, so this work would duplicate what the control plane already does. It could also lead to extra work if the control plane decides to do something different from what the mapping in ZooKeeper says.
At a minimum, I'd like to see the motivation expanded, along with a description of how the cluster Li Kan has in mind is currently managed.

Thanks
Eno

On Sat, Mar 2, 2019 at 1:43 AM Harsha <ka...@harsha.io> wrote:

> Hi,
> Cluster management tools are more generic and they are not aware of
> Kafka-specific configs like broker.id.
> Even if they are aware of broker.ids, they will be lost when a disk is
> lost.
> Irrespective of these use cases, let's look at the problem in
> isolation.
> 1. Disks are the most common failure case in Kafka clusters.
> 2. We store the auto-generated broker.id on disk, hence we lose this
> broker.id mapping when disks fail.
> 3. If we keep the previously generated broker.id mapping along with the
> host in ZooKeeper, it's easier to retrieve that mapping on a new host.
> This would reduce the reassignment step and allow us to just copy the
> data and start the new node with the previous broker.id,
> which is what the KIP is proposing.
> I want to understand what your concerns are about moving this mapping,
> which already exists on disk, to ZooKeeper.
>
> Thanks,
> Harsha
>
> On Fri, Mar 1, 2019, at 11:11 AM, Colin McCabe wrote:
> > On Wed, Feb 27, 2019, at 14:12, Harsha wrote:
> > > Hi Colin,
> > > What we want is to preserve the broker.id so that we
> > > can do an offline rebuild of a broker. In our case, going through
> > > online Kafka replication to bring up a failed node will put producer
> > > latencies at risk, given that the new broker will keep all the other
> > > leaders busy with its replication requests. For an offline rebuild,
> > > we do not need to rebalance as long as we can recover the broker.id.
> > > Overall, irrespective of this use case, we still want the
> > > ability to retrieve a broker.id for an existing host. This will make
> > > it easier to swap in new hosts for failed hosts while keeping the
> > > existing hostname.
> >
> > Thanks for the explanation. Shouldn't this be handled by the
> > cluster management tool, though? Kafka doesn't include a mechanism for
> > re-creating nodes that failed. That's up to Kubernetes, or Ansible, or
> > whatever cluster provisioning framework you have in place. This feels
> > like the same kind of thing: managing how the cluster is provisioned.
> >
> > best,
> > Colin
> >
> > >
> > > Thanks,
> > > Harsha
> > > On Wed, Feb 27, 2019, at 11:53 AM, Colin McCabe wrote:
> > > > Hi Li,
> > > >
> > > > > The mechanism simplifies deployment because the same configuration
> > > > > can be used across all brokers; however, in a large system where
> > > > > disk failure is the norm, the meta file could often get lost,
> > > > > causing a new broker id to be allocated. This is problematic
> > > > > because the new broker id has no partitions assigned to it, so it
> > > > > can't do anything, while partitions assigned to the old one lose
> > > > > one replica
> > > >
> > > > If all of the disks have failed, then the partitions will lose their
> > > > replicas no matter what, right? If any of the disks is still around,
> > > > then there will be a meta file on the disk which contains the
> > > > previous broker ID. So I'm not sure that we need to change anything
> > > > here.
> > > >
> > > > best,
> > > > Colin
> > > >
> > > >
> > > > On Tue, Feb 5, 2019, at 14:38, Li Kan wrote:
> > > > > Hi, I have KIP-426, which is a small change to how the broker id
> > > > > is automatically determined at startup. I am new to Kafka, so
> > > > > there are a bunch of design trade-offs that I might be missing or
> > > > > find hard to decide, so I'd like to get some suggestions on it. I
> > > > > expect (and am open) to modify (or even totally rewrite) the KIP
> > > > > based on suggestions. Thanks.
> > > > >
> > > > > --
> > > > > Best,
> > > > > Kan