Hi,
     Cluster management tools are generic and are not aware of Kafka-specific 
configs like broker.id.
Even if they were made aware of broker.ids, the ids are stored on disk and are 
lost when a disk is lost.
      Irrespective of these use cases, let's look at the problem in isolation:
1. Disks are the most common failure case in Kafka clusters.
2. We store the auto-generated broker.id on disk, so we lose the broker.id 
mapping whenever a disk fails (the on-disk format is shown below).
3. If we keep the previously generated broker.id mapping, along with the host, 
in ZooKeeper, it is easy to retrieve that mapping on a new host. This avoids 
the reassignment step and lets us just copy the data and start the new node 
with the previous broker.id, which is what the KIP proposes.
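
For reference, the on-disk mapping in item 2 lives in the meta.properties file 
in each log directory. Its contents look roughly like this (the broker.id 
value here is just an example):

    version=0
    broker.id=1001

When the disk is lost, this file, and with it the id, is lost too.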
I want to understand: what are your concerns about moving this mapping, which 
already exists on disk, to ZooKeeper?
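
To make the proposal concrete, here is a minimal sketch of how such a lookup 
could work against ZooKeeper. The znode path /brokers/id_mapping and the 
helper class are hypothetical illustrations, not actual Kafka code or the 
KIP's final design:

    import java.nio.charset.StandardCharsets;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // Hypothetical sketch: one persistent child znode per hostname, holding
    // that host's broker.id. Assumes the /brokers path already exists, as it
    // does in any running Kafka cluster.
    public class BrokerIdMapping {
        private static final String MAPPING_ROOT = "/brokers/id_mapping";

        // Returns the broker.id previously registered for this hostname, or
        // registers the freshly generated id so a rebuilt host can recover
        // it later, even after losing every disk.
        public static int lookupOrRegister(ZooKeeper zk, String hostname, int generatedId)
                throws KeeperException, InterruptedException {
            if (zk.exists(MAPPING_ROOT, false) == null) {
                zk.create(MAPPING_ROOT, new byte[0],
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            }
            String path = MAPPING_ROOT + "/" + hostname;
            if (zk.exists(path, false) != null) {
                // The host has registered before: reuse its old broker.id.
                byte[] data = zk.getData(path, false, null);
                return Integer.parseInt(new String(data, StandardCharsets.UTF_8));
            }
            // First registration for this host; the exists/create race is
            // ignored here to keep the sketch short.
            zk.create(path,
                    Integer.toString(generatedId).getBytes(StandardCharsets.UTF_8),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            return generatedId;
        }
    }

A host rebuilt with the same hostname would get its old broker.id back from 
lookupOrRegister, so it can copy the data over and start without any partition 
reassignment.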

Thanks,
Harsha

On Fri, Mar 1, 2019, at 11:11 AM, Colin McCabe wrote:
> On Wed, Feb 27, 2019, at 14:12, Harsha wrote:
> > Hi Colin,
> >               What we want is to preserve the broker.id so that we 
> > can do an offline rebuild of a broker. In our case, going through 
> > online Kafka replication to bring up a failed node puts producer 
> > latencies at risk, given the new broker will keep all the other 
> > leaders busy with its replication requests. For an offline rebuild, 
> > we do not need to do a rebalance as long as we can recover the broker.id.
> >           Overall, irrespective of this use case, we still want the 
> > ability to retrieve a broker.id for an existing host. This will make 
> > it easier to swap in new hosts for failed hosts while keeping the 
> > existing hostname.
> 
> Thanks for the explanation.  Shouldn't this be handled by the 
> cluster management tool, though?  Kafka doesn't include a mechanism for 
> re-creating nodes that failed.  That's up to kubernetes, or ansible, or 
> whatever cluster provisioning framework you have in place.  This feels 
> like the same kind of thing: managing how the cluster is provisioned.
> 
> best,
> Colin
> 
> > 
> > Thanks,
> > Harsha
> > On Wed, Feb 27, 2019, at 11:53 AM, Colin McCabe wrote:
> > > Hi Li,
> > > 
> > >  > The mechanism simplifies deployment because the same configuration 
> > >  > can be used across all brokers. However, in a large system where 
> > >  > disk failure is the norm, the meta file could often get lost, 
> > >  > causing a new broker id to be allocated. This is problematic 
> > >  > because the new broker id has no partitions assigned to it, so it 
> > >  > can't do anything, while partitions assigned to the old one lose 
> > >  > one replica.
> > > 
> > > If all of the disks have failed, then the partitions will lose their 
> > > replicas no matter what, right?  If any of the disks is still around, 
> > > then there will be a meta file on the disk which contains the previous 
> > > broker ID.  So I'm not sure that we need to change anything here.
> > > 
> > > best,
> > > Colin
> > > 
> > > 
> > > On Tue, Feb 5, 2019, at 14:38, Li Kan wrote:
> > > > Hi, I have KIP-426, which is a small change to automatically determine
> > > > the broker id at startup. I am new to Kafka, so there are a bunch of
> > > > design trade-offs that I might be missing or find hard to decide, and
> > > > I'd like to get some suggestions on it. I expect (and am open) to
> > > > modify (or even totally rewrite) the KIP based on suggestions. Thanks.
> > > > 
> > > > -- 
> > > > Best,
> > > > Kan
