Hi Eno,

A control plane needs to do this today because Kafka doesn't provide such a
mapping.
I am not sure why we want every control plane to figure this out, rather than
take the mapping that exists today in Kafka at the node level on disk and
make it available at a global level in ZooKeeper.
If we implement this, any control plane becomes much simpler, and the
different environments don't each need to understand and re-implement this
broker.id mapping.
I also don't understand the duplication concern; which control plane are we
talking about?
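
For context, today each broker persists its id in a meta.properties file in
every log dir, roughly like this (the id value is just an example):

    version=0
    broker.id=1042

When the disk holding that file is lost, the node-level mapping is lost with
it.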

Irrespective of which control plane a user ends up using, I want to understand
the concerns about a broker.id-to-host mapping being available in ZooKeeper.
The broker.id belongs to Kafka, not to the control plane.
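
To make it concrete, below is a rough sketch of what the lookup could look
like. This is only an illustration of the idea, not the KIP's actual design:
the znode path /brokers/host_to_id and the class/method names are made up for
the example, and it assumes the parent path already exists.

    import java.nio.charset.StandardCharsets;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class BrokerIdRecovery {
        private static final String ROOT = "/brokers/host_to_id";

        // Look up the broker.id previously recorded for this hostname;
        // returns -1 if no mapping exists yet (fall back to auto-generation).
        static int lookupBrokerId(ZooKeeper zk, String hostname) throws Exception {
            try {
                byte[] data = zk.getData(ROOT + "/" + hostname, false, null);
                return Integer.parseInt(new String(data, StandardCharsets.UTF_8));
            } catch (KeeperException.NoNodeException e) {
                return -1;
            }
        }

        // Record hostname -> broker.id as a persistent (non-ephemeral) znode
        // so the mapping survives broker restarts and disk replacement.
        static void recordBrokerId(ZooKeeper zk, String hostname, int brokerId)
                throws Exception {
            byte[] data = Integer.toString(brokerId).getBytes(StandardCharsets.UTF_8);
            try {
                zk.create(ROOT + "/" + hostname, data,
                          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            } catch (KeeperException.NodeExistsException e) {
                zk.setData(ROOT + "/" + hostname, data, -1); // update in place
            }
        }
    }

With something like this, a replacement host that keeps the old hostname comes
back with the old broker.id, and we only need to copy the data over instead of
doing a reassignment.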

Thanks,
Harsha


On Sat, Mar 2, 2019, at 3:50 AM, Eno Thereska wrote:
> Hi Harsha, Li Kan,
> 
> What Colin mentioned is what I see in practice as well (at AWS and in our
> clusters). A control plane management tool decides the hostname-to-broker-ID
> mapping and can change it as it sees fit as brokers fail and new ones are
> brought in. That control plane usually already has a database of sorts that
> keeps track of existing broker IDs. So this work would duplicate what that
> control plane already does. It could also lead to extra work if that control
> plane decides to do something different from what the mapping in ZooKeeper
> has.
> 
> At a minimum I'd like to see the motivation expanded and a description of
> how the current cluster is managed that Li Kan has in mind.
> 
> Thanks
> Eno
> 
> On Sat, Mar 2, 2019 at 1:43 AM Harsha <ka...@harsha.io> wrote:
> 
> > Hi,
> >      Cluster management tools are more generic and they are not aware of
> > Kafka-specific configs like broker.id.
> > Even if they are aware of broker.ids, these will be lost when a disk is
> > lost.
> >       Irrespective of these use cases, let's look at the problem in
> > isolation.
> > 1. Disks are the most common failure case in Kafka clusters.
> > 2. We store the auto-generated broker.id on disk, hence we lose this
> > broker.id mapping when disks fail.
> > 3. If we keep the previously generated broker.id mapping along with the
> > host in ZooKeeper, it's easier to retrieve that mapping on a new host. This
> > would avoid the reassignment step and allow us to just copy the data and
> > start the new node with the previous broker.id,
> > which is what the KIP is proposing.
> > I want to understand what your concerns are with moving this mapping,
> > which already exists on disk, to ZooKeeper.
> >
> > Thanks,
> > Harsha
> >
> > On Fri, Mar 1, 2019, at 11:11 AM, Colin McCabe wrote:
> > > On Wed, Feb 27, 2019, at 14:12, Harsha wrote:
> > > > Hi Colin,
> > > >               What we want is to preserve the broker.id so that we
> > > > can do an offline rebuild of a broker. In our case, bringing up a
> > > > failed node through online Kafka replication puts producer latencies
> > > > at risk, given the new broker will keep all the other leaders busy
> > > > with its replication requests. For an offline rebuild, we do not need
> > > > to do a rebalance as long as we can recover the broker.id.
> > > >           Overall, irrespective of this use case, we still want the
> > > > ability to retrieve the broker.id for an existing host. This will make
> > > > it easier to swap failed hosts for new ones while keeping the existing
> > > > hostname.
> > >
> > > Thanks for the explanation.  Shouldn't this be handled by the
> > > cluster management tool, though?  Kafka doesn't include a mechanism for
> > > re-creating nodes that failed.  That's up to kubernetes, or ansible, or
> > > whatever cluster provisioning framework you have in place.  This feels
> > > like the same kind of thing: managing how the cluster is provisioned.
> > >
> > > best,
> > > Colin
> > >
> > > >
> > > > Thanks,
> > > > Harsha
> > > > On Wed, Feb 27, 2019, at 11:53 AM, Colin McCabe wrote:
> > > > > Hi Li,
> > > > >
> > > > >  > The mechanism simplifies deployment because the same
> > > > >  > configuration can be used across all brokers, however, in a large
> > > > >  > system where disk failure is a norm, the meta file could often get
> > > > >  > lost, causing a new broker id being allocated. This is problematic
> > > > >  > because new broker id has no partition assigned to it so it can’t
> > > > >  > do anything, while partitions assigned to the old one lose one
> > > > >  > replica
> > > > >
> > > > > If all of the disks have failed, then the partitions will lose their
> > > > > replicas no matter what, right?  If any of the disks is still around,
> > > > > then there will be a meta file on the disk which contains the
> > > > > previous broker ID.  So I'm not sure that we need to change anything
> > > > > here.
> > > > >
> > > > > best,
> > > > > Colin
> > > > >
> > > > >
> > > > > On Tue, Feb 5, 2019, at 14:38, Li Kan wrote:
> > > > > > Hi, I have KIP-426, which is a small change on automatically
> > > > > > determining the broker id when starting up. I am new to Kafka, so
> > > > > > there are a bunch of design trade-offs that I might be missing or
> > > > > > find hard to decide, so I'd like to get some suggestions on it. I
> > > > > > expect (and am open) to modify (or even totally rewrite) the KIP
> > > > > > based on suggestions. Thanks.
> > > > > >
> > > > > > --
> > > > > > Best,
> > > > > > Kan
> > > > > >
> > > > >
> > > >
> > >
> >
>
