Re: Kafka 0.8 Failover Behavior

Vadim Keylis Thu, 27 Jun 2013 17:05:17 -0700

Jun,
Does Kafka provide a way to configure a broker so that it must be in sync
before it becomes available?
If all brokers crash, is it possible to find out which node has the most
recent data, so that a proper startup procedure can be initiated?


Thanks,
Vadim


On Fri, Jun 21, 2013 at 8:24 PM, Jun Rao <jun...@gmail.com> wrote:

> Hi, Bob,
>
> Thanks for reporting this. Yes, this is the current behavior when all
> brokers fail. Whichever broker comes back first becomes the new leader and
> is the source of truth. This increases availability. However, previously
> committed data can be lost. This is what we call an unclean leader election.
> Another option is to wait instead for a broker in the in-sync replica set
> to come back before electing a new leader. This will preserve all committed
> data at the expense of availability. The application can configure the
> system with the appropriate option based on its need.
>
> Thanks,
>
> Jun
>
>
> On Fri, Jun 21, 2013 at 4:08 PM, Bob Jervis <bjer...@visibletechnologies.com> wrote:
>
> > I wanted to send this out because we saw this in some testing we were
> > doing and wanted to advise the community of something to watch for in 0.8
> > HA support.
> >
> > We have a two-machine cluster with replication factor 2.  We took one
> > machine offline and re-formatted the disk.  We re-installed the Kafka
> > software, but did not recreate any of the local disk files.  The
> > intention was to simply re-start the broker process, but due to an error
> > in the network config that took some time to diagnose, we ended up with
> > both machines' brokers down.
> >
> > When we fixed the network config and restarted the brokers, we happened
> > to start the broker on the rebuilt machine first.  The net result was
> > that when the healthy broker came back online, the rebuilt machine was
> > already the leader, and because of the Zookeeper state, it forced the
> > healthy broker to delete all of its topic data, thus wiping out the
> > entire contents of the cluster.
> >
> > We are instituting operations procedures to safeguard against this
> > scenario in the future (and fortunately we only blew away a test
> > cluster), but this was a bit of a nasty surprise for a Friday.
> >
> > Bob Jervis
> > Visibletechnologies
> >
> >
>
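
For readers following the thread, the trade-off Jun describes can be
illustrated with a toy model of Bob's two-broker scenario. This is only a
sketch, not Kafka code: the names Replica, elect_leader, sync_followers and
scenario are hypothetical, and the model assumes the in-sync replica set
(ISR) had shrunk to the healthy broker before both brokers went down. Later
Kafka releases expose this choice as the unclean.leader.election.enable
setting.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Replica:
    """Hypothetical stand-in for one broker's copy of a partition."""
    broker_id: int
    log: List[str] = field(default_factory=list)


def elect_leader(restarted: List[int], isr: Set[int],
                 allow_unclean: bool) -> Optional[int]:
    """Pick a leader among the brokers that have come back so far.

    Prefer a broker from the last known ISR. If none is back yet, either
    fall back to the first broker that restarted (unclean election) or
    return None, meaning the partition stays offline and waits.
    """
    for broker_id in restarted:
        if broker_id in isr:
            return broker_id
    if allow_unclean and restarted:
        return restarted[0]
    return None


def sync_followers(replicas: Dict[int, Replica], leader_id: int) -> None:
    """Followers discard their own log and copy the leader's."""
    leader_log = replicas[leader_id].log
    for replica in replicas.values():
        if replica.broker_id != leader_id:
            replica.log = list(leader_log)


def scenario(allow_unclean: bool) -> Tuple[int, List[str]]:
    """Replay the thread: broker 1 was rebuilt (empty log), broker 2 is
    healthy, and broker 1 is started first."""
    replicas = {1: Replica(1), 2: Replica(2, ["m1", "m2", "m3"])}
    isr = {2}  # assumption: broker 1 dropped out of the ISR when rebuilt

    leader = elect_leader([1], isr, allow_unclean)
    if leader is None:
        # Clean election: nothing happens until broker 2 is back as well.
        leader = elect_leader([1, 2], isr, allow_unclean)
    sync_followers(replicas, leader)
    return leader, replicas[2].log


print(scenario(allow_unclean=True))   # (1, [])
print(scenario(allow_unclean=False))  # (2, ['m1', 'm2', 'm3'])
```

With the unclean option, the rebuilt broker leads and the healthy broker's
data is discarded, which matches what Bob observed. With the clean option,
the partition is unavailable until the healthy broker returns, and the data
survives.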
