It's also worth mentioning why new slave machines need to truncate back to a known good point.
When a new server joins the cluster and already has some data on disk, we cannot blindly trust its log, as it may contain messages that were never committed (for example, if it was the master and then crashed suddenly). This would lead to a situation where the replicas disagree about which message is at a given offset and number messages differently. To ensure consistency, you must truncate the new server back to a known safe point and re-sync it from the caught-up servers. This is always a safe operation as long as at least one server remains alive; the dilemma you describe, and that Jun commented on, occurs when you bring back an empty server as the only server, which of course makes it the master.

-Jay

On Fri, Jun 21, 2013 at 8:24 PM, Jun Rao <jun...@gmail.com> wrote:

> Hi, Bob,
>
> Thanks for reporting this. Yes, this is the current behavior when all
> brokers fail. Whichever broker comes back first becomes the new leader and
> is the source of truth. This increases availability. However, previously
> committed data can be lost. This is what we call an unclean leader election.
> Another option is instead to wait for a broker in the in-sync replica set
> to come back before electing a new leader. This will preserve all committed
> data at the expense of availability. The application can configure the
> system with the appropriate option based on its needs.
>
> Thanks,
>
> Jun
>
>
> On Fri, Jun 21, 2013 at 4:08 PM, Bob Jervis <bjer...@visibletechnologies.com> wrote:
>
>> I wanted to send this out because we saw this in some testing we were
>> doing and wanted to advise the community of something to watch for in 0.8
>> HA support.
>>
>> We have a two-machine cluster with replication factor 2. We took one
>> machine offline and re-formatted the disk. We re-installed the Kafka
>> software, but did not recreate any of the local disk files. The intention
>> was to simply re-start the broker process, but due to an error in the
>> network config that took some time to diagnose, we ended up with both
>> machines' brokers down.
>>
>> When we fixed the network config and restarted the brokers, we happened to
>> start the broker on the rebuilt machine first. The net result was that when
>> the healthy broker came back online, the rebuilt machine was already the
>> leader and, because of the ZooKeeper state, it forced the healthy broker to
>> delete all of its topic data, thus wiping out the entire contents of the
>> cluster.
>>
>> We are instituting operations procedures to safeguard against this
>> scenario in the future (and fortunately we only blew away a test cluster),
>> but this was a bit of a nasty surprise for a Friday.
>>
>> Bob Jervis
>> Visibletechnologies
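
To make the truncate-and-re-sync step Jay describes concrete, here is a minimal sketch of what a rejoining replica has to do. This is not Kafka's actual code; LocalLog, Leader, truncateTo, and fetchFrom are hypothetical stand-ins for the idea, and the "known safe point" is assumed to be handed to the replica by the cluster.

```java
import java.util.List;

// Hypothetical wrapper around a replica's on-disk log (not Kafka's real API).
interface LocalLog {
    void truncateTo(long offset);          // discard everything after this offset
    void append(List<String> messages);    // append messages fetched from the leader
}

// Hypothetical handle to a caught-up server that is currently the leader.
interface Leader {
    List<String> fetchFrom(long offset);   // returns an empty list once we are caught up
}

final class RejoiningReplica {
    private final LocalLog log;
    private final Leader leader;
    private final long lastKnownSafeOffset; // highest offset known to be committed

    RejoiningReplica(LocalLog log, Leader leader, long lastKnownSafeOffset) {
        this.log = log;
        this.leader = leader;
        this.lastKnownSafeOffset = lastKnownSafeOffset;
    }

    void resync() {
        // 1. Throw away anything past the known-good point: those messages may
        //    never have been committed and could disagree with the leader's log.
        log.truncateTo(lastKnownSafeOffset);

        // 2. Re-fetch everything after that point from a caught-up server so the
        //    two logs agree on which message sits at every offset.
        long next = lastKnownSafeOffset + 1;
        List<String> batch;
        while (!(batch = leader.fetchFrom(next)).isEmpty()) {
            log.append(batch);
            next += batch.size();
        }
    }
}
```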
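And a minimal sketch of the tradeoff Jun describes between availability and preserving committed data. Again, the types and the allowUnclean switch here are hypothetical rather than Kafka's real election code; the point is only that restricting election to the in-sync set avoids the data loss Bob hit, at the cost of staying unavailable until an in-sync broker returns.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical election logic: a clean election only picks a leader from the
// in-sync replica set, while an unclean election accepts any live replica.
final class LeaderElector {
    private final boolean allowUnclean; // analogous to an "unclean election" switch

    LeaderElector(boolean allowUnclean) {
        this.allowUnclean = allowUnclean;
    }

    /**
     * @param inSyncReplicas brokers known to have every committed message
     * @param liveReplicas   brokers currently reachable, in-sync or not
     * @return the new leader, or empty if we must wait for an in-sync broker
     */
    Optional<Integer> electLeader(List<Integer> inSyncReplicas, List<Integer> liveReplicas) {
        for (Integer broker : liveReplicas) {
            if (inSyncReplicas.contains(broker)) {
                return Optional.of(broker);  // clean election: no committed data is lost
            }
        }
        if (allowUnclean && !liveReplicas.isEmpty()) {
            // Unclean election: the partition stays available, but anything committed
            // only on the still-down in-sync replicas is gone (Bob's scenario).
            return Optional.of(liveReplicas.get(0));
        }
        return Optional.empty();             // stay leaderless and wait, preserving data
    }
}
```

In the scenario Bob reports, the rebuilt broker came back empty and so was not in sync; with the "wait for an in-sync replica" behavior, the cluster would have remained leaderless until the healthy broker returned instead of electing the empty one and wiping the data.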