0.8 best practices for migrating / electing leaders in failure situations?

Scott Clasen Fri, 22 Mar 2013 10:47:27 -0700

What would the recommended practice be for the following scenarios?

Running on EC2, with ephemeral disks only for Kafka.


There are 3 Kafka servers. The broker IDs are always increasing: if a
broker dies, it's never coming back.

All topics have a replication factor of 3.

* Scenario 1: BrokerIDs 1,2,3. Broker 2 dies.

Recover by:

Boot another: BrokerID 4
?? run bin/kafka-reassign-partitions.sh for any topic+partition and
replace brokerid 2 with brokerid 4
?? anything else to do to cause messages to be replicated to 4??
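
To make the "replace brokerid 2 with brokerid 4" step concrete, here is a minimal Python sketch that rewrites a replica assignment and prints it as JSON. The topic name `events`, the sample assignment, and the JSON layout (`partitions` / `topic` / `partition` / `replicas`) are assumptions for illustration; check the usage text of your version of kafka-reassign-partitions.sh for the input format it actually expects.

```python
import json


def replace_broker(assignment, dead_id, new_id):
    """Return a copy of the assignment with dead_id swapped for new_id."""
    return {
        tp: [new_id if b == dead_id else b for b in replicas]
        for tp, replicas in assignment.items()
    }


def to_reassignment_json(assignment):
    # ASSUMPTION: this JSON layout is illustrative, not confirmed for 0.8.
    entries = [
        {"topic": topic, "partition": partition, "replicas": replicas}
        for (topic, partition), replicas in sorted(assignment.items())
    ]
    return json.dumps({"partitions": entries}, indent=2)


# Hypothetical current assignment: (topic, partition) -> replica broker IDs.
current = {("events", 0): [1, 2, 3], ("events", 1): [2, 3, 1]}
plan = replace_broker(current, dead_id=2, new_id=4)
print(to_reassignment_json(plan))
```

Replica order is preserved, so broker 4 takes over exactly the positions broker 2 held, including any partition where 2 was listed first.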

NOTE: This appears to work, but I'm not positive that 4 got messages replicated to it.
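
One way to check whether broker 4 has caught up is to see if it appears in each partition's in-sync replica (ISR) list. The sketch below assumes the partition state is available as a JSON string with an `isr` field (for example, read from ZooKeeper by some other means); the field name and the sample payload are assumptions and should be verified against the running cluster.

```python
import json


def broker_in_isr(state_json, broker_id):
    """True if broker_id is listed in the 'isr' field of the state JSON."""
    state = json.loads(state_json)
    # ASSUMPTION: the state document carries an "isr" list of broker IDs.
    return broker_id in state.get("isr", [])


# Hypothetical partition state payload.
sample_state = '{"leader": 1, "isr": [1, 3, 4]}'
print(broker_in_isr(sample_state, 4))
print(broker_in_isr(sample_state, 2))
```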

* Scenario 2: BrokerIDs 1,2,3. Catastrophic failure: 1,2,3 all die, but ZK is
still there.

Messages obviously lost.
Recover to a functional state by:

Boot 3 more: 4,5,6
?? run bin/kafka-reassign-partitions.sh for all topics/partitions, swap
1,2,3 for 4,5,6?
?? run bin/kafka-preferred-replica-election.sh for all topics/partitions
?? anything else to do to allow producers to start sending successfully??
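
For scenario 2, the "swap 1,2,3 for 4,5,6" step and the list of partitions for the election step can be sketched as below. The topic name, the sample assignment, and the election JSON layout (`partitions` / `topic` / `partition`) are assumptions for illustration, not confirmed input formats for the 0.8 scripts.

```python
import json


def remap_brokers(assignment, mapping):
    """Replace every broker ID found in mapping; leave other IDs untouched."""
    return {
        tp: [mapping.get(b, b) for b in replicas]
        for tp, replicas in assignment.items()
    }


def election_json(assignment):
    # ASSUMPTION: illustrative layout listing every topic/partition.
    entries = [{"topic": t, "partition": p} for (t, p) in sorted(assignment)]
    return json.dumps({"partitions": entries})


# Hypothetical assignment before the failure.
old = {("events", 0): [1, 2, 3], ("events", 1): [2, 3, 1]}
new = remap_brokers(old, {1: 4, 2: 5, 3: 6})

# Sanity check: no dead broker ID should remain in the new plan.
dead = {1, 2, 3}
leftover = [tp for tp, replicas in new.items() if dead & set(replicas)]
print(new)
print(leftover)
print(election_json(new))
```

An empty `leftover` list means every replica in the plan points at a live broker, which is worth confirming before running the election step.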


NOTE: I had some trouble with scenario 2. If my procedure for scenario 2 is
in fact correct and I still can't get to a good state, I will try to
reproduce it and open a ticket.
