Re: 0.8 best practices for migrating / electing leaders in failure situations?

Scott Clasen Fri, 22 Mar 2013 14:17:52 -0700

Thanks Neha-

To clarify:

* In scenario 1, will the new broker get all the messages on the other brokers
replicated to it?

* In scenario 2, it is clear that the data is gone, but I still need
producers to be able to send to, and consumers to receive from, the same
topic. In my testing today I was unable to do that; I kept getting errors. So
if I was doing the correct steps, it seems there is a bug here: the
"second-cluster-topic" topic is unusable after all 3 brokers crash and 3
more are booted to replace them. Is something not quite right in ZooKeeper?

Like so:

./bin/kafka-reassign-partitions.sh --zookeeper ... --path-to-json-file reassign.json

kafka.common.LeaderNotAvailableException: Leader not available for topic second-cluster-topic partition 0
at kafka.admin.AdminUtils$$anonfun$3.apply(AdminUtils.scala:120)
at kafka.admin.AdminUtils$$anonfun$3.apply(AdminUtils.scala:103)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
at scala.collection.immutable.List.foreach(List.scala:45)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:206)
at scala.collection.immutable.List.map(List.scala:45)
at kafka.admin.AdminUtils$.kafka$admin$AdminUtils$$fetchTopicMetadataFromZk(AdminUtils.scala:103)
at kafka.admin.AdminUtils$.fetchTopicMetadataFromZk(AdminUtils.scala:92)
at kafka.admin.ListTopicCommand$.showTopic(ListTopicCommand.scala:80)
at kafka.admin.ListTopicCommand$$anonfun$main$2.apply(ListTopicCommand.scala:66)
at kafka.admin.ListTopicCommand$$anonfun$main$2.apply(ListTopicCommand.scala:65)
at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
at scala.collection.immutable.List.foreach(List.scala:45)
at kafka.admin.ListTopicCommand$.main(ListTopicCommand.scala:65)
at kafka.admin.ListTopicCommand.main(ListTopicCommand.scala)
Caused by: kafka.common.LeaderNotAvailableException: No leader exists for partition 0
at kafka.admin.AdminUtils$$anonfun$3.apply(AdminUtils.scala:117)
... 16 more
topic: second-cluster-topic
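For reference, a reassign.json for the scenario 2 swap (brokers 1,2,3 replaced by 4,5,6) might look like the following. This is a minimal sketch, assuming the 0.8 reassignment tool reads a JSON object with a "partitions" array whose entries carry "topic", "partition" and "replicas" fields; the topic names are the two from this thread, and the format should be checked against the tool's documentation for your build.

```shell
#!/bin/sh
# Sketch: write a reassignment file that moves partition 0 of each topic
# onto the replacement brokers 4, 5 and 6.
# ASSUMPTION: the 0.8 reassign tool reads
#   {"partitions": [{"topic": ..., "partition": ..., "replicas": [...]}]}
# ASSUMPTION: topic names and broker ids are taken from this thread.
cat > reassign.json <<'EOF'
{"partitions": [
  {"topic": "first-cluster-topic",  "partition": 0, "replicas": [4, 5, 6]},
  {"topic": "second-cluster-topic", "partition": 0, "replicas": [4, 5, 6]}
]}
EOF
```

The file would then be passed with --path-to-json-file reassign.json, as in the command above.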

./bin/kafka-preferred-replica-election.sh --zookeeper ... --path-to-json-file elect.json

....[2013-03-22 10:24:20,706] INFO Created preferred replica election path with { "partitions":[ { "partition":0, "topic":"first-cluster-topic" }, { "partition":0, "topic":"second-cluster-topic" } ], "version":1 } (kafka.admin.PreferredReplicaLeaderElectionCommand$)
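The election path logged above names partition 0 of both topics. An elect.json that would request that election might look like this minimal sketch, assuming the 0.8 election tool reads a "partitions" array of {"topic", "partition"} entries (the same shape the log shows being written to ZooKeeper, minus the "version" field).

```shell
#!/bin/sh
# Sketch: request a preferred-replica election for the two partitions
# that appear in the INFO log line above.
# ASSUMPTION: the 0.8 election tool reads
#   {"partitions": [{"topic": ..., "partition": ...}]}
cat > elect.json <<'EOF'
{"partitions": [
  {"topic": "first-cluster-topic",  "partition": 0},
  {"topic": "second-cluster-topic", "partition": 0}
]}
EOF
```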

./bin/kafka-list-topic.sh  --zookeeper ... --topic second-cluster-topic

[2013-03-22 10:24:30,869] ERROR Error while fetching metadata for partition [second-cluster-topic,0] (kafka.admin.AdminUtils$)
kafka.common.LeaderNotAvailableException: Leader not available for topic second-cluster-topic partition 0
at kafka.admin.AdminUtils$$anonfun$3.apply(AdminUtils.scala:120)
at kafka.admin.AdminUtils$$anonfun$3.apply(AdminUtils.scala:103)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
at scala.collection.immutable.List.foreach(List.scala:45)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:206)
at scala.collection.immutable.List.map(List.scala:45)
at kafka.admin.AdminUtils$.kafka$admin$AdminUtils$$fetchTopicMetadataFromZk(AdminUtils.scala:103)
at kafka.admin.AdminUtils$.fetchTopicMetadataFromZk(AdminUtils.scala:92)
at kafka.admin.ListTopicCommand$.showTopic(ListTopicCommand.scala:80)
at kafka.admin.ListTopicCommand$$anonfun$main$2.apply(ListTopicCommand.scala:66)
at kafka.admin.ListTopicCommand$$anonfun$main$2.apply(ListTopicCommand.scala:65)
at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
at scala.collection.immutable.List.foreach(List.scala:45)
at kafka.admin.ListTopicCommand$.main(ListTopicCommand.scala:65)
at kafka.admin.ListTopicCommand.main(ListTopicCommand.scala)
Caused by: kafka.common.LeaderNotAvailableException: No leader exists for partition 0
at kafka.admin.AdminUtils$$anonfun$3.apply(AdminUtils.scala:117)
... 16 more
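For the scenario 1 question (did broker 4 actually receive replicas?), one way to check is to look at the isr field that kafka-list-topic.sh prints for each partition. A minimal sketch, assuming each partition is printed on one line ending in "isr: <comma-separated broker ids>" (an assumption about the 0.8 output format); the helper name broker_in_isr and the sample line are hypothetical, not captured output.

```shell
#!/bin/sh
# Hypothetical helper: succeed if broker id $2 appears in the isr list
# of a single kafka-list-topic.sh output line passed as $1.
# ASSUMPTION: the line ends with "isr: <comma-separated broker ids>".
broker_in_isr() {
  isr=$(printf '%s\n' "$1" | sed -n 's/.*isr: *//p')
  # Wrap the list in commas so that broker 4 does not match broker 14.
  case ",$(printf '%s' "$isr" | tr -d ' ')," in
    *",$2,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Illustrative (not captured) output line:
line='topic: second-cluster-topic partition: 0 leader: 1 replicas: 1,3,4 isr: 1,3,4'
if broker_in_isr "$line" 4; then
  echo "broker 4 is in sync"
else
  echo "broker 4 is not in sync"
fi
```

In practice each line of the list-topic output would be fed to the helper, for example in a while-read loop; a replica that is assigned but missing from the isr list has not caught up.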

On Fri, Mar 22, 2013 at 1:12 PM, Neha Narkhede <[email protected]> wrote:

> * Scenario 1:  BrokerID 1,2,3   Broker 2 dies.
>
> Here, you can use the reassign partitions tool and, for all partitions that
> had a replica on broker 2, move it to broker 4.
>
> * Scenario 2: BrokerID 1,2,3 Catastrophic failure 1,2,3 die but ZK still
> there.
>
> There is no way to recover any data here since there is nothing
> available to consume data from.
>
> Thanks,
> Neha
>
> On Fri, Mar 22, 2013 at 10:46 AM, Scott Clasen <[email protected]> wrote:
> > What would the recommended practice be for the following scenarios?
> >
> > Running on EC2, ephemeral disks only for Kafka.
> >
> > There are 3 Kafka servers. The broker ids are always increasing. If a
> > broker dies, it's never coming back.
> >
> > All topics have a replication factor of 3.
> >
> > * Scenario 1:  BrokerID 1,2,3   Broker 2 dies.
> >
> > Recover by:
> >
> > Boot another: BrokerID 4
> > ?? run bin/kafka-reassign-partitions.sh   for any topic+partition and
> > replace brokerid 2 with brokerid 4
> > ?? anything else to do to cause messages to be replicated to 4??
> >
> > NOTE: This appears to work, but I'm not positive 4 got messages replicated
> > to it.
> >
> > * Scenario 2: BrokerID 1,2,3 Catastrophic failure 1,2,3 die but ZK still
> > there.
> >
> > Messages obviously lost.
> > Recover to a functional state by:
> >
> > Boot 3 more: 4,5,6
> > ?? run bin/kafka-reassign-partitions.sh  for all topics/partitions, swap
> > 1,2,3 for 4,5,6?
> > ?? run bin/kafka-preferred-replica-election.sh for all topics/partitions
> > ?? anything else to do to allow producers to start sending successfully??
> >
> >
> > NOTE: I had some trouble with scenario 2. I will try to reproduce it and
> > open a ticket if my procedures for scenario 2 are in fact correct and I
> > still can't get to a good state.
>